Saturday, July 20, 2013

Opening the Black Box



Among the most powerful tools for empirical modeling are machine learning procedures such as neural networks. In recent years, Multilayer Perceptron (MLP) neural networks have begun to show up in land change modeling software such as the Land Change Modeler (LCM – an extension in the IDRISI software system). While neural networks are very powerful because of their ability to model complex non-linear relationships, some critics argue that they operate as black boxes. This can be very important in the development of standards for land cover change modeling to support REDD (Reducing Emissions from Deforestation and Forest Degradation). REDD projects involve payments for ecosystem services (PES) in return for activities that reduce expected forest loss. A thorough assessment of the skill of the model that projects business-as-usual forest loss is therefore critical. To address this need, Clark Labs has opened up the black box of MLP to provide the accountability necessary to support REDD and similar mission-critical modeling efforts. This was provided as a free update in IDRISI Selva (version 17.01).



A matter of skill


The key to our approach is an assessment of model skill based on validation data. LCM models the relationship between land cover transitions and explanatory variables (such as proximity to roads, slopes and soils) by using examples of areas known to have transitioned in the past as training areas. In LCM’s MLP neural network procedure, half of the training data are held back for validation. Thus it learns on the half designated for training and tests the skill of the model based on the half reserved for validation. With these validation cases we know what transition the land actually went through as well as the values of the explanatory variables at those locations. However, the model was never trained on the validation data. Thus it is a true test.
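
The half-and-half holdout described above can be sketched as follows. This is a minimal illustration of the idea, not LCM's actual implementation; the function name and data layout are hypothetical:

```python
import random

def split_training_data(samples, seed=42):
    """Randomly hold back half of the labeled samples for validation.

    Each sample pairs explanatory-variable values with the known
    transition (or persistence) outcome at that location. The model
    learns on one half and is scored on the half it never saw.
    """
    rng = random.Random(seed)          # fixed seed for a repeatable split
    shuffled = samples[:]              # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]   # (training half, validation half)
```

Because the validation half plays no part in training, accuracy measured on it is an honest estimate of how well the learned relationship generalizes, rather than a measure of how well the network memorized its training examples.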

Using the validation data, the MLP procedure in LCM now reports:

  • The overall skill of the model (a value that ranges from -1 to +1)
  • The skill in predicting each transition (a change from one land cover to another)
  • The skill in predicting persistence (cases where the transitions were eligible to happen, but did not)
  • The contribution of each explanatory variable to the overall model skill
  • A backwards stepwise model assessment, which permits a quick assessment of the most parsimonious model (the model with the greatest power while using the fewest variables -- the one with the most bang for the buck)
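
The backwards stepwise idea in the last bullet can be sketched as repeatedly dropping whichever variable costs the least skill when removed. This is a simplified illustration of the general technique, not LCM's code; `fit_and_score` is a hypothetical stand-in for training the MLP on a set of explanatory variables and measuring its skill on the validation half:

```python
def backwards_stepwise(variables, fit_and_score):
    """Record model skill as explanatory variables are removed one at a time.

    fit_and_score(vars) is assumed to train a model on the listed
    variables and return its validation skill. The returned history
    lets you spot the most parsimonious model: the smallest variable
    set whose skill is still close to the full model's.
    """
    current = list(variables)
    history = [(tuple(current), fit_and_score(current))]
    while len(current) > 1:
        # For each candidate, score the model with that variable left out.
        trials = {v: fit_and_score([u for u in current if u != v])
                  for v in current}
        # The variable whose removal leaves the highest skill is the weakest.
        weakest = max(trials, key=trials.get)
        current.remove(weakest)
        history.append((tuple(current), trials[weakest]))
    return history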



In a modeling of land cover change in lowland Bolivia, four explanatory variables are assessed using the Multilayer Perceptron to evaluate their ability to predict transitions involving Humid Forest and Agriculture. The overall model skill was 0.66. This skill in predicting areas that would transition from humid forest to agriculture was 0.64 while the skill at predicting which areas (although eligible to transition) would not change was slightly higher at 0.68. The map shows the modeled transition potential from humid forest to agriculture and the window to the right shows a portion of the skill assessment statistics. From this it can be seen that slopes was the least important explanatory variable while the cost distance to Santa Cruz was the most important. Proximity to local markets was the second most powerful variable followed by proximity to roads. However, the backwards stepwise assessment (the last graph) suggests that all four variables should probably be kept.

The skill measure used expresses the accuracy of the model, based on the validation data, compared to the expected accuracy that would occur by chance. A skill of 0 indicates that the model has no better than chance agreement with reality while a skill of 1 would indicate a perfect prediction. It should be noted that this is a measure of the skill of the model to predict what happened in the past (i.e., the period over which it trained). Thus, truly, it is a hindcast skill measure rather than a forecast skill. However, to the extent that one can assume that business as usual conditions will persist, it is a reasonable statement of the expected skill of the model in the future.



Try it out!


New to Clark Labs software? Haven’t upgraded to IDRISI Selva yet? Try out a free evaluation copy today.