Daniel R. Hunter
I. M. Systems Group
November 2019

The purpose of the script configuration_test.py is to test the execution of the Framework (or the Algorithm Service) with many "configuration instances".
A configuration instance is created by the Framework when run with a given configuration XML file (such as from a Project directory) in an "environment instance".
The bash scripts in each Project's Test_Scripts directory are examples of single environment instances.

Pseudo-code of the whole process is as follows:
1. Get an XML file "config.xml" and a JSON file "env_space.json"
2. For each environment instance (see below for details)...
  a. Create a dictionary of environment variable names to environment instance values.
  b. Start a new shell (via subprocess) with this environment.
  c. Execute the Framework preprocessor in this shell as usual: `./framework.exe config.xml -m pre`
  d. Check for a successful or unsuccessful exit code and exit the shell.
3. Give a tally of environment instances that were successful or unsuccessful.
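
The loop in steps 2 and 3 might be sketched as follows. This is a minimal illustration, not the script's actual code: the function name `run_instances` is hypothetical, and for brevity the command is run directly through `subprocess.run` rather than inside a separately started shell.

```python
import subprocess
import sys


def run_instances(command, env_instances):
    """Run `command` once per environment instance and tally the exit codes.

    command:       argument list for the program to run
    env_instances: iterable of dicts mapping variable names to string values
    """
    tally = {"success": 0, "failure": 0}
    for env in env_instances:
        # Passing env= replaces the child's environment entirely, so the
        # child sees only the variables in this instance.
        result = subprocess.run(command, env=env)
        key = "success" if result.returncode == 0 else "failure"
        tally[key] += 1
    return tally


if __name__ == "__main__":
    # Stand-in command so the sketch can be run without the Framework:
    # it exits 0 only when ENV_SATELLITE is set.
    demo = [sys.executable, "-c",
            "import os, sys; sys.exit(0 if 'ENV_SATELLITE' in os.environ else 1)"]
    print(run_instances(demo, [{"ENV_SATELLITE": "NPP"}, {}]))
```

For the Framework itself, the command from step 2c would be passed as `["./framework.exe", "config.xml", "-m", "pre"]`.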

Environment instances are generated from a JSON file containing the values or sets of values for the environment variables needed in the given (top-level) configuration XML file and other XML files as included.
Note that the environment of any shell that executes the Framework contains only the variables specified in the given JSON file! This ensures tight control over the automated testing. The downside is that the JSON files end up containing many unused or irrelevant environment variable names (but see Future Improvements 2 & 3 below).
A null value in a JSON file will be used to unset the associated environment variable.

An environment JSON object is a dictionary of key-value pairs. Keys are strings representing environment variable names (e.g. "ENV_SATELLITE"). Values should be null (the JSON value null, not the string "null"), a string, or an array containing strings and optionally null.
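
As a rough illustration of how such an object could be expanded into environment instances, the sketch below treats a scalar value as a one-element list and drops any variable whose chosen value is null. The function name `expand_env_space` and the small sample object are assumptions made for this example, not taken from the script.

```python
import itertools
import json


def expand_env_space(env_space):
    """Yield one environment dict per combination of values."""
    names = list(env_space)
    # A scalar (string or null) is treated as a single-choice list.
    choices = [v if isinstance(v, list) else [v] for v in env_space.values()]
    for combo in itertools.product(*choices):
        # A null (None) choice means the variable is left unset.
        yield {n: v for n, v in zip(names, combo) if v is not None}


# Illustrative sample only.
env_space = json.loads("""
{
  "ENV_SATELLITE": ["NPP", "NOAA20"],
  "ENV_NUM_COL_SEG": [null, "5"],
  "ENV_OUTPUT_DIRECTORY": "./Output/"
}
""")

instances = list(expand_env_space(env_space))
for env in instances:
    print(env)
```

Here two variables vary with two values each, so four instances are produced; in two of them ENV_NUM_COL_SEG is absent.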

__Example__
Say our top-level configuration XML contains the following:
```
<project>JRR</project>
<SATELLITE>${ENV_SATELLITE}</SATELLITE>
<output_directory>${ENV_OUTPUT_DIRECTORY}</output_directory>
<NUM_COL_SEG>${-sel ENV_NUM_COL_SEG, '1'}</NUM_COL_SEG>
<NUM_ROW_SEG>${-sel ENV_NUM_ROW_SEG, '1'}</NUM_ROW_SEG>
```
If we wanted to test this configuration for multiple satellites and segmentations, we might write the following environment JSON:
```
{
  "ENV_OUTPUT_DIRECTORY": "./Output/",
  "ENV_SATELLITE": ["NPP", "NOAA20"],
  "ENV_NUM_COL_SEG": [null, "1", "5", "10"],
  "ENV_NUM_ROW_SEG": [null, "3", "4"]
}
```
We have three environment variables that vary, and the product of the number of possible values for each (including null) is 2*4*3 = 24. Thus, the Framework will be run 24 times. When ENV_NUM_COL_SEG and/or ENV_NUM_ROW_SEG is null, the default value of "1" in the configuration XML is used. Because the configuration tester has no knowledge of the contents of a configuration XML file, naively listing ENV_NUM_COL_SEG="1" alongside null duplicates that default and causes 24/4 = 6 unnecessary runs.
In tabular format, the exhaustive list of configuration instances run is as follows, where columns are XML elements. The elements `<project>` and `<output_directory>` have the values "JRR" and "./Output/" in every instance.
SATELLITE    NUM_COL_SEG    NUM_ROW_SEG
      NPP       1 (null)       1 (null)
      NPP       1 (null)              3
      NPP       1 (null)              4
      NPP              1       1 (null)
      NPP              1              3
      NPP              1              4
      NPP              5       1 (null)
      NPP              5              3
      NPP              5              4
      NPP             10       1 (null)
      NPP             10              3
      NPP             10              4
   NOAA20       1 (null)       1 (null)
   NOAA20       1 (null)              3
   NOAA20       1 (null)              4
   NOAA20              1       1 (null)
   NOAA20              1              3
   NOAA20              1              4
   NOAA20              5       1 (null)
   NOAA20              5              3
   NOAA20              5              4
   NOAA20             10       1 (null)
   NOAA20             10              3
   NOAA20             10              4


Mathematically (perhaps needlessly) we can think of the non-null environment variables as "dimensions" that span an environment "space". The lists of values for these dimensions define the "environment volume" that we will test. Each environment instance is then a discrete point inside the environment volume that is mapped (using the given configuration XML file) to a configuration instance in a configuration volume in a configuration space. Because of the complexity of our configuration technology, it is not guaranteed that this mapping is one-to-one (injective). In other words, two different environment instances may result in equivalent configurations. The size of an environment volume, i.e. the number of instances inside it, is the product of the sizes of the sets of values for the environment variables.
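
The volume size and the possibility of equivalent configurations can be checked numerically for the example above. In the sketch below, `to_configuration` is a hypothetical, hand-written stand-in for the example XML's substitutions (including the default of "1"); the real mapping is performed by the Framework and is not known to the configuration tester.

```python
import itertools
import math

# The example environment space, with scalars written as one-element lists
# and JSON null written as None.
env_space = {
    "ENV_OUTPUT_DIRECTORY": ["./Output/"],
    "ENV_SATELLITE": ["NPP", "NOAA20"],
    "ENV_NUM_COL_SEG": [None, "1", "5", "10"],
    "ENV_NUM_ROW_SEG": [None, "3", "4"],
}

# Size of the environment volume: product of the value-set sizes.
volume = math.prod(len(values) for values in env_space.values())


def to_configuration(env):
    """Hypothetical model of the example XML: unset variables default to "1"."""
    return (
        env["ENV_SATELLITE"],
        env["ENV_NUM_COL_SEG"] or "1",
        env["ENV_NUM_ROW_SEG"] or "1",
    )


names = list(env_space)
# A set keeps only the distinct configurations reached by the mapping.
configurations = {
    to_configuration(dict(zip(names, combo)))
    for combo in itertools.product(*env_space.values())
}

print(volume, len(configurations))  # prints: 24 18
```

Under this model the 24 environment instances reach only 18 distinct configurations, which accounts for the 6 unnecessary runs noted in the example.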


__Future Improvements__
1. It would be very easy to make the script accept multiple JSON files and iterate over the environment instances generated from each.
2. If this is done, it would be extremely prudent to also modify the script to accept an optional "common environment" JSON, the contents of which are to be added to every generated environment instance. It is likely worthwhile to do this even if the previous suggestion is not implemented.
3. It would be very easy to make the script optionally use the parent shell's environment (i.e. yours) as the common environment (to be modified by given common environment JSON files). This should not be the default and should certainly not be used in automated runs of the script! Instead, be complete in preparing your JSON files or create a common environment JSON file.
4. For now, we run only the Framework preprocessor. This is because running the segment processing from scratch (i.e. with INPUT_LIST empty) can take hours in some cases. We have seen in the example that the environment volume grows very quickly, so testing the entire segment processing becomes untenable. Still, this is a useful approach for testing configuration and preprocessing. It would be very useful to include a validation step along with the preprocessing, and this seems very doable as the Framework validation matures.
5. Add an option for diagnostic mode.
6. Accept the path to the executable as an argument.
7. Save the output and error logs for each job.
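
Items 2 and 3 could be combined along the following lines. This is only a sketch of one possible design: the function `build_environment`, its parameters, and the layering order (parent shell, then common environment, then instance) are all assumptions, as is the choice that a null value in any layer unsets the variable.

```python
import os


def build_environment(instance, common=None, inherit_parent=False):
    """Layer an environment instance over an optional common environment.

    instance:       dict for one generated environment instance
    common:         optional dict applied to every instance (item 2)
    inherit_parent: start from the parent shell's environment (item 3);
                    off by default, as recommended for automated runs
    """
    env = dict(os.environ) if inherit_parent else {}
    for layer in (common or {}, instance):
        for name, value in layer.items():
            if value is None:
                # Assumed semantics: null unsets the variable.
                env.pop(name, None)
            else:
                env[name] = value
    return env


if __name__ == "__main__":
    common = {"ENV_OUTPUT_DIRECTORY": "./Output/", "ENV_NUM_ROW_SEG": "3"}
    instance = {"ENV_SATELLITE": "NPP", "ENV_NUM_ROW_SEG": None}
    print(build_environment(instance, common=common))
```

With this ordering, values in the instance take precedence over the common environment, which in turn takes precedence over anything inherited from the parent shell.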