Evaluation & Plotting
Key Technologies
The framework is built on top of pandas, seaborn and dask. Dask is used to build task dependency graphs with dask.delayed and to parallelise the execution of those tasks in a scalable way, either locally on a desktop/laptop or remotely on a (SLURM) cluster using Dask.distributed and Dask-Jobqueue.
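For readers new to Dask, the following is a minimal, self-contained sketch of the dask.delayed idea; the extract and transform functions and the data in them are invented stand-ins, not components of this framework.

import dask
import pandas as pd

@dask.delayed
def extract(run_id):
    # stand-in for reading one result database into a DataFrame
    return pd.DataFrame({"run": [run_id] * 3, "value": [1.0, 2.0, 3.0]})

@dask.delayed
def transform(df):
    return df[df["value"] > 1.0]

# Nothing has executed yet; the calls above only build the dependency graph.
graph = [transform(extract(run_id)) for run_id in range(2)]

# compute() runs the graph, locally by default, or on a cluster when a
# dask.distributed / Dask-Jobqueue scheduler has been set up.
results = dask.compute(*graph)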
There are a few key concepts necessary for using the framework effectively:
The main data structure of the framework is the pandas.DataFrame, which is similar to a spreadsheet or a table in a SQL database. For anything interesting you will need some knowledge of how to index, partition and apply a function to a DataFrame. The pandas documentation offers a short introduction to its data structures, and reading the documentation for pandas.DataFrame.groupby is advisable.
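As a refresher, here is a minimal pandas sketch of indexing, partitioning (grouping) and applying a function; the column names are made up and have no relation to the framework's data.

import pandas as pd

df = pd.DataFrame({
    "module": ["a", "a", "b", "b"],
    "simtime": [0.1, 0.2, 0.1, 0.3],
    "value": [1.0, 2.0, 3.0, 4.0],
})

# indexing: select rows with a boolean condition
late = df[df["simtime"] > 0.1]

# partitioning and aggregation: group by a column, compute a per-group mean
mean_per_module = df.groupby("module")["value"].mean()

# applying an arbitrary function to each group
spread = df.groupby("module")["value"].apply(lambda s: s.max() - s.min())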
Getting Insight
Evaluation and plotting are done by running run_recipe.py with the name of a ‘recipe’ YAML file, which contains the steps for processing the results of a batch of simulation runs. For a minimal introduction to YAML and some specifics of PyYAML, see the PyYAML documentation.
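For orientation, the sketch below shows how PyYAML turns a recipe's nested key-value structure into plain Python dictionaries; the handling of the ‘!’-tagged task classes is described further below.

import yaml

text = """
evaluation:
  extractors:
    dataset0:
      parameter0: "value"
"""
recipe = yaml.safe_load(text)
# recipe == {'evaluation': {'extractors': {'dataset0': {'parameter0': 'value'}}}}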
Recipes
The recipe describes the individual tasks as a list of key-value pairs; internally, the data and operations are assembled (by dask) into a dependency/task graph.
A recipe can contain two phases: evaluation and plot. Each phase is optional.
The evaluation phase itself consists of three sub-phases:
- extractors: for extracting the desired data from the input databases
- transforms: for processing the extracted data in some way
- exporter: for saving the extracted and possibly processed data

The plot phase consists of three sub-phases:
- reader: for loading the data exported from the evaluation phase
- transforms: for processing the loaded data in some way
- tasks: for actually plotting the loaded and possibly processed data
The sub-phases are evaluated in this order. Each sub-phase consists of a list of tasks to execute and each task usually has a dependency on a task in the previous sub-phase. If a task is not depended upon by another task, it will not be part of the dependency/task graph and will thus not be executed.
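The reachability rule can be seen directly in dask; the sketch below (illustrative only, not framework code) shows that a delayed task nothing depends on is never executed.

import dask

@dask.delayed
def used():
    print("executed")
    return 1

@dask.delayed
def unused():
    print("never printed")
    return 2

a = used()
b = unused()      # no downstream task depends on this node
dask.compute(a)   # only 'used' is part of the computed graph and runs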
Each task either creates or modifies a named ‘dataset’, a list of pandas.DataFrames. Usually an extractor creates one pandas.DataFrame per input file and stores the resulting list under the user-defined dataset_name in an internal dictionary. All transforms are then executed over each DataFrame in that list separately, and the result is written to disk, either as separate files or concatenated into a single DataFrame first.
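The following hand-rolled sketch illustrates that dataset model; it is not the framework's actual internals, and the data and column names are invented.

import pandas as pd

datasets = {}

# extractor: one DataFrame per input, stored as a list under the dataset_name
datasets["dataset0"] = [
    pd.DataFrame({"run": [i] * 3, "value": [0.5, 1.5, 2.5]}) for i in range(2)
]

# transforms: applied to every DataFrame in the list separately
datasets["transformed_dataset0"] = [
    df[df["value"] > 1.0] for df in datasets["dataset0"]
]

# exporter: write each DataFrame separately, or concatenate into a single one
combined = pd.concat(datasets["transformed_dataset0"], ignore_index=True)
combined.to_csv("exported.csv", index=False)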
The basic structure of a recipe is thus, for the evaluation phase:
evaluation:
  extractors:
    dataset0: !extractor_class
      parameter0: "value"
  transforms:
    transformed_dataset0: !transform_class
      dataset_name: "dataset0"
      parameter0: "value"
  exporter:
    name0: !exporter_class
      dataset_name: "transformed_dataset0"
      parameter0: "value"
For the plotting phase:
plot:
  reader:
    dataset0: !reader_class
      input_files:
        - "/path/regular/expression0"
        - "/path/regular/expression1"
  transforms:
    transformed_dataset0: !transform_class
      dataset_name: "dataset0"
      output_dataset_name: "dataset0"
      parameter0: "value"
  tasks:
    plot_task0: !plotting_class
      dataset_name: "transformed_dataset0"
For simplicity only one task is listed for each sub-phase, but an arbitrary number of tasks is possible.
For ease of use, the following omissions are possible:
- the transforms sub-phase is optional, and chaining transforms is possible
- the exporter and reader sub-phases are optional; the plot phase can just use the datasets extracted in the evaluation phase
The first line of the definition of a task has the format ‘<task_name>: !<task_classname>’ and defines the name and the type of the task. The type is just the class name of the desired operation (or more precisely, the YAML tag assigned to the class, but the two are literally the same). The remaining key-value pairs are the parameters of the constructor for that class.
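The sketch below shows one way such a tag-to-constructor mapping can be realised with PyYAML; ExampleTransform is an invented class used purely for illustration, not one of the framework's components.

import yaml

class ExampleTransform:
    def __init__(self, dataset_name, output_dataset_name, parameter0):
        self.dataset_name = dataset_name
        self.output_dataset_name = output_dataset_name
        self.parameter0 = parameter0

def construct_example_transform(loader, node):
    # the key-value pairs under the tag become the constructor's keyword arguments
    return ExampleTransform(**loader.construct_mapping(node))

yaml.add_constructor("!ExampleTransform", construct_example_transform,
                     Loader=yaml.SafeLoader)

recipe = yaml.safe_load("""
transforms:
  transformed_dataset0: !ExampleTransform
    dataset_name: "dataset0"
    output_dataset_name: "transformed_dataset0"
    parameter0: "value"
""")
task = recipe["transforms"]["transformed_dataset0"]  # an ExampleTransform instance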
In the extractors and reader sub-phases, the collection of data that is to be extracted, processed and plotted is given a name so that it can be referenced by tasks in other sub-phases. In the examples above, the name given is dataset0.
Each task has at least the parameter dataset_name, which references the input dataset the task operates upon. A transform will always also have the parameter output_dataset_name, which assigns a name to the result of the operation. This assignment can overwrite previously defined names, and thus also free the associated data if no other task depends on them.
Most of the documentation for the parameters of the actual components is in the API documentation; e.g. the documentation needed for just plotting is in plots.PlottingTask, specifically in the documentation of the parameters of that class's constructor.
Examples
Example recipes can be found in the examples directory in the root of this repository:
- lineplot.yaml: a basic recipe for producing a CBR-over-MPR lineplot. One should probably start with this as a template.
- CUI.yaml: this calculates, for every vehicle, the mean of the differences between consecutive receptions of a CAM and plots them as a lineplot. This is probably the second template to look at, as it uses GroupedFunctionTransform to partition the input data by MPR and by the name of the module emitting the signal used as a marker for CAM emission.
- recipe.yaml: a more elaborate recipe showcasing all possible options
- statistic.yaml: this extracts the results for the statName LemObjectUpdateInterval:stats from the statistic table and saves them, then plots the mean of the values (from the statMean column of the table) over the market rate.
- boxstats.yaml: showcases using a custom function to calculate the values needed for a boxplot, forward them as a Python list to the exporter, and save them as a JSON file
- sqlextractor.yaml: a showcase for the generic SQLite extractor