The introductory computer exercise using Python and the SUND toolbox¶
This version of the exercise is implemented in Python using the SUND toolbox. This lab will, on a conceptual level, guide you through all the main sub-tasks in a systems biology project: model formulation, parameter estimation with respect to experimental data, model evaluation, statistical testing, (uniqueness) analysis, identification of interesting predictions, and testing of those predictions.
You can download a prepared template python file, which includes all headers throughout the lab.
Introduction (Read me)
Introduction¶
Preparations¶
Get familiar with python
Get familiar with Python if you aren't already. We have collected some resources on how to get started.
Get Python running
To do the lab, you need access to Python version >=3.10. Depending on whether you are using a private computer or one in the computer halls, you need to follow one of the alternatives below. If you are not able to run Python on your computer, you can also run the exercise in a notebook online in Google Colab.
At your own computer
To do this exercise, you need to have both Python and a C compiler installed.
We have collected instructions on how to install Python and set up a compiler here. The instructions vary a bit between operating systems. A condensed version, with our preferred approach, is given below.
Recommended installation
Windows
Install uv (which handles Python and packages):
winget install --id=astral-sh.uv -e
Install C-compiler (needed for SUND toolbox):
winget install Microsoft.VisualStudio.2022.Community --override "--wait --quiet --add ProductLang En-us --add Microsoft.VisualStudio.Workload.NativeDesktop --includeRecommended"
macOS
Install uv (which handles Python and packages):
curl -LsSf https://astral.sh/uv/install.sh | sh
Install C-compiler (needed for SUND toolbox):
xcode-select --install
Linux
Install uv (which handles Python and packages):
curl -LsSf https://astral.sh/uv/install.sh | sh
C-compiler: usually already installed (gcc). If not, install it with the package manager for your distro.
Setup project:
- Create project folder: mkdir ~/projects/sund_advanced_lab
- Navigate: cd ~/projects/sund_advanced_lab
- Initialize: uv init
- Test installation: uv run --with sund python -c "import sund; sund.test_distribution()"
If test_distribution() reports that everything is working as intended, everything is installed correctly.
On Google colab
On Google Colab, Python is already available. However, packages need to be installed in a slightly different way: instead of installing them via pip from the terminal, you use a code cell with the following code:
# Installing packages. Note: this code cell will install packages into the environment the notebook is running in.
%pip install jupyter matplotlib numpy scipy sund
In the computer halls
If you are a LiU student, you can also use the resources in the computer halls/thinlinc. However, we do recommend that you get this working on your own computer if you will be doing a systems biology project.
Install packages
Remember that you need to have a compiler installed before trying to install the SUND toolbox.
After that, you also need to install numpy, matplotlib, scipy and sund:
# Using uv
uv init # Skip if the uv project files are already set up
uv add numpy matplotlib scipy sund
uv add jupyter # Optional, only needed if you want to run interactive sessions in vs code
# Using pip (alternative, if you are not using uv)
pip install numpy matplotlib scipy sund
pip install jupyter # Optional, only needed if you want to run interactive sessions in vs code
Read ahead on the background (optional)
Most of the tasks below start with an About ... section. If you want to prepare, these sections can be read in advance of the lab. They are written as background on the different topics, and can be read without solving the previous tasks.
About the SUND package documentation¶
This exercise should contain all the information needed to complete it without going to the documentation. You can find the documentation of the SUND toolbox here if you are interested.
Commenting your code
As you add more and more code to your .py script, it will take longer and longer to run. Remember that you can temporarily disable sections by commenting out lines (using "#") that you do not need to rerun, typically the optimization calls and the plot calls. Note that some sections and function definitions need to be rerun every time.
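For example, a minimal sketch of what this can look like (the commented-out names are placeholders for the slow calls you will write later in the lab):

print("fast setup code still runs")  # code you still want to execute every run
# res = run_slow_optimization()      # placeholder for a slow optimization call, temporarily disabled
# plt.show()                         # the blocking plot call, temporarily disabled while testing other parts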
Have you read the introduction above?
Getting to know the experimental data¶
In this task, you will start to get familiar with the problem at hand, and will create functions that can plot the experimental data.
About the background on the problem at hand
The biological system and experimental data¶
During the following exercises you will take on the role of a systems biologist working in collaboration with an experimental group.
The experimental group is focusing on the cellular response to growth factors, like the epidermal growth factor (EGF), and their role in cancer. The group has a particular interest in the phosphorylation of the receptor tyrosine kinase EGFR in response to EGF, and therefore performs an experiment to elucidate the role of EGFR. In the experiment, they stimulate a cancer cell line with EGF and use an antibody to measure the tyrosine phosphorylation at site 992, which is known to be involved in downstream growth signalling. The data from these experiments show an interesting dynamic behaviour: a fast increase in phosphorylation followed by a decrease down to a steady-state level which is higher than before stimulation. This dataset contains two stimulations, one with a low dose (Figure 1) and one with a high dose (Figure 2).
To generalize this problem, we refer to EGF as the Activator (A), EGFR as Receptor (R), and the downstream phosphorylation target as Substrate (S). We also use the suffix 'p' to denote the phosphorylated/active form of the molecule, so Rp is the phosphorylated form of the receptor and Sp is the phosphorylated form of the substrate.

Figure 1: The experimental data. A cancer cell line is stimulated with a low dose of EGF at time point 5, and the responses of Rp and R are measured.

Figure 2: The experimental data. A cancer cell line is stimulated with a high dose of EGF at time point 5, and the responses of Rp and R are measured.
The experimental group does not know the underlying mechanisms that give rise to this dynamic behaviour. However, they have two different ideas of possible mechanisms, which we will formulate as hypotheses below. The first idea is that there is a negative feedback transmitted from EGFR to a downstream signalling molecule that is involved in dephosphorylation of EGFR. The second idea is that a downstream signalling molecule is secreted to the surrounding medium and there catalyses breakdown of EGF, which also would reduce the phosphorylation of EGFR. The experimental group would like to know if any of these ideas agree with their data, and if so, how to design experiments to find out which is the more likely idea.
Create a Python script and import the necessary packages
Start by creating a file that ends in .py, for example main.py (or main.ipynb if you prefer to work in notebooks), and open it in your favourite text editor (if you do not have a favourite editor yet, we recommend VS Code or VSCodium).
We need to import some packages to get access to functions not available in the standard version of Python. You will need numpy, matplotlib and json to start with.
To avoid long names you can also use the as statement to give the imported packages a new name: import matplotlib.pyplot as plt
Typically, the imports are done at the top of the script, grouped based on whether they are built-in packages (e.g. json) or packages that need to be installed (e.g. sund), and sorted alphabetically within these groups.
# Example, you do not need to copy this.
import json # Direct import of a built-in package
from scipy.stats import chi2 # Importing a specific function from a package
import matplotlib.pyplot as plt # Importing a package with a short name (alias) for easier use
Now, import the packages that you will need. We will also setup an encoder to be able to save numpy arrays later.
# Python built-in modules import
import sys
import json
import csv
import random
# Python package imports (these need to be installed)
import numpy as np
import matplotlib.pyplot as plt
import sund
from scipy.stats import chi2
from scipy.optimize import Bounds
from scipy.optimize import differential_evolution
# Setting up a custom encoder to be able to save numpy arrays later
from json import JSONEncoder
class NumpyArrayEncoder(JSONEncoder):
    def default(self, o):
        if isinstance(o, np.ndarray):
            return o.tolist()
        return JSONEncoder.default(self, o)
sund.test_distribution() # This is just to test that everything (most likely) got installed correctly. You can remove this later
If you haven't already, create a Python file (we will assume it is named main.py) and add the imports listed above to the script.
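If you are curious about what the custom encoder does, here is a small optional illustration (assuming the imports and the NumpyArrayEncoder class above); you will use it for real later, when saving results with json.dump:

# Optional illustration: the encoder converts numpy arrays to lists so they can be saved as JSON
example = {"x": np.array([1.0, 2.0, 3.0])}
print(json.dumps(example, cls=NumpyArrayEncoder))  # prints {"x": [1.0, 2.0, 3.0]}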
Download and load the experimental data
Start by setting up the data in either of the following ways:
Option 1: Download and then load the json file (recommended)
Right-click on this link, and select "save link as". Put the datafile in your working directory (i.e. where you have the files). If you want, you can open the file and inspect the contents.
Next, you will need to load the experimental data into Python. You will need to open the file with the open function and then use the json.load function to load the data. To avoid having to manually close the file and handle errors, the best practice is to use the with statement.
with open("data_sund_2026-03-18.json", "r") as f:
data = json.load(f)
Now, you have a dictionary called data in Python which contains the experimental data.
Option 2: Open the data file and directly define the data structure in Python
First, open the data file by going to this linked file, copy its contents, and paste it into your script. You should assign the content to a variable (we assume it is called data).
Note that you will have to manually replace Infinity with float('inf') in the data.
Your code will look something like this:
data = {
"dose1": {
...
},
"dose2": {
...
},
"step_dose": {
...
}
}
Now, observe the data using print().
print(data)
The data structure is a dictionary, which consists of keys and linked values. Simplified, the data is structured in three levels: the top level contains the different datasets (or experiments), the second level the different observables (or data series), and the third level the actual measurements (time, mean, sem):
data
├── dataset (experiment)
│ ├── data series (observable)
│ │ ├── time
│ │ ├── mean
│ │ └── sem
Note that in this data structure, the experimental measurements (time, mean, sem) are nested under an observables key, with an entry for each observable ('Rp' and 'R'). So to access the time points for Rp in dose1, you would use
data["dose1"]['observables']['Rp']['time']
We also have the stimulation details listed for each dataset under the input key, which defines the characteristics of the input. In this example, A is added at time point 5, and we therefore set A=1 at time point 5.
{
"dose1": {
"input": { "A": { "t": [5], "f": [1], "unit": "nM" } },
...
},
...
}
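For example, the stimulation time points and amounts for the first dose can be accessed like this:

# Accessing the input definition for dose1
print(data["dose1"]["input"]["A"]["t"])  # [5]
print(data["dose1"]["input"]["A"]["f"])  # [1]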
Also note that the observables have different time vectors. This will be problematic for us later, so we need to collect all needed time points for each experiment.
Collect the full list of time points
We will create a new entry with the combined time points at the top level of each experiment. To do this, we iterate over the experiments in data, find the time points for each observable and put those into a list. Finally, we find the unique time points (to avoid duplicates), sort the list of time points and add that to the data dictionary.
for experiment in data:
    all_time_points = []
    for obs in data[experiment]['observables']:
        all_time_points += data[experiment]['observables'][obs]['time']
    data[experiment]['all_time_points'] = sorted(list(set(all_time_points))) # Create a sorted list of all unique time points
The data downloaded consists of three datasets: dose1, dose2, and step_dose. We will use dose1 and dose2 for parameter estimation, and step_dose for model validation. We need to split step_dose into a separate dictionary before proceeding.
Split the data into estimation and validation data
To move the step_dose data to its own dictionary, we can simply use the dictionary method pop(), which removes the field from data and stores it in our new dictionary data_validation.
data_validation = {}
data_validation['step_dose'] = data.pop('step_dose')
You can verify that the data is split correctly by printing the dataset keys. Now, step_dose should only be in the validation data:
print(f"Estimation data experiments: {data.keys()}")
print(f"Validation data experiments: {data_validation.keys()}")
Notice that we used f-strings (f"") to include variables directly inside strings. Anything wrapped in {} is evaluated, in this case {data.keys()}.
Visualize the data
A good start to understanding the data is to visualize it. Now, we will implement and then use a function to plot the data. But first, since we have multiple datasets in the data, and multiple data series in each dataset (one for each observable), we will start by implementing a function to plot a single data series (plot_data_series()).
The function can be placed either at the top of the script (after the imports) or right before it is used; the exact location does not matter in Python as long as the function is defined before it is called, so you can pick whichever you prefer.
The function should take obs_data as input and plot all the points, with error bars. Since we are plotting into a single subplot, it also needs an axis object (ax) as input to plot the series on. We will provide this later. The final figure should have nice labels and a title.
Useful Python/matplotlib commands (remember that we renamed matplotlib.pyplot as plt) include: ax.errorbar, ax.set_xlabel, ax.set_ylabel, ax.set_title, ax.plot.
# Define a function to plot one data series (one observable in one experiment)
def plot_data_series(ax, obs_data, face_color='k', label='Data'):
    ax.errorbar(
        obs_data['time'],
        obs_data['mean'],
        obs_data['sem'],
        label=label,
        linestyle='None',
        marker='o',
        markerfacecolor=face_color,
        color='k',
        capsize=5,
    )
    # TODO: add nice x-labels and y-labels to the ax object
Before we can create the axis object we need for the plot, we need to know how many subplots we need and in what configuration. Typically, we strive towards having a square-like grid (i.e. 2x2, 3x3 etc). We can create a function that finds the smallest square grid for us:
def find_subplot_layout(data):
    num_observables = len(data['observables'])
    num_rows = int(np.round(np.sqrt(num_observables)))
    num_cols = int(np.ceil(num_observables / num_rows))
    return num_rows, num_cols
Now, we can proceed to create a function which can plot our data with multiple datasets (plot_data()). We will leverage our plot_data_series() function and call it from the new plot_data() function. We will iterate over the datasets and the observable series in the data, and make sure that the correct ax and observable data are passed to plot_data_series(). In this way, we can reuse plot_data_series() to plot any data:
def plot_data(data, face_color='k'):
    for experiment in data:
        fig = plt.figure() # create a new figure
        n_rows, n_cols = find_subplot_layout(data[experiment])
        # iterate over all observables
        plt_idx = 0 # Instantiate a counter to keep track of the subplot index
        for obs in data[experiment]['observables']:
            plt_idx = plt_idx + 1 # Update the subplot index
            ax = fig.add_subplot(n_rows, n_cols, plt_idx) # create a new subplot in grid n_rows x n_cols at position plt_idx
            observable_data = data[experiment]['observables'][obs]
            plot_data_series(ax, observable_data, face_color=face_color)
            ax.set_title(f'{obs}')
            ax.legend() # Show the legend for the subplot
        fig.suptitle(f'{experiment}') # add a shared "Super"-title for the full figure
        fig.tight_layout() # autoformat the grid layout to make it look nice
Now that we have all the functions we need to plot the data, we will use the plot_data() function to plot the data. To make the plot visible, we typically have to run plt.show(). This will display all plots generated, and pause the python script until the plots are closed. Therefore, it might be a good idea to keep the plt.show() at the end of your script.
Plot the data and make sure your plots look nice.
plot_data(data)
plt.show()
Saving figures
If you want to, you can also save figures as a .png with resolution 300 dpi using:
plt.savefig("figure_name.png", dpi=300)
Or as a .svg by changing the file extension
plt.savefig("figure_name.svg")
You should now have a plot of the data. You can self-check if you have implemented things using the box below.
(spoilers) Self-check: Have you plotted the data correctly?
If you have implemented everything correctly, your plots should look something like the figures below.


Furthermore, your code should look something like this.
Code sync
# Python built-in modules import
import sys
import json
import csv
import random
# Python package imports (these need to be installed)
import numpy as np
import matplotlib.pyplot as plt
import sund
from scipy.stats import chi2
from scipy.optimize import Bounds
from scipy.optimize import differential_evolution
# Setting up a custom encoder to be able to save numpy arrays later
from json import JSONEncoder
class NumpyArrayEncoder(JSONEncoder):
    def default(self, o):
        if isinstance(o, np.ndarray):
            return o.tolist()
        return JSONEncoder.default(self, o)
# Load and print the data
with open("data_sund_2026-03-18.json", "r") as f:
    data = json.load(f)
print(data)
# Add all unique time points per experiment (sorted)
for experiment in data:
    all_time_points = []
    for obs in data[experiment]['observables']:
        all_time_points += data[experiment]['observables'][obs]['time']
    data[experiment]['all_time_points'] = sorted(list(set(all_time_points)))
# split the data into training and validation data
data_validation = {}
data_validation['step_dose'] = data.pop('step_dose')
print(f"Estimation data experiments: {data.keys()}")
print(f"Validation data experiments: {data_validation.keys()}")
# Define a function to plot one data series (one observable in one experiment)
def plot_data_series(ax, obs_data, face_color='k', label='Data'):
    ax.errorbar(
        obs_data['time'],
        obs_data['mean'],
        obs_data['sem'],
        label=label,
        linestyle='None',
        marker='o',
        markerfacecolor=face_color,
        color='k',
        capsize=5,
    )
    ax.set_xlabel('Time (minutes)')
    ax.set_ylabel('Phosphorylation (a.u.)')
# define a function to find the subplot layout based on the number of observables
def find_subplot_layout(data):
    num_observables = len(data['observables'])
    num_rows = int(np.round(np.sqrt(num_observables)))
    num_cols = int(np.ceil(num_observables / num_rows))
    return num_rows, num_cols
# Define a function to plot all datasets
def plot_data(data, face_color='k'):
    for experiment in data:
        fig = plt.figure() # create the figure
        n_rows, n_cols = find_subplot_layout(data[experiment])
        # iterate over all observables
        plt_idx = 0 # Instantiate a counter to keep track of the subplot index
        for obs in data[experiment]['observables']:
            plt_idx = plt_idx + 1 # Update the subplot index
            ax = fig.add_subplot(n_rows, n_cols, plt_idx) # create a new subplot in grid n_rows x n_cols at position plt_idx
            observable_data = data[experiment]['observables'][obs]
            plot_data_series(ax, observable_data, face_color=face_color)
            ax.set_title(f'{obs}')
            ax.legend() # Show the legend for the subplot
        fig.suptitle(f'{experiment}') # add a shared "Super"-title for the full figure
        fig.tight_layout() # autoformat the grid layout to make it look nice
plot_data(data)
plt.show()
Modelling using the SUND toolbox¶
Define, simulate and plot a first model¶
The experimentalists have an idea of how they think the biological system they have studied works. We will now create a mathematical representation of this hypothesis.
About the hypothesis
To go from the first idea to formulate a hypothesis, we include what we know from the biological system:
- A receptor (R) is phosphorylated (Rp) in response to a stimulation (A)
- A downstream signalling molecule (S) is activated as an effect of the phosphorylation of the receptor
- The active downstream signalling molecule (Sp) is involved in the dephosphorylation of the receptor
- Phosphorylated/active molecules can return to inactive states without stimuli
Based on the behaviour in the data, and the prior knowledge of the system, the experimentalists have constructed the hypothesis depicted in the figure below.

From this hypothesis, we can draw an interaction graph, along with a description of the reactions.

The reactions are as follows:
- Reaction 1 is the phosphorylation of the receptor (R). It depends on the state R, with the input A and the parameter k1.
- Reaction 2 is the basal dephosphorylation of Rp to R. It depends on the state Rp with the parameter k2.
- Reaction 3 is the feedback-induced dephosphorylation of Rp to R exerted by Sp. This reaction depends on the states Rp and Sp with the parameter kfeed.
- Reaction 4 is the basal dephosphorylation of Sp to S. It depends on the state Sp with the parameter k4.
- Reaction 5 is the phosphorylation of S to Sp. This phosphorylation is regulated by Rp. The reaction depends on the states S and Rp, with the parameter k5.
Rp and R are what have been measured experimentally.
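To make the mapping from the reaction descriptions to equations concrete: reaction 1 depends on the state R and the input A with the parameter k1, and reaction 2 depends on the state Rp with the parameter k2, so their rates can be written as

\[ r_1 = k_1 \cdot A \cdot R, \qquad r_2 = k_2 \cdot Rp \]

Each rate then enters the derivative of every state it affects: with a minus sign for the state that is consumed and a plus sign for the state that is produced (reaction 1, for example, consumes R and produces Rp). The remaining reactions follow the same pattern.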
The experimentalists have guessed values for both the initial conditions and the parameters:
| Initial values | Parameter values |
|---|---|
| R(0) = 1 | k1 = 1 |
| Rp(0) = 0 | k2 = 0.0001 |
| S(0) = 1 | kfeed = 1000000 |
| Sp(0) = 0 | k4 = 1 |
|  | k5 = 0.001 |
The next step now is to implement the hypothesis as a mathematical model, and then simulate the model using the SUND package.
About the SUND package and object-oriented programming
SUND uses an object-oriented approach. The workflow is:
- Write the model equations in a text file
- Compile the file into a model class: sund.install_model(...)
- Create a model object from that class, holding equations and parameter values: sund.load_model(...)
- Create a simulation object that runs the model and stores the results: sund.Simulation(models=...)
You can find more details in the SUND documentation.
We will now go over how we go from the hypothesis to a model file, to a simulation object that reacts to an activity, and how to interact with the results from the simulation object.
Implement and explore the first hypothesis
This section will focus on details on how to simulate a model in practice. For clarity, we will create a model of the hypothesis listed above, and recreate the first experiment in the data set (dose1). Note that the actual testing of the model starts in the next section.
Defining the model equations
First, we must define the mathematical model in a format that the SUND toolbox understands. Start by creating a text file (the code in these instructions assumes the name M1.txt; if you name your file something else you need to account for this in your code), and use the following model formulation as a template. Update it to include all equations needed for the hypothesis. The VARIABLES heading contains the reaction rates.
We can get inputs into the model by defining them under the INPUTS heading, and we can use the OUTPUTS heading to define model outputs that can be shared with other models. In this example, we will use A as a model input since it is constant. We use @ 0 to mark that the default value of A is 0 if it is not given as an input.
To define the model output (measurement equation), we use the FEATURES heading.
Note that the text file must contain all headers below (beginning with ##########)
########## NAME
# Comments are made with '#' or '//'
M1
########## METADATA
time_unit = m
authors = "Your name" # optional
########## MACROS
########## STATES
ddt_R = r2-r1 # You can also write d/dt(R) = ... if you prefer
ddt_Rp = r1-r2
ddt_S = 0 # TODO: add the equations
ddt_Sp = 0 # TODO: add the equations
R(0) = 1
Rp(0) = 0
S(0) = 1
Sp(0) = 0
########## PARAMETERS
k1 = 1
k2 = 0.0001
# TODO: add the rest of the parameters
########## VARIABLES
r1 = R*A*k1
r2 = Rp*k2
# TODO: add the rest of the variables
########## FUNCTIONS
########## EVENTS
########## OUTPUTS
########## INPUTS
A = A @ 0 # The internal variable A is taken from an input named A, and has a default value of 0 if not provided
########## FEATURES
Rp = Rp
R = R
Install the model
To install your model so that it can be used by Python, you use the SUND package function .install_model("filename"). This will define a new model class which contains the mathematical model. This class can then be imported and used to create model objects. Now, install the model.
# Install the model by using the SUND function 'install_model'
# To verify that the model has been installed try running 'installed_models' afterwards
sund.install_model('M1.txt')
print(sund.installed_models())
Load the model object
Once a model has been installed it can be loaded into Python. To do this, use the SUND function load_model(model_name) (with the model name given in the model file). The function will return a model object which should be stored in a variable, say m1.
# Load the model by using the SUND function 'load_model'
# The function returns the model object which should be stored in a variable, say 'm1'
m1 = sund.load_model("M1")
Reinstalling a model that has already been loaded
If the model has already been loaded, Python must be restarted before (re)installing the model. This is mostly an issue if running the model from within a notebook. In that case, you must restart the kernel before reinstalling the model.
Create an activity object
In model M1, we declared that A should be an input. Now we need to define that input, which we do with an Activity object. Using the activity, we can provide an output which the model can use as input. In dose1 A is initially 0, changing to 1 at time point 5. We can define this with a piecewise_constant output.
# Create an activity object, which sets A to 1 at t=5.
activity = sund.Activity(time_unit='m')
activity.add_output(type="piecewise_constant", name="A", t = [5], f=[0, 1])
Note that an activity can have multiple outputs, which are added by using the .add_output() function multiple times. For example, if we wanted to add another input B which is set to 2 at time point 10, we would do it like this:
# This is an example, you do not need to add this to your script
activity = sund.Activity(time_unit='m')
activity.add_output(type="piecewise_constant", name="A", t = [5], f=[0, 1])
activity.add_output(type="piecewise_constant", name="B", t = [10], f=[0, 2])
You can find more information on the different output types and the activity object in the SUND documentation.
Create a simulation object
The simulation of models is carried out by a simulation object, which can include one or more model objects, and (optionally) one or more activity objects. If you want to use multiple models/activity objects, put them in a list (e.g. [m1a, m1b]).
To create a simulation object, call the class constructor Simulation from the SUND package. You will need to provide the model objects that should be included when you do the constructor call.
# Create a new Simulation object using the model and activity you have defined previously.
m1_simulation = sund.Simulation(
models = m1,
activities= activity,
time_unit = m1.time_unit, # or explicitly with time_unit = 'm'
# time_vector = [...] # optional, can also be set later
)
The simulation object has a function .simulate() which is used to perform a simulation. To do this, a vector of desired time points (time_vector) must be defined. You can define the time vector for the simulation in three different ways:
Option 1: Set the time vector when creating the simulation
sund.Simulation(
model= ...,
time_unit = ...,
time_vector=[...]
)
Option 2: Set the time_vector to an already created simulation object
sim = sund.Simulation(...)
sim.time_vector = [...]
Option 3: Use the keyword when running the simulation
sim.simulate(time_vector=[...])
Using the time_unit keyword in simulate() changes the simulation time unit
You could potentially change the time unit by passing the time_unit keyword to simulate(). But if you do, you will change the time unit of the simulation object, which means that further simulations will run in this time unit. This is often not what you want, and it is therefore not recommended. Instead, it is clearer to change the time unit of the simulation object manually before running the simulation:
sim = sund.Simulation(...)
sim.time_unit = 'h'
sim.simulate(time_vector=[...])
Simulate using the simulation object
Now it is time to run the simulation that you have created. Below, we pass the time vector directly to the simulate() function, but you can also set the time vector as an attribute of the simulation object before simulating if you want to.
# Simulate the m1_simulation object, for any of the experiments, remember to specify the time vector ([times to simulate]) and time_unit ('s' | 'm' | 'h' ).
time_points = np.linspace(0, 60, 1000) # 1000 time points between 0 and 60.
m1_simulation.simulate(time_vector=time_points)
The default simulation values are linked to the model objects
The default values for the simulation are linked to the model object that was used to create the simulation. This means that if you have changed the values of the model object, the default values in the simulation object will also change.
# Example, you do not have to add this to your script
m1_sims['dose1'].reset_states()
print(f"States before change: {m1_sims['dose1'].state_values}")
m1.state_values = [1, 2, 3, 4]
m1_sims['dose1'].reset_states()
print(f"States after change: {m1_sims['dose1'].state_values}")
> States before change: [1, 0, 1, 0]
> States after change: [1, 2, 3, 4]
This is important to keep in mind, since the default behaviour of the simulate() function is to reset the parameter and state values to the default values before simulating (via the default reset=True flag). This means that if you have updated the model object with new parameter values, these new values will be used as default values in the simulation object, and the simulate() function will reset to these new values before simulating. You can also choose to not reset the parameter and state values before simulating by setting the reset=False flag in the simulate() function.
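As a sketch of what this means in practice (assuming the reset flag behaves as described above), the following would first run an ordinary simulation and then continue from the state reached at the end of it:

# Example sketch, you do not need to add this to your script
m1_simulation.simulate(time_vector=np.linspace(0, 30, 500))                # resets to the default values first (reset=True is the default)
m1_simulation.simulate(time_vector=np.linspace(30, 60, 500), reset=False)  # continues from the state values reached above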
About using multiple models in the simulation
If you are using multiple model objects to create the simulation objects, these can be connected by using the ########## OUTPUTS and ########## INPUTS cells within the models. For example, the output of one model can be used as the input to another model.
You can find more information about how to connect models in the SUND documentation, see the Model modularity section for more details.
After you have simulated the m1_simulation object, the object will be updated. The state_values attribute now corresponds to the final point in the simulation, and the time_vector attribute contains the time points used in the simulation, but more importantly we now have the simulated values of the measurements defined as features.
Observe the simulation result
The simulation object will store the simulation result for every feature specified in the model(s) used to create the simulation, i.e. everything specified under the ########## FEATURES cell. The result is stored in an attribute called feature_values where each column represents one feature, with the corresponding names in feature_names. There is also a convenience function .features_as_dict() which translates the features to a dictionary.
Print and/or plot the simulation result of the feature Rp for the times simulated above as you did before. Does it look like you would expect?
Using feature_values and feature_names
# Print the result of the simulation above
idx_Rp = m1_simulation.feature_names.index('Rp')
print(f"Values after first simulation: {m1_simulation.feature_values[:,idx_Rp]}")
print(f"feature_names: {m1_simulation.feature_names}")
Using features_as_dict()
# Print the result of the simulation above using the convenience function .features_as_dict()
features_dict = m1_simulation.features_as_dict()
print(f"Values after first simulation: {features_dict['Rp']}")
Running the same simulation with different parameter values
What we most often want to do is change the values of the parameters, but keep the initial state values the same. We can use the simulation object to do this, by running two simulations with different values of the parameters. We do this by passing the parameter_values keyword.
Do two simulations with different parameter values. Print and/or plot the results. Does it behave as you expect?
# Run two simulations with different parameter values, but the same initial conditions.
m1_simulation.simulate(parameter_values=[1, 0.0001, 1000000, 1, 0.001])
print(f"Simulation state values with first parameter set: {m1_simulation.state_values}")
m1_simulation.simulate(parameter_values=[1, 1, 100, 1, 1])
print(f"Simulation state values with second parameter set: {m1_simulation.state_values}")
Now we have all the tools needed to proceed and simulate all the experiments and observables for the first hypothesis.
Simulating the experiments using the model for the first hypothesis¶
Simulate the first hypothesis using the provided parameter values, plot and compare to data
Now we can proceed to create the two simulation objects. A good approach is to store the simulations in a dictionary with the same name as in the data. Use the dose values from the data. Remember to set the time_unit of the simulation to minutes. You can either define the experiments manually, or more directly use the input definition in the data to automatically create the simulations.
Remember that we created a field in the data with all the time points for each experiment? We can use this field now to set the time vector for the simulations.
Manually creating the simulations
m1_sims = {} # Create an empty dictionary to store simulations in
# Setup the first experiment
activity_1 = sund.Activity(time_unit='m')
# TODO: Add outputs corresponding to the first experiment
m1_sims['dose1'] = sund.Simulation(
models = m1, # or what you named your model object to
activities = activity_1, # activity object for dose 1
time_unit = 'm',
time_vector = data['dose1']['all_time_points']
)
# Setup the second experiment
activity_2 = sund.Activity(time_unit='m')
# TODO: Add outputs corresponding to the second experiment
m1_sims['dose2'] = sund.Simulation(
models = m1, # or what you named your model object to
activities = activity_2, # activity object for dose 2
time_unit = 'm',
time_vector = data['dose2']['all_time_points']
)
Automatically creating the simulations from data
Since we structured our data in a clever way, it is possible to automatically construct the simulations to match the data, which is especially useful if you have many datasets. For more information, you can check the example in the SUND documentation.
Now, we can iterate over the data and automatically create both the activity and simulations.
m1_sims = {} # Create an empty dictionary
for experiment in data:
    activity = sund.Activity(time_unit="m")
    # Add outputs for each input in the data file
    for inp in data[experiment]["input"]:
        activity.add_output(
            type = "piecewise_constant",
            name=inp,
            t = data[experiment]["input"][inp]['t'],
            f = [0] + data[experiment]["input"][inp]['f'] # [0] is used to add a basal value to the output (not defined in this data file)
        )
    # Create the simulation for the experiment
    m1_sims[experiment] = sund.Simulation(
        models = m1, # or what you named your model object to
        activities = activity,
        time_unit = 'm',
        time_vector = data[experiment]['all_time_points']
    )
Next, we should simulate and plot the model for a given parameter set. We start by defining a function that takes the plot axis object, parameter values, simulation object, time_points, and the feature to plot as input, and plots the simulation. We also include a color argument (defaulting to 'b' for blue) so that we can easily change the color of the line.
# Define a function to plot the simulation
def plot_sim(ax, params, sim, time_points, feature_to_plot, color='b', label='Simulation'):
    # Setup, simulate, and plot the model
    sim.simulate(
        time_vector = time_points,
        parameter_values = params
    )
    # Extract simulation results as dict
    sim_results = sim.features_as_dict()
    ax.plot(sim.time_vector, sim_results[feature_to_plot], color, label=label)
Since we are simulating an experiment where we have data, we want to show them together. We can combine our plot_sim function that we just defined, with our plot_data_series function we created earlier to do this.
# Define a function to plot the simulations together with the data
def plot_sim_with_data(params, sims, data, color='b'):
    for experiment in data:
        fig = plt.figure() # create a figure for each experiment
        n_rows, n_cols = find_subplot_layout(data[experiment])
        time_high_resolution = np.linspace(
            data[experiment]["all_time_points"][0], # from
            data[experiment]["all_time_points"][-1], # to
            1000 # with n=1000 steps
        ) # create a high resolution time vector for the simulation
        # iterate over all observables
        plt_idx = 0 # Instantiate a counter to keep track of the subplot index
        for obs in data[experiment]['observables']:
            plt_idx = plt_idx + 1 # Update the subplot index
            ax = fig.add_subplot(n_rows, n_cols, plt_idx) # create a new subplot in grid n_rows x n_cols at position plt_idx
            plot_sim(ax, params, sims[experiment], time_high_resolution, obs, color) # plot the simulation for the observable
            plot_data_series(ax, data[experiment]['observables'][obs], face_color=color)
            ax.set_title(f'{obs}')
            ax.legend() # Show the legend for the subplot
        fig.suptitle(f'{experiment}') # add a shared "Super"-title for the full figure
        fig.tight_layout() # autoformat the grid layout to make it look nice
Since we have structured the simulations to mirror the datasets, we can iterate over the dataset experiments and extract the corresponding dataset and simulation. Since we have multiple observables, we also need to iterate over these. We create a high resolution time vector for the simulation to make it look nice. Lastly, we call the plot_sim() and plot_data_series() functions to plot the simulation and data together.
Finally, we use the newly defined function to plot both the model simulation and data.
param0_M1 = [1, 0.0001, 1000000, 1, 0.001] # = [k1, k2, kfeed, k4, k5]
plot_sim_with_data(param0_M1, m1_sims, data)
(spoilers) Self-check: Have I implemented the model/code correctly?
If you have done everything correctly, you should have two figures which look like this for the first hypothesis with the given values:


The code should now look something like what is in the box below:
Code sync
#(all code from last sync)
# Install the model
sund.install_model('M1.txt')
print(sund.installed_models())
# Load the model object
m1 = sund.load_model("M1")
# Create the two simulation objects with corresponding activities from the data file
m1_sims = {} # Create an empty dictionary
for experiment in data:
    activity = sund.Activity(time_unit="m")
    # Add outputs for each input in the data file
    for inp in data[experiment]["input"]:
        activity.add_output(
            type = "piecewise_constant",
            name=inp,
            t = data[experiment]["input"][inp]['t'],
            f = [0] + data[experiment]["input"][inp]['f'] # [0] is used to add a basal value to the output (not defined in this data file)
        )
    # Create the simulation for the experiment
    m1_sims[experiment] = sund.Simulation(
        models = m1, # or what you named your model object to
        activities = activity,
        time_unit = 'm',
        time_vector = data[experiment]['all_time_points']
    )
# Define a function to plot the simulation
def plot_sim(ax, params, sim, time_points, feature_to_plot, color='b', label='Simulation'):
    # Setup, simulate, and plot the model
    sim.simulate(
        time_vector = time_points,
        parameter_values = params,
    )
    # Extract simulation results as dict
    sim_results = sim.features_as_dict()
    ax.plot(sim.time_vector, sim_results[feature_to_plot], color, label=label)
# Define a function to plot the simulations together with the data
def plot_sim_with_data(params, sims, data, color='b'):
    for experiment in data:
        fig = plt.figure() # create the figure
        n_rows, n_cols = find_subplot_layout(data[experiment])
        time_high_resolution = np.linspace(
            data[experiment]["all_time_points"][0], # from
            data[experiment]["all_time_points"][-1], # to
            1000 # with n=1000 steps
        ) # create a high resolution time vector for the simulation
        # iterate over all observables
        plt_idx = 0 # Instantiate a counter to keep track of the subplot index
        for obs in data[experiment]['observables']:
            plt_idx = plt_idx + 1 # Update the subplot index
            ax = fig.add_subplot(n_rows, n_cols, plt_idx) # create a new subplot in grid n_rows x n_cols at position plt_idx
            plot_sim(ax, params, sims[experiment], time_high_resolution, obs, color) # plot the simulation for the observable
            plot_data_series(ax, data[experiment]['observables'][obs])
            ax.set_title(f'{obs}')
            ax.legend() # Show the legend for the subplot
        fig.suptitle(f'{experiment}') # add a shared "Super"-title for the full figure
        fig.tight_layout() # autoformat the grid layout to make it look nice
# Plot the model with the given parameter values:
param0_M1 = [1, 0.0001, 1000000, 1, 0.001] # = [k1, k2, kfeed, k4, k5]
plot_sim_with_data(param0_M1, m1_sims, data)
Improving the agreement manually (if there is time)¶
If you have extra time, try to find a combination of parameter values that improves the agreement to data by manually changing the parameter values.
Checkpoint and reflection¶
This marks the end of the first section of the exercise. Observe your figure which shows how well the model agrees with the data. Remember to save the figure.
Questions to reflect over
- Why do we simulate with a time-resolution higher than the resolution of the data points?
- When is the agreement good enough?
- What could be possible reasons why not all states of the model are measured experimentally?
- (optional) Was it easy finding a better agreement? What if it was a more complex model, or if there were more data series?
Evaluating and improving the agreement to data¶
Clearly, the agreement to data (for the given parameter set) is not perfect. A question now arises: how well does the model simulation agree with the data? We typically evaluate this using a cost function.
About the cost function
We typically evaluate the model agreement using a sum of squared residuals, weighted with the uncertainty of the data, often referred to as the cost of the model agreement:

\[ v\left(\theta\right) = \sum_{t}\left(\frac{y_t - \hat{y}_t\left(\theta\right)}{SEM_t}\right)^2 \]

where \(v(\theta)\) is the cost given a set of parameters \(\theta\), \(y_t\) and \(SEM_t\) are the experimental data (mean and SEM) at time \(t\), and \({\hat{y}}_t(\theta)\) is the model simulation at time \(t\). Here, \(y_t-{\hat{y}}_t\left(\theta\right)\) is commonly referred to as the residual, and represents the difference between the simulation and the data at each time point.
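As a small numerical illustration of the formula (the numbers here are made up, and not values from the dataset), the cost contribution of two data points could be computed like this:

import numpy as np

# Toy illustration of the cost formula; the numbers are made up and not from the dataset
y = np.array([0.5, 0.8])      # experimental means at two time points
y_sim = np.array([0.4, 1.0])  # simulated values at the same time points
sem = np.array([0.1, 0.2])    # experimental uncertainties (SEM)

cost = np.sum(np.square((y_sim - y) / sem))  # (-0.1/0.1)^2 + (0.2/0.2)^2 = 1 + 1
print(cost)  # 2.0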
Implement a cost function
Define a function fcost(params, sims, data) that simulates your model for a set of parameters and returns the cost. Call the simulation object's simulate function from inside fcost to get the simulated values.
Calculate the cost using the weighted sum of squares. We use the numpy package (which we imported as np) to sum and weight the residuals.
Since we have two datasets with two observables we want to evaluate the model against, we need to do two simulations and add the costs together.
First, look at an explicit version (for illustration only), then we go through the generalizable version you should use.
Explicit version
Remember that we have multiple datasets (dose1 and dose2) with multiple observables (Rp and R), with different time points for each observable? In the most explicit version, we need to do one simulation for each combination of dataset and observable:
# Example of an explicit version (you should not use this)
def fcost(params, sims, data):
    cost = 0
    sims["dose1"].simulate(
        parameter_values = params,
        time_vector = data["dose1"]['observables']['Rp']['time']
    )
    y_sim_1_Rp = sims["dose1"].features_as_dict()['Rp']
    y_1_Rp = data["dose1"]['observables']['Rp']['mean']
    sem_1_Rp = data["dose1"]['observables']['Rp']['sem']
    cost += np.sum(np.square(((y_sim_1_Rp - y_1_Rp) / sem_1_Rp)))
    sims["dose1"].simulate(
        parameter_values = params,
        time_vector = data["dose1"]['observables']['R']['time']
    )
    y_sim_1_R = sims["dose1"].features_as_dict()['R']
    y_1_R = data["dose1"]['observables']['R']['mean']
    sem_1_R = data["dose1"]['observables']['R']['sem']
    cost += np.sum(np.square(((y_sim_1_R - y_1_R) / sem_1_R)))
    sims["dose2"].simulate(
        parameter_values = params,
        time_vector = data["dose2"]['observables']['Rp']['time']
    )
    y_sim_2_Rp = sims["dose2"].features_as_dict()['Rp']
    y_2_Rp = data["dose2"]['observables']['Rp']['mean']
    sem_2_Rp = data["dose2"]['observables']['Rp']['sem']
    cost += np.sum(np.square(((y_sim_2_Rp - y_2_Rp) / sem_2_Rp)))
    sims["dose2"].simulate(
        parameter_values = params,
        time_vector = data["dose2"]['observables']['R']['time']
    )
    y_sim_2_R = sims["dose2"].features_as_dict()['R']
    y_2_R = data["dose2"]['observables']['R']['mean']
    sem_2_R = data["dose2"]['observables']['R']['sem']
    cost += np.sum(np.square(((y_sim_2_R - y_2_R) / sem_2_R)))
    return cost
As you can see, we essentially do the same simulation and calculations four times. As you can imagine, this scales badly if you have more datasets and observables. A better approach is to make the function generalizable.
Since you stored your simulations with the same keys as the datasets in the data, you can easily map the simulations to the datasets and iterate over them. We will also make use of the fact that we collected all unique time points for each dataset (in the all_time_points field) to reduce the number of simulations to one per dataset. Since the simulation now contains more time points than each observable, we need to find the indices that correspond to the actual time points in the data:
Generalized version
# Calculate the cost
def fcost(params, sims, data):
    cost = 0
    for dose in data.keys(): # iterate over all experiments
        sims[dose].simulate(
            time_vector = data[dose]['all_time_points'],
            parameter_values = params
        )
        sim_results = sims[dose].features_as_dict()
        for obs in data[dose]["observables"].keys(): # iterate over all observables
            # find the indexes of the feature values that match the data points
            t_idx = np.searchsorted(sims[dose].time_vector, data[dose]['observables'][obs]['time'])
            y_sim = np.array(sim_results[obs][t_idx])
            y = data[dose]['observables'][obs]['mean']
            sem = data[dose]['observables'][obs]['sem']
            cost += np.sum(np.square(((y_sim - y) / sem)))
    return cost
See? The generalizable version is much more compact than the explicit version, and it will work for any number of datasets and observables as long as they are structured in the same way as the data file we have created.
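If you want to convince yourself of what np.searchsorted does in the function above, here is a minimal, self-contained example with made-up time points:

import numpy as np

# Toy example: find the indices in the simulated time vector that match the data time points
sim_time = np.array([0, 2, 5, 10, 30, 60])  # e.g. the 'all_time_points' used for the simulation
obs_time = [5, 30]                          # e.g. the time points of one observable
t_idx = np.searchsorted(sim_time, obs_time)
print(t_idx)            # [2 4]
print(sim_time[t_idx])  # [ 5 30]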
Before moving on, we need to handle 'problematic simulations' that sometimes occur when simulating ODE models.
How to handle 'problematic' simulations
A simulation may crash due to numerical errors when solving the differential equations. Often, this is attributed to bad parameter values giving rise to extremely large or small derivatives. To avoid the whole script crashing due to such an error we need to put the Simulation call inside a try/except structure:
def fcost(params, sims, data):
    cost = 0
    for dose in data.keys():
        try:
            sims[dose].simulate(...)
            ...
            cost += ...
        except Exception as e:
            # CVODE errors correspond to issues in the simulation, which is common when testing parameter values.
            # In these cases, we do not want the error to crash the script, so instead we return a large value of the cost
            if "CVODE" in str(e):
                cost += 1e29
            else:
                raise
    return cost
Here, the program will try executing everything inside the try block, but if an exception happens it will jump to the except block.
Note: these errors can also arise if the model is incorrectly defined (e.g. if a degradation term does not depend on the state that is being degraded). If you ONLY get the high cost returned, and never any other values, the issue is likely with your model, not just "random problematic parameters".
Calculate the cost of the initial parameter set
Now that we have a cost function implemented, we can calculate the cost for the model. Use the given initial parameter values and your cost function (fcost) to calculate the cost. Finally, let's print the cost.
param0_M1 = [1, 0.0001, 1000000, 1, 0.001] # = [k1, k2, kfeed, k4, k5]
cost_M1 = fcost(param0_M1, m1_sims, data)
print(f"Cost of the M1 model: {cost_M1}")
Test a 'problematic' parameter set (which gives a simulation that crashes)
In some cases, certain parameter sets will yield numerical issues for some models, typically because the derivatives become very large/small. We can provoke this by simulating with negative parameter values (which you should never do), to test that your cost function handles this situation correctly.
Try to simulate with a parameter set that should cause the model to crash, e.g. with negative values: param_fail = [-1e16, -1e16, -1e16, -1e16, -1e16], and make sure that the cost function handles this properly. You should expect a cost of around 2e29 to be returned, as we have 2 doses that fail (1e29 + 1e29). Since we have handled errors that contain the text "CVODE", you will not get any error messages printed.
param_fail = [-1e16, -1e16, -1e16, -1e16, -1e16] # = [k1, k2, kfeed, k4, k5]
cost_fail = fcost(param_fail, m1_sims, data)
print(f"Cost of the failed parameter set: {cost_fail}")
(spoilers) Self-check: What should the cost for the initial parameter set be if the model is implemented correctly?
The cost for the initial parameter set should be approximately 1203.
Code sync
# (all code from last sync)
# Define and use the cost function
def fcost(params, sims, data):
    cost = 0
    for dose in data:
        try:
            sims[dose].simulate(
                time_vector = data[dose]['all_time_points'],
                parameter_values = params
            )
            sim_results = sims[dose].features_as_dict()
            for obs in data[dose]['observables'].keys(): # iterate over all observables
                # find the indexes of the feature values that match the data points
                t_idx = np.searchsorted(sims[dose].time_vector, data[dose]['observables'][obs]['time'])
                y_sim = np.array(sim_results[obs][t_idx])
                y = np.array(data[dose]['observables'][obs]['mean'])
                sem = np.array(data[dose]['observables'][obs]['sem'])
                cost += np.sum(np.square(((y_sim - y) / sem)))
        except Exception as e:
            if "CVODE" in str(e):
                cost += 1e29
            else:
                raise
    return cost
param0_M1 = [1, 0.0001, 1000000, 1, 0.001] # = [k1, k2, kfeed, k4, k5]
cost_M1 = fcost(param0_M1, m1_sims, data)
print(f"Cost of the M1 model: {cost_M1}")
param_fail = [-1e16, -1e16, -1e16, -1e16, -1e16] # = [k1, k2, kfeed, k4, k5]
cost_fail = fcost(param_fail, m1_sims, data)
print(f"Cost of the failed parameter set: {cost_fail}")
Evaluating if the agreement is good enough¶
Now that we have a quantitative measure of how well the model simulation agrees with the experimental data, we can evaluate if the agreement is good enough, or if the model should be rejected.
About statistical tests
To get a quantifiable statistical measure of whether the agreement is good enough, or if the model/hypothesis should be rejected, we can use a \(\chi^2\)-test. We check whether the cost exceeds the threshold set by the \(\chi^2\)-statistic. We typically do this using p = 0.05, and setting the degrees of freedom to the number of residuals summed (i.e. the number of data points).
If the cost exceeds the \(\chi^2\)-limit, we reject the model/hypothesis.
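As a small illustration (the degrees of freedom here are made up; in the task below you compute them from the data), the threshold is obtained with chi2.ppf:

from scipy.stats import chi2

# Toy example: the chi2 threshold at significance level 0.05 with 10 degrees of freedom
print(chi2.ppf(0.95, 10))  # approximately 18.3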
Compare the cost to a \(\chi^2\)-test
We now test if the simulation with the initial parameter set should be rejected using the \(\chi^2\)-test. We use the number of non-infinite SEMs as the degrees of freedom, since these correspond to the number of residuals summed in the cost function. We can use the ppf function from the chi2 distribution in the scipy.stats package to get the \(\chi^2\)-limit for a given p-value and degrees of freedom.
cost = fcost(param0_M1, m1_sims, data)
dgf = 0
for dose in data:
    for obs in data[dose]['observables'].keys():
        dgf += np.count_nonzero(np.isfinite(data[dose]['observables'][obs]['sem']))
chi2_limit = chi2.ppf(0.95, dgf) # Here, 0.95 corresponds to 0.05 significance level
print(f"Cost of the M1 model with initial parameters: {cost}")
print(f"Chi2 limit: {chi2_limit}")
print(f"Cost > limit (rejected?): {cost>chi2_limit}")
(spoiler) Self-check: Should the model be rejected?
As suspected, the agreement to the data was too poor, and the model thus had to be rejected. However, that is only true for the current values of the parameters. Perhaps this model structure can still explain the data well enough with other parameter values?
To determine if the model structure (hypothesis) should be rejected, we must determine if the parameter set with the best agreement to data still gives an agreement that is too poor. If the best parameter set is too poor, no other parameter set can be good enough, and thus we must reject the model/hypothesis.
Code sync
# (all code from last sync)
# Test if the model with the initial parameter values must be rejected
cost = fcost(param0_M1, m1_sims, data)
dgf = 0
for dose in data:
    for obs in data[dose]['observables'].keys():
        dgf += np.count_nonzero(np.isfinite(data[dose]['observables'][obs]['sem']))
chi2_limit = chi2.ppf(0.95, dgf) # Here, 0.95 corresponds to 0.05 significance level
print(f"Cost of the M1 model with initial parameters: {cost}")
print(f"Chi2 limit: {chi2_limit}")
print(f"Cost > limit (rejected?): {cost>chi2_limit}")
Improving the agreement¶
If you have time, try to change some parameter values to see if you can get a better agreement.
Improve the agreement (manually)
It is typically a good idea to change one parameter at a time to see the effect of changing one parameter. If you did this in the previous section, you can reuse those parameter values.
param_guess_M1 = [1, 0.0001, 1000000, 1, 0.001] # Change the values here and try to find a better agreement, if you tried to find a better agreement visually above, you can copy the values here
cost = fcost(param_guess_M1, m1_sims, data)
print(f"Cost of the M1 model: {cost}")
dgf = 0
for dose in data:
    for obs in data[dose]['observables'].keys():
        dgf += np.count_nonzero(np.isfinite(data[dose]['observables'][obs]['sem']))
chi2_limit = chi2.ppf(0.95, dgf)
print(f"Cost of the M1 model with guessed parameters: {cost}")
print(f"Chi2 limit: {chi2_limit}")
print(f"Cost > limit (rejected?): {cost>chi2_limit}")
plot_sim_with_data(param_guess_M1, m1_sims, data)
plt.show()
Now, let us proceed to use an optimization algorithm.
Improve the agreement using optimization methods
As you have probably noticed, estimating the values of the parameters can be hard and time-consuming work. To help us, we can use optimization methods. There exist many toolboxes and algorithms (also referred to as optimizers) that can be used for parameter estimation. In this short example, we will use the Differential Evolution optimizer from the SciPy toolbox.
Prepare before running the parameter optimization
In its simplest mathematical expression, an optimization problem can look something like this:

\[ \min_{x} \; v(x), \quad \text{subject to} \quad lb \leq x \leq ub \]

where we minimize the objective function \(v(x)\), which evaluates the parameter values \(x\). The parameter values are constrained to be between the lower bound \(lb\) and the upper bound \(ub\).
The SciPy optimizers follow this general convention, using the following naming:
- func - an objective function (we can use the cost function fcost defined earlier)
- bounds - bounds of the parameter values
- args - arguments needed for the objective function in addition to the parameter values, in our case the model simulations and data: fcost(param, sims, data)
- x0 - a start guess of the parameter values (e.g. param_guess_M1)
- callback - a "callback" function. The callback function is called at each iteration of the optimization, and can for example be used to save the current solution.
- disp - whether the iterations should be printed or not (True/False)
Before you can start to optimize the parameters, we need to set up the required inputs (bounds, a start guess and a callback). Since we will use these for most of the algorithms, we can set them up first.
No guarantee
The optimization is not guaranteed to find the optimal solution, only better solutions. Therefore, you might need to restart the optimization a few times (perhaps with different start guesses) to get a good solution.
Defining the callback function
This basic callback function saves the current best solution to a temporary file.
def callback(x, file_name):
    with open(f"./{file_name}.json", 'w') as file:
        out = {"x": x}
        json.dump(out, file, cls=NumpyArrayEncoder)
Note that different solvers might have different callback signatures (e.g. callback(x), callback(x, convergence) or callback(x, f, c)). For this reason, and because we want to give the saved temporary files unique names later, we wrap the callback function in another function.
Setting up the start guess, bounds and necessary extra arguments
The bounds determine how wide the search for the parameter values will be, and the extra arguments are the additional inputs the cost function needs besides the parameter values (in our case the simulations and the data).
param_guess_M1 = [1, 0.0001, 1000000, 1, 0.001] # or the values you found manually
bounds_M1 = Bounds([1e-6]*len(param_guess_M1), [1e6]*len(param_guess_M1)) # Create a bounds object needed for some of the solvers
args_M1 = (m1_sims, data)
Improving the efficiency of the optimization by searching in log-space
Because the parameter bounds differ greatly between parameters, and are in general wide, it is much more efficient to search in log-space. This is something you should always do if possible.
In practice, we take the logarithm of the bounds and start guess, and return to linear space before simulating and/or saving the parameter values.
One trick is to wrap the cost function and callback function in another function which handles the transformation before passing the parameter values along to the cost function.
# param_guess_M1 = [1, 0.0001, 1000000, 1, 0.001] # or the values you found manually
args_M1 = (m1_sims, data)
param_guess_M1_log = np.log(param_guess_M1)
bounds_M1_log = Bounds([np.log(1e-6)]*len(param_guess_M1_log), [np.log(1e6)]*len(param_guess_M1_log)) # Create a bounds object needed for some of the solvers
def fcost_log(params_log, sims, data):
params = np.exp(params_log)
return fcost(params, sims, data)
def callback_log(x, file_name='M1-temp'):
callback(np.exp(x), file_name=file_name)
# res = Optimization(...) <--- Here we will run the optimization in the next step
res['x'] = np.exp(res['x']) # convert back the results from log-scale to linear scale
Running the optimization
Now that we have a cost function, a callback function, and have prepared for searching in log-space, it is time to run the optimization to find better parameter values.
# Setup for running optimization (in log-scale)
# param_guess_M1 = [1, 0.0001, 1000000, 1, 0.001] # or the values you found manually
param_guess_M1_log = np.log(param_guess_M1)
bounds_M1_log = Bounds([np.log(1e-6)]*len(param_guess_M1_log), [np.log(1e6)]*len(param_guess_M1_log)) # Create a bounds object needed for some of the solvers
args_M1 = (m1_sims, data)
def fcost_log(params_log, sims, data):
params = np.exp(params_log)
return fcost(params, sims, data)
def callback_log(x, file_name='M1-temp'):
callback(np.exp(x), file_name=file_name)
# Setup for the specific optimizer
def callback_M1_evolution_log(x,convergence):
callback_log(x, file_name='M1-temp-evolution')
res = differential_evolution(
func=fcost_log, bounds=bounds_M1_log, args=args_M1, x0=param_guess_M1_log, callback=callback_M1_evolution_log, disp=True
) # This is the optimization
res['x'] = np.exp(res['x'])
# Save the result
print(f"\nOptimized parameter values: {res['x']}")
print(f"Optimized cost: {res['fun']}")
print(f"chi2-limit: {chi2_limit}")
print(f"Cost > limit (rejected?): {res['fun']>chi2_limit}")
Computational time for different scales of model sizes
Optimizing model parameters in larger and more complex models, with additional sets of data, will take longer than in this exercise. In a "real" project, it likely takes hours if not days to find the best solution.
Saving the optimized parameter values
To avoid overwriting or losing optimized parameter sets, we save the parameter values to a file, with the model name and cost in the file name.
if res['fun'] < chi2_limit: # Saves the parameter values, if the cost was below the limit.
file_name = f"M1 ({res['fun']:.3f}).json"
with open(file_name,'w') as file:
json.dump(res, file, cls=NumpyArrayEncoder) # We save the result as a .json file; the parameter values, which are currently stored as an ndarray, are converted into a regular list by the encoder
Loading previously saved parameter sets
After the optimization, the result is saved into a JSON file.
If you later want to load the parameter set you can do that using the code snippet below. Remember to change file_name.json to the right file name. The loaded file will be a dictionary, where the parameter values can be accessed using res['x']
with open('file_name.json','r') as f: # Remember to update 'file_name' to the right name
res = json.load(f)
Plot and calculate the cost for the best solution found using the optimization
Now, plot the best solution found. Is the agreement good?
The code below will use the results from the latest optimization. However, you can also load a specific parameter set if you want to.
p_opt_M1 = res['x']
plot_sim_with_data(p_opt_M1, m1_sims, data)
print(f"Cost: {fcost(p_opt_M1, m1_sims, data)}")
(spoilers) Self-check: How good solution should I have found?
You should have found a cost around 41. If not, run the optimization a few more times. If you still do not find it, make sure that your model is implemented correctly.
Code sync
# (all code from last sync)
# Improving the agreement manually
param_guess_M1 = [1, 0.0001, 1000000, 1, 0.001] # Change the values here to try to find a better agreement. If you found better values visually above, you can copy them here
cost = fcost(param_guess_M1, m1_sims, data)
print(f"Cost of the M1 model: {cost}")
dgf=0
for dose in data:
for obs in data[dose]['observables'].keys():
dgf += np.count_nonzero(np.isfinite(data[dose]['observables'][obs]['sem']))
chi2_limit = chi2.ppf(0.95, dgf)
print(f"Chi2 limit: {chi2_limit}")
print(f"Cost > limit (rejected?): {cost>chi2_limit}")
plot_sim_with_data(param_guess_M1, m1_sims, data)
# Improving the agreement using optimization methods
def callback(x, file_name):
with open(f"./{file_name}.json",'w') as file:
out = {"x": x}
json.dump(out,file, cls=NumpyArrayEncoder)
# param_guess_M1 = [1, 0.0001, 1000000, 1, 0.001] # or the values you found manually
args_M1 = (m1_sims, data)
# Setup for searching in log-scale
param_guess_M1_log = np.log(param_guess_M1)
bounds_M1_log = Bounds([np.log(1e-6)]*len(param_guess_M1_log), [np.log(1e6)]*len(param_guess_M1_log)) # Create a bounds object needed for some of the solvers
def fcost_log(params_log, sims, data):
params = np.exp(params_log)
return fcost(params, sims, data)
def callback_log(x, file_name='M1-temp'):
callback(np.exp(x), file_name=file_name)
# Setup for the specific optimizer
def callback_M1_evolution_log(x,convergence):
callback_log(x, file_name='M1-temp-evolution')
res = differential_evolution(
func=fcost_log, bounds=bounds_M1_log, args=args_M1, x0=param_guess_M1_log, callback=callback_M1_evolution_log, disp=True
) # This is the optimization
res['x'] = np.exp(res['x'])
# Save the result
print(f"\nOptimized parameter values: {res['x']}")
print(f"Optimized cost: {res['fun']}")
print(f"chi2-limit: {chi2_limit}")
print(f"Cost > limit (rejected?): {res['fun']>chi2_limit}")
if res['fun'] < chi2_limit: # Saves the parameter values, if the cost was below the limit.
file_name = f"M1 ({res['fun']:.3f}).json"
with open(file_name,'w') as file:
json.dump(res, file, cls=NumpyArrayEncoder) # We save the result as a .json file; the parameter values, which are currently stored as an ndarray, are converted into a regular list by the encoder
# Plot the simulation given the optimized parameters
p_opt_M1 = res['x']
plot_sim_with_data(p_opt_M1, m1_sims, data)
plt.show()
Checkpoint and reflection¶
Observe the plot with the best agreement to data, and the cost for the same parameter set. Do you reject the model? Briefly describe the cost function: What are the residuals? Why do we normalize with the squared uncertainty?
Questions to reflect over
- Why do we want to find the best solution?
- Are we guaranteed to find the best solution?
- Do you reject the first hypothesis?
- If you did not reject the model, how can you test the model further?
Prediction based model testing¶
Estimating the model uncertainty¶
About how to gather model uncertainty
If the optimization did not encounter big issues, you should have been able to find a solution that passes the \(\chi^2\)-test. However, it is still possible that we have not found the right model and/or parameter values. For example, since the data have uncertainty, it is possible that we have over-fitted to uncertain (incorrect) data. An over-fitted model would in general be good at explaining the data used for parameter estimation, but might not be good at explaining new data (i.e. making predictions).
For this reason, we do not only want to find the parameter set with the best agreement with the data, but rather all parameter sets that agree with the estimation data. Since collecting all acceptable parameter sets is computationally infeasible, we typically try to find the parameter sets that give the extreme (maximal/minimal) simulations at each time point.
The best way of finding the extremes is to directly optimize the extreme values by reformulating the optimization problem, or to use Markov chain Monte Carlo (MCMC) methods. Here, we will not do this due to time constraints, and will instead use a simple (inferior) approach that stores every parameter set encountered during optimization that gives a cost below the threshold of rejection. This gives us a quick approximation of the uncertainty, which is good enough for this exercise. In future projects, you should try to find the maximal uncertainty.
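For reference only, a reformulated optimization could conceptually look like the sketch below. It is not needed for this exercise: get_observable_value is a hypothetical helper that would simulate the model and return the value of one observable at one time point (how to extract that value depends on your own simulation/plotting code), and the experiment name 'dose1' and the penalty weight are assumptions.
# Conceptual sketch: find the maximal/minimal value of a prediction while still agreeing with the data
def fcost_extreme(param_log, sims, data, dgf, obs, time_point, polarity=1):
    param = np.exp(param_log)                        # search in log-space, as before
    cost = fcost(param, sims, data)                  # agreement with the estimation data
    # get_observable_value is a hypothetical helper: simulate and return obs at time_point
    prediction = get_observable_value(param, sims['dose1'], time_point, obs)
    penalty = 0
    if cost > chi2.ppf(0.95, dgf):                   # punish parameter sets that would be rejected
        penalty = 1e4 + (cost - chi2.ppf(0.95, dgf))**2
    return polarity*prediction + penalty             # polarity=1 gives the minimum, -1 the maximum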
Create an adapted cost function that also saves acceptable parameter sets
To save all acceptable parameters, we need to create a function which both calculates the cost and stores the acceptable parameter sets. We reuse fcost, and introduce a new variable all_params_M1 to which the acceptable parameters are saved.
all_params_M1 = [] # Initiate an empty list to store all parameters that are below the chi2 limit
def fcost_uncertainty_M1(param_log, sims, data, dgf):
param = np.exp(param_log) # Convert from log-space to linear space
cost = fcost(param, sims, data) # Calculate the cost using fcost
# Save the parameter set if the cost is below the chi2 limit
if cost < chi2.ppf(0.95, dgf):
all_params_M1.append(param)
return cost
Run the optimization a couple of times to get values
Next, we need to run the optimization a couple of times to get some acceptable parameter values.
args_M1_uncertainty = (m1_sims, data, dgf)
for i in range(0,5):
res = differential_evolution(
func=fcost_uncertainty_M1, bounds=bounds_M1_log, args=args_M1_uncertainty,
x0=param_guess_M1_log, callback=callback_M1_evolution_log
) # This is the optimization
param_guess_M1_log = res['x'] # Update starting point for next run
print(f"Number of parameter sets collected: {len(all_params_M1)}") # Prints how many parameter sets were accepted
Make sure that there are at least 250 saved parameter sets. If you want to, you can save the found parameters to a file:
Saving and loading the parameter sets
If you want to save your discovered parameter sets, the simplest approach is to save them to a .csv file. This is not strictly necessary to do now, but it is good to do if you want to be able to recreate the results later. To save the values, you need to open a file and write to it:
# Save the accepted parameters to a csv file
with open('all_params_M1.csv', 'w', newline='') as csv_file:
writer = csv.writer(csv_file, delimiter=',')
writer.writerows(all_params_M1)
Then you can load the file to get back the list of parameter sets. If you load the parameter sets from a previous run, you can disable the optimization to save time.
# load the accepted parameters from the csv file
with open('all_params_M1.csv', newline='') as csv_file:
reader = csv.reader(csv_file, delimiter=',', quoting=csv.QUOTE_NONNUMERIC)
all_params_M1 = list(reader)
Plot the model uncertainty
To save some time, we randomize the order of the saved parameter sets and only plot a subset of them (n_params_to_plot, 500 by default). Obviously this results in an underestimated simulation uncertainty, but the uncertainty should still be sufficient for this exercise. In a real project, one would simulate all collected parameter sets (or sample using the reformulated optimization problem, or MCMC).
Let's create a function which shuffles the collected parameter sets, and then plots the selected parameter sets for both of the experimental datasets.
def plot_uncertainty(all_params, sims, data, color='b', n_params_to_plot=500):
random.shuffle(all_params)
for experiment in data:
fig = plt.figure() # create the figure
n_rows, n_cols = find_subplot_layout(data[experiment])
time_high_resolution = np.linspace(
data[experiment]["all_time_points"][0], # from
data[experiment]["all_time_points"][-1], # to
1000 # with n=1000 steps
) # create a high resolution time vector for the simulation
# iterate over all observables
plt_idx = 0 # Instantiate a counter to keep track of the subplot index
for obs in data[experiment]['observables']:
plt_idx = plt_idx + 1 # Update the subplot index
ax = fig.add_subplot(n_rows, n_cols, plt_idx) # create a new subplot in grid n_rows x n_cols at position plt_idx
# Now we plot n_params_to_plot simulations for each observable
for param in all_params[:n_params_to_plot]:
plot_sim(ax, param, sims[experiment], time_high_resolution, obs, color, label='')
plot_data_series(ax, data[experiment]['observables'][obs])
ax.set_title(f'{obs}')
ax.legend() # Show the legend for the subplot
fig.suptitle(f'{experiment}') # add a shared "Super"-title for the full figure
fig.tight_layout() # autoformat the grid layout to make it look nice
Finally, call the plot_uncertainty function with your collected parameter values:
plot_uncertainty(all_params_M1, m1_sims, data)
(spoiler) Self-check: Did I implement the uncertainty estimation correctly?
For the uncertainty step, your code should look something like below.
Code sync
# (all code from previous sync)
# Setup for saving acceptable parameters
all_params_M1 = [] # Initiate an empty list to store all parameters that are below the chi2 limit
def fcost_uncertainty_M1(param_log, sims, data, dgf):
param = np.exp(param_log)
cost = fcost(param, sims, data)
# Save the parameter set if the cost is below the limit
if cost < chi2.ppf(0.95, dgf):
all_params_M1.append(param)
return cost
# Run the optimization a few times to save some parameters
args_M1_uncertainty = (m1_sims, data, dgf)
for i in range(0,5):
res = differential_evolution(
func=fcost_uncertainty_M1, bounds=bounds_M1_log, args=args_M1_uncertainty, x0=param_guess_M1_log, callback=callback_M1_evolution_log
) # This is the optimization
param_guess_M1_log = res['x'] # update starting parameters
print(f"Number of parameter sets collected: {len(all_params_M1)}") # Prints now many parameter sets that were accepted
# Save the accepted parameters to a csv file
with open('all_params_M1.csv', 'w', newline='') as csv_file:
writer = csv.writer(csv_file, delimiter=',')
writer.writerows(all_params_M1)
# Plot the uncertainty
plot_uncertainty(all_params_M1, m1_sims, data)
Model validation¶
Finally, now that we have estimated the model uncertainty, we can validate the model by making predictions of new data with uncertainty.
About the prediction
A prediction can be almost anything, but in practice it is common to use a different input strength (A in the model), or to change the input after a set time. It is also possible to extend the simulated time beyond the measured time points. Finally, it is also possible to use the simulation in between already measured data points.
In this case, we will now use the third data set that we excluded earlier data_validation. This data is a step response, where A was set to 1 at t = 5 and then A was set to 2 at t = 60.
Plot the validation data
# Test the model against the evaluation experiment
plot_data(data_validation, face_color='w')
Define validation simulation object
First define the new activity corresponding to the validation input.
# define validation activity
activity_validation_M1 = sund.Activity(time_unit='m')
activity_validation_M1.add_output(
type="piecewise_constant",
name= "A",
t = data_validation['step_dose']['input']['A']['t'],
f=[0] + data_validation['step_dose']['input']['A']['f']
)
Define a simulation object using this activity.
m1_validation = {}
m1_validation['step_dose'] = sund.Simulation(
models = m1, activities = activity_validation_M1,
time_unit = 'm', time_vector = data_validation['step_dose']['all_time_points']
)
Plot your prediction, the original data, and the new validation data together
Now, to plot your prediction, the original data, and the new data together, we slightly alter the plot function. Note that we set the facecolor to none for the validation data to more easily differentiate it from the original data.
# Define the plot_validation_with_data function
def plot_validation_with_data(all_params, sims, data_validation, data, color='b', n_params_to_plot=500):
random.shuffle(all_params)
for experiment in data_validation:
fig = plt.figure()
n_rows, n_cols = find_subplot_layout(data_validation[experiment])
time_high_resolution = np.linspace(
data_validation[experiment]["all_time_points"][0], # from
data_validation[experiment]["all_time_points"][-1], # to
1000 # with n=1000 steps
) # create a high resolution time vector for the simulation
# iterate over all observables
plt_idx = 0 # Instantiate a counter to keep track of the subplot index
for obs in data_validation[experiment]['observables']:
plt_idx = plt_idx + 1 # Update the subplot index
ax = fig.add_subplot(n_rows, n_cols, plt_idx) # create a new subplot in grid n_rows x n_cols at position plt_idx
# Now we plot n_params_to_plot simulations for each observable
for param in all_params[:n_params_to_plot]:
plot_sim(ax, param, sims[experiment], time_high_resolution, obs, color=color, label='')
plot_data_series(ax, data_validation[experiment]['observables'][obs], face_color='none', label='Validation Data')
plot_data_series(ax, data["dose1"]['observables'][obs], face_color='k')
ax.set_title(f'Validation - {obs}')
ax.legend() # Show the legend for the subplot
fig.suptitle(f'{experiment}') # add a shared "Super"-title for the full figure
fig.tight_layout() # autoformat the grid layout to make it look nice
Use the plot_validation_with_data function to plot the validation simulation with the data.
plot_validation_with_data(all_params_M1, m1_validation, data_validation, data, color='b')
Calculate the cost for the prediction
While we can most likely draw conclusions based on the visual agreement, we should also calculate the cost. Calculate the cost for the prediction of data_validation using the best parameter set found.
cost_m1_validation = fcost(p_opt_M1, m1_validation, data_validation)
dgf_validation=0
for experiment in data_validation:
for obs in data_validation[experiment]['observables']:
dgf_validation += np.count_nonzero(np.isfinite(data_validation[experiment]['observables'][obs]['sem']))
chi2_limit_validation = chi2.ppf(0.95, dgf_validation)
print(f"Cost (validation): {cost_m1_validation}")
print(f"Chi2 limit: {chi2_limit_validation}")
print(f"Cost > limit (rejected?): {cost_m1_validation>chi2_limit_validation}")
Did your model predict the new data correctly?
(spoiler) Self-check: Did I implement the prediction correctly?
If you did the prediction correctly, you should see something that looks like this:

Code sync
# (all code from previous sync)
# Plot model prediction with the validation data
def plot_validation_with_data(all_params, sims, data_validation, data, color='b', n_params_to_plot=500):
random.shuffle(all_params)
for experiment in data_validation:
fig = plt.figure()
n_rows, n_cols = find_subplot_layout(data_validation[experiment])
time_high_resolution = np.linspace(
data_validation[experiment]["all_time_points"][0], # from
data_validation[experiment]["all_time_points"][-1], # to
1000 # with n=1000 steps
) # create a high resolution time vector for the simulation
# iterate over all observables
plt_idx = 0 # Instantiate a counter to keep track of the subplot index
for obs in data_validation[experiment]['observables']:
plt_idx = plt_idx + 1 # Update the subplot index
ax = fig.add_subplot(n_rows, n_cols, plt_idx) # create a new subplot in grid n_rows x n_cols at position plt_idx
# Now we plot n_params_to_plot simulations for each observable
for param in all_params[:n_params_to_plot]:
plot_sim(ax, param, sims[experiment], time_high_resolution, obs, color=color, label='')
plot_data_series(ax, data_validation[experiment]['observables'][obs], face_color='none', label='Validation Data')
plot_data_series(ax, data["dose1"]['observables'][obs], face_color='k')
ax.set_title(f'Validation - {obs}')
ax.legend() # Show the legend for the subplot
fig.suptitle(f'{experiment}') # add a shared "Super"-title for the full figure
fig.tight_layout() # autoformat the grid layout to make it look nice
plot_validation_with_data(all_params_M1, m1_validation, data_validation, data, color='b')
# Calculate the model agreement to the validation data
cost_m1_validation = fcost(p_opt_M1, m1_validation, data_validation)
dgf_validation=0
for experiment in data_validation:
for obs in data_validation[experiment]['observables'].keys():
dgf_validation += np.count_nonzero(np.isfinite(data_validation[experiment]['observables'][obs]['sem']))
chi2_limit_validation = chi2.ppf(0.95, dgf_validation)
print(f"Cost (validation): {cost_m1_validation}")
print(f"Chi2 limit: {chi2_limit_validation}")
print(f"Cost > limit (rejected?): {cost_m1_validation>chi2_limit_validation}")
Testing an alternative hypothesis¶
As you should see, the first hypothesis did not predict the new data well. You will now test an alternative second hypothesis where A is not constant and is instead removed over time.
About the second hypothesis

This hypothesis is similar to the first one, with three changes:
1. The feedback-induced dephosphorylation of Rp (reaction 3 in the first hypothesis) is removed.
2. A is now a state.
3. A feedback-dependent breakdown of A is added. This reaction depends on Sp and A with the parameter kfeed (same name, different role).
The resulting interaction graph is:

The experimentalists also provide initial values and parameter guesses:
| Initial values | Parameter values |
|---|---|
| R(0) = 1 | k1 = 5 |
| Rp(0) = 0 | k2 = 20 |
| S(0) = 1 | kfeed = 10 |
| Sp(0) = 0 | k4 = 15 |
| A(0) = 0 | k5 = 30 |
Since A is now a state, we can no longer use an activity output to define the input. Instead, we need to define an activity that manipulates the state A directly. Also note that we now have one additional initial condition.
Start by implementing the new model.
You can either continue to work in your main.py file, or create a duplicate and work in that file for M2.
Test the alternative hypothesis
You have previously implemented and analysed one hypothesis. Now do the full analysis for this second hypothesis on your own, reusing your first workflow as much as you prefer.
Suggested workflow:
- Create M2.txt and implement the model equations.
- Install and load model M2.
- Create simulation objects for the estimation datasets.
- Optimize the model parameters (use the same bounds setup as for M1).
- Compare the best cost against the \(\chi^2\)-limit.
- If the model passes, collect uncertainty and test prediction on the validation data.
For this exercise, use the same bounds as in M1: linear bounds [1e-6, 1e6] for each parameter (or equivalently the same log-space bounds if optimizing in log-space).
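In practice, setting up the start guess and bounds for M2 could look something like the sketch below, assuming the five parameters are given in the order of the table above (k1, k2, kfeed, k4, k5); adjust the order to match your model file.
# Start guess from the table above, and the same (log-space) bounds as for M1
param_guess_M2 = [5, 20, 10, 15, 30]
param_guess_M2_log = np.log(param_guess_M2)
bounds_M2_log = Bounds([np.log(1e-6)]*len(param_guess_M2_log), [np.log(1e6)]*len(param_guess_M2_log))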
Key changes of the model compared to M1:
- The extra initial condition needed (since A is now a state)
- No input in the model file (no A = A @ 0 in the input section)
- Changing the activity from an output to a state manipulation
Define the activity using state manipulations
Instead of providing a direct value of A as we did for the first model, we will now modify the value of the state A. This is done by using a state_manipulation instead of an output. The state manipulation below adds 1 to A at t = 5 (which, since A(0) = 0, sets the value of A to 1).
# Example, you will need to add state manipulations with the settings of the two doses
activity_M2 = sund.Activity(time_unit='m')
activity_M2.add_state_manipulation(
mode = "add",
name = "A",
t = [5],
f = [1]
)
Notice that in contrast to the piecewise constant output, we do not need to provide a default value before the manipulation happens. Since A is now a state, it will always have a value before the manipulation occurs.
You can find more information about state manipulations in the SUND documentation.
You will need to create a new set of simulations for model M2 that uses the new activities with state manipulations instead of outputs. You can (and should) use the information in the data to create your activities and simulations.
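As a starting point, creating the M2 simulation objects could look something like the sketch below. It assumes that your installed M2 model object is called m2, that the input times and values stored in the data can be used directly as state manipulations, and that mode="add" is appropriate for your doses; adjust to your own setup.
# Sketch: one activity (with a state manipulation of A) and one simulation per estimation experiment
m2_sims = {}
for experiment in data:
    activity = sund.Activity(time_unit='m')
    activity.add_state_manipulation(
        mode="add",
        name="A",
        t=data[experiment]['input']['A']['t'],
        f=data[experiment]['input']['A']['f']
    )
    m2_sims[experiment] = sund.Simulation(
        models=m2, activities=activity,
        time_unit='m', time_vector=data[experiment]['all_time_points']
    )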
Then test if M2 can explain the estimation data, and if so, if it can correctly predict the validation data.
What is your conclusion? Do you reject this alternative model?
(spoilers) Self-check: How can I know if M2 has been implemented correctly?
The cost for the given parameter set should be approximately 4646. If it isn't, you might have incorrect equations, or have placed the states or parameters in the incorrect order. It is also possible that you are simulating with parameter values from the wrong model.
Visually, the agreement for the given parameter set should look like this:


For the prediction, the model should have a clear peak for the second stimulation. If you don't get the second peak, make sure that you are using the correct activity.
Using the model to predict unknown scenarios¶
With a validated model we can now use M2 to predict unknown scenarios. A scenario of interest for the biologists would be how the elimination of A is affected by introducing an inhibitor that targets the stimulating interaction of Sp. This can now be tested by introducing this effect into our model.
About inhibitions
Inhibitions are often harder to model than activations. An activation is usually represented as a positive contribution (e.g. multiplication by an activating state), while an inhibition can be implemented in several different ways depending on the assumed mechanism.
Common implementation choices include:
- Dividing by the inhibitor: 1/inhibitor. This can become problematic if the inhibitor can be 0, so other approaches are often used to avoid this problem, such as 1/(1+inhibitor) or 1/(1 + k*inhibitor), where k is a scaling factor for the inhibition.
- As a multiplicative factor: inhibitor with values between 0 and 1, where 0 means full inhibition and 1 means no inhibition: A*Sp*inhibitor. Here, inhibitor could be a function depending on the input concentration: inhibitor = f(inhibitor dose). This function could for example be a Hill function.
It is also important to consider (experimentally) if the inhibitor reduces the state it is acting on, or if it is just reducing the effect that the state has on another process.
As always, it is important to model the biological function of the inhibition as correctly as possible. In this scenario, we will assume that the inhibitor reduces the effect of Sp on the elimination of A, but does not reduce the level of Sp itself. We will also assume that it can be modelled as a direct inhibition, by dividing by the inhibitor concentration.
Details about the new prediction/experiment

Here, an inhibitor has been introduced to the system. The suggested effect of the inhibitor is that it inhibits the effect of Sp on the elimination of A. This could be illustrated as the following interaction-graph:

Here, the inhibition is not absolute. Rather, the effectiveness increases with the concentration of the inhibitor. To make the effect easier to visualize, the inhibitor should be introduced at t = 60, when the second dose of the training data occurs, and have no effect during the first response.
The inhibition could in a simple way be interpreted as:
r3 = kfeed*A*Sp*(1/inhibitor)
Start by implementing the new model.
Implement an inhibition to the elimination of A
Note that this model structure is essentially the same as the second hypothesis, but just adds a new input.
Start by creating a copy of your M2 text file. We assume that you name this new model M2_inhib.
The inhibitory substance could be called inhibitor and can be introduced either as a state or input.
Define the inhibition activity
We will use the same experimental setup as in the step_dose experiment, but with the addition of the inhibitor. In other words, you need to provide:
1. The input of A
2. The input of the inhibitor
You can either add two outputs/manipulations to one activity object, or create two different activities (multiple activities can be provided to the simulation: sund.Simulation(models=..., activities=[act1, act2])).
You are free to model the inhibitor as either an input or a state. Make sure to pick the corresponding activity type: an output (piecewise_constant) or a state manipulation (with mode='set'). You can assume that the value of the inhibitor is set to 100 at t=60. To avoid division by 0 (and to have no effect before the inhibitor is added), set the initial value inhibitor(0) = 1, regardless of your choice.
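A minimal sketch of one possible setup is shown below. It assumes that the inhibitor is modelled as an input (piecewise constant output), that your M2 step-dose activity is called activity_validation_M2, and that the installed inhibition model object is called m2_inhib; all of these names are placeholders for your own.
# Sketch: a separate activity for the inhibitor, combined with the existing step-dose activity
activity_inhibitor = sund.Activity(time_unit='m')
activity_inhibitor.add_output(
    type="piecewise_constant",
    name="inhibitor",
    t=[60],       # the inhibitor is introduced at t = 60
    f=[1, 100]    # value 1 (no effect) before t = 60, 100 after
)
m2_inhib_sims = {}
m2_inhib_sims['step_dose'] = sund.Simulation(
    models=m2_inhib, activities=[activity_validation_M2, activity_inhibitor],
    time_unit='m', time_vector=data_validation['step_dose']['all_time_points']
)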
Plot the model uncertainty using the inhibited model
Implement a new plot function that takes the following inputs:
- all_params (a list of parameter values)
- sims (a dictionary containing simulation object(s))
- time_points (the time vector to simulate)
- color='b' (a default value for line color)
- n_params_to_plot=500 (the number of parameters to plot)
The function should take the gathered parameter values for the second hypothesis and simulate the inhibited version with these parameters. The prediction should be shown for both Rp and R.
Hint
You can reuse the code from the previous plot functions, but you will need to change the simulation object to the inhibited version, and also make sure to plot both Rp and R. Also make sure that you update the legends and titles.
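A minimal sketch of such a function, adapted from plot_uncertainty, is shown below. The names m2_inhib_sims, all_params_M2 and the observables 'Rp' and 'R' are assumptions based on the task description; adjust them to your own code.
# Sketch: plot the uncertainty of the inhibited prediction for both Rp and R
def plot_inhibition_uncertainty(all_params, sims, time_points, color='b', n_params_to_plot=500):
    random.shuffle(all_params)
    time_high_resolution = np.linspace(time_points[0], time_points[-1], 1000)
    fig = plt.figure()
    for plt_idx, obs in enumerate(['Rp', 'R'], start=1):
        ax = fig.add_subplot(1, 2, plt_idx)
        for param in all_params[:n_params_to_plot]:
            plot_sim(ax, param, sims['step_dose'], time_high_resolution, obs, color, label='')
        ax.set_title(f'Inhibition prediction - {obs}')
    fig.suptitle('step_dose with inhibitor')
    fig.tight_layout()

plot_inhibition_uncertainty(all_params_M2, m2_inhib_sims, data_validation['step_dose']['all_time_points'])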
What does the prediction of the inhibition effect tell you?
(spoilers) Self-check: How can I know if the inhibition effect has been implemented correctly?
If implemented correctly, the Rp and R features should look something like this.
Checkpoint and reflection¶
Observe the figure with the prediction with uncertainty from both of the models, and how well they agree with the data. Are you able to reject any of the models based on this figure?
Questions to reflect over
- Did you reject any of the models?
- What are the strengths of finding core predictions that are different between models potentially describing the same system?
- What is the downside to choosing points where there is an overlap between the models to test experimentally, even if the predictions are certain ("narrow")?
- What would be the outcome if it was impossible (experimentally) to measure between 2 and 3 minutes?
- Is our simulation uncertainty reasonable? Is it over- or underestimated?
- What could be problems related to overestimating the uncertainty? To underestimating the uncertainty?
- What should you do when your model can correctly predict new data?
And some questions on the whole exercise to reflect over
- Look back at the lab, can you give a quick recap of what you did?
- Why did we square the residuals? And why did we normalize with the SEM?
- Why is it important that we find the "best" parameter sets when testing the hypothesis? Do we always have to find the best?
- Can we expect to always find the best solution for our type of problems?
- What are the problems with underestimating the model uncertainty? And with overestimating?
- If the uncertainty is very big, how could you go about to reduce it?
- Why do we want to find core predictions when comparing two (or more) models? What are the possible outcomes if we did the experiment?
- What are the benefits/drawbacks with activity outputs vs. activity state manipulations?
The end of the exercise¶
In the exercise, you should have learned:
- the basics of implementing and simulating models
- how to test the models using a \(\chi^2\)-test
- a basic approach to collecting and visualizing model uncertainty
- how to test the model by predicting independent validation data
You should now know the basics of what you need to do a systems biology project.
This marks the end of the exercise. But, we have also included some additional steps below, which might be relevant for some projects. You can do them now, or come back here later when it is relevant.
Additional exercises (optional)¶
These are optional steps that you can try. They are not mandatory right now, but might be relevant in your future projects. Even if you do not do them now, remember that you can come back and try this out later.
Test different optimizers¶
Test different optimizers
As mentioned earlier, SciPy offers multiple optimizers; you can see some of them below. Since the optimizers are implementations of different heuristics, they are not equally good at all problems, and different optimizers are better suited for different types of problems. Below you find some examples of different optimizers; test them and compare how well they perform. Since the optimizers are implemented slightly differently, you need to make some changes to how the functions are called, e.g. with regard to the callback.
Differential evolution
# Optimize using differential_evolution
def callback_evolution_M1(x, convergence): # The callback to the differential_evolution optimizer returns both the current point x, and state of convergence
callback_log(x, file_name='M1-temp-evolution')
result_evolution = differential_evolution(fcost_log, bounds_M1_log, args=args_M1, callback=callback_evolution_M1, disp=True)
Dual annealing
Dual annealing uses a slightly different callback function. Therefore, we need to define a new one, and keep track of the number of iterations done (niter).
# Optimize using dual annealing
def callback_dual(x,f,c):
global niter
if niter%1 == 0:
print(f"Iter: {niter:4d}, obj:{f:3.6f}", file=sys.stdout)
callback(np.exp(x), file_name='M1-temp-dual') # Note: the callback itself appends the .json extension
niter+=1
niter=0
result_dual = dual_annealing(fcost_log, bounds_M1_log, x0=param_guess_M1_log, args=args_M1, callback=callback_dual)
SHGO (simplicial homology global optimization)
SHGO has no disp argument; display is instead controlled via the options dictionary. It is also possible to modify the callback to print the iteration if desired.
# Optimize using SHG optimization
def callback_shgo_M1(x):
global niter
if niter%1 == 0:
print(f"Iter: {niter}")
callback_log(x, file_name='M1-temp-SHGO')
niter+=1
niter=0
result_shgo = shgo(fcost_log, bounds_M1_log, args=args_M1, callback=callback_shgo_M1, options = {"disp":True})
Direct
# Optimize using DIRECT
def callback_direct(x):
global niter
if niter%1 == 0:
print(f"Iter: {niter}")
callback_log(x, file_name='M1-temp-DIRECT')
niter+=1
niter=0
result_direct = direct(fcost_log, bounds_M1_log, args=args_M1, callback=callback_direct)
Minimize
# Optimize using local solvers
method = 'TNC' # 'COBYLA', 'TNC', 'L-BFGS-B'
def callback_local(x):
global niter
if niter%1 == 0:
print(f"Iter: {niter}")
callback_log(x, file_name=f'M1-temp-{method}')
niter+=1
niter=0
result_local = minimize(fcost_log, x0=param_guess_M1_log, method=method, bounds=bounds_M1_log, args=args_M1, callback=callback_local)
Making a profile-likelihood plot¶
About profile-likelihood
In a profile-likelihood analysis, we plot the value of one parameter on the x-axis and the cost of the whole parameter set on the y-axis. Such a curve is typically referred to as a profile-likelihood curve. To cover the full range of values for the parameter, it is common practice to fixate the parameter being investigated (by removing it from the list of parameters being optimized and setting it explicitly in the cost function) and re-optimize the rest of the parameters. This is done several times, each time fixing the parameter to a new value spanning a range, to see how the cost is affected by changes in the value of this specific parameter (Figure 4). If the resulting curve is defined and passes a threshold for rejection in both the positive and negative direction (Figure 4A), we say that the parameter is identifiable (that is, we can define a limited interval of possible values); if instead the cost stays low in the positive and/or the negative direction, we say that the parameter is unidentifiable (Figure 4B-D). This whole procedure is repeated for each model parameter.
Figure 4: Examples of identifiable (A) and unidentifiable parameters (B-D). The blue line is the threshold for rejection, the red line is the value of the cost function for different values of the parameter.
In a prediction profile-likelihood analysis, predictions instead of parameter values are investigated over the whole range of possible values. Predictions cannot be fixated in the way parameters can. Therefore, a prediction profile-likelihood analysis is often more computationally heavy to perform than a profile-likelihood analysis. Instead of fixation, a term is added to the cost to force the optimization towards a certain value of the prediction, while at the same time minimizing the residuals. This results in an objective function along the lines of:
\[\hat{v}(\theta) = v(\theta) + \lambda\,\left(\hat{p}(\theta) - p\right)^2\]
where \(v(\theta)\) is the ordinary cost, \(\hat{p}(\theta)\) is the prediction given the parameters \(\theta\), \(p\) is the value the prediction is forced towards, and \(\lambda\) is a weight.
Again, the whole range of values of the predictions are iterated through. A prediction profile-likelihood analysis can be used in the search for core predictions.
Do a profile-likelihood analysis of the parameters
Here, we will not fixate one parameter at a time and re-optimize; instead we will use the already collected parameter sets. This is worse than fixating the parameters, but we do it for the sake of time. Furthermore, as opposed to the uncertainty plots before, we do not have to select a subset of parameter sets; we can instead use all of our collected sets. The easiest way to do a profile-likelihood plot is to simply plot the cost against the values of one parameter, and then repeat for all parameters.
To make the plot slightly nicer, one can filter the parameters to only keep the lowest cost for each parameter value. This can be done by first sorting the parameters according to the overall cost using numpy.argsort, and then using numpy.unique. It is likely a good idea to round your parameter values to fewer decimals first, which can be done with numpy.round. Note that numpy.round takes an optional argument decimals, which specifies the number of decimals to keep and can be a negative number.
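A minimal sketch of such a plot, using the already collected parameter sets for M1, could look like the code below. The costs are recomputed here, and the number of decimals used for rounding (and whether to round in linear or log-space) is an assumption you may need to adjust.
# Sketch: profile-likelihood-style plot from the collected parameter sets
costs = np.array([fcost(p, m1_sims, data) for p in all_params_M1])
params = np.array(all_params_M1)
order = np.argsort(costs)                              # sort all sets by cost, best first

for i in range(params.shape[1]):
    p_sorted = np.round(params[order, i], decimals=2)  # round to fewer decimals
    c_sorted = costs[order]
    # np.unique keeps the first occurrence of each rounded value, i.e. the one with the lowest cost
    p_unique, idx = np.unique(p_sorted, return_index=True)
    plt.figure()
    plt.plot(p_unique, c_sorted[idx], 'o')
    plt.axhline(chi2_limit, linestyle='--', label='rejection threshold')
    plt.xlabel(f'parameter {i}')
    plt.ylabel('cost')
    plt.legend()
plt.show()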