\n",
"**Background (click me)**

\n",
"\n",
"The experimental group is focusing on the cellular response to growth factors, like the epidermal growth factor (EGF), and their role in cancer. The group have a particular interest in the phosphorylation of the receptor tyrosine kinase EGFR in response to EGF and therefore perform an experiment to elucidate the role of EGFR. In the experiment, they stimulate a cancer cell line with EGFR and use an antibody to measure the tyrosine phosphorylation at site 992, which is known to be involved in downstream growth signaling. The data from these experiments show an interesting dynamic behavior with a fast increase in phosphorylation followed by a decrease down to a steady-state level which is higher than before stimulation (Figure 1). \n",
"\n",
" \n",
"\n",
"![Figure1](https://isbgroup.eu/edu/assets/Intro/Figures/Data-example.jpg)\n",
"\n",
"*Figure 1: The experimental data. A cancer cell line is stimulated with EGF at time point 0, and the response at EGFR-pY992 is measured*\n",
"\n",
"The experimental group does not know the underlying mechanisms that give rise to this dynamic behavior. However, they have two different ideas of possible mechanisms, which we will formulate as hypotheses below. The first idea is that there is a negative feedback transmitted from EGFR to a downstream signaling molecule that is involved in dephosphorylation of EGFR. The second idea is that a downstream signaling molecule is secreted to the surrounding medium and there catalyzes breakdown of EGF, which also would reduce the phosphorylation of EGFR. The experimental group would like to know if any of these ideas agree with their data, and if so, how to design experiments to find out which is the more likely idea. \n",
"\n",
"To generalize this problem, we use the following notation: \n",
"EGF – Stimulation (S) \n",
"EGFR – Receptor (R)\n",
"\n",
"The next step now is to set up the experimental data. \n",
"

"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "2XkdKw8kjQj1"
},
"source": [
"#### Defining the data\n",
"Typically, datasets are saved in files and loaded into the analysis script. In this case, we have inserted the values corresponding to the experimental data here: time points, mean values and standard deviations of the mean (SEM). \n",
"\n",
"The data is structured in what is called a dictionary (dict). In short, a dictionary contains a pair of keys and values. Below, we print the keys of the dict and the values of one key (time). "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "RayLChdNj7LI"
},
"outputs": [],
"source": [
"data = {} # create an empty dictionary\n",
"data['time'] = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0]\n",
"data['mean'] =[-0.0065112, 0.076965, 0.060161, 0.050973, 0.03768, 0.027949, 0.034981, 0.028261, 0.028708, 0.019014, 0.021887]\n",
"data['SEM'] = [0.0027889, 0.0035225, 0.0045911, 0.0051342, 0.0034731, 0.0036044, 0.0038831, 0.0019989, 0.0047651, 0.0036907, 0.0018121]\n",
"\n",
"print('All content of data:')\n",
"print(data)\n",
"print('\\nKeys in the data:')\n",
"print(data.keys())\n",
"print('\\nThe content of the \"time\" key:')\n",
"print(data[\"time\"])"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "82YEwp_vhXx0"
},
"source": [
"### Plotting the data\n",
"Now that we have loaded the data, we should visualize it. \n",
"We do this by plotting the data using the package `matplotlib` and `pyplot`. In practice, we have shortened the name of `matplotlib.pyplot` package to `plt` when we imported the package earlier (`import matplotlib.pyplot as plt`). We plot the data using the `errorbar` function, and set the `color` to black, and the `marker` symbol to \"o\". \n",
"\n",
"To make the code reusable for later, we wrap the plotting inside a function `plot_data` which takes data as input. Finally, we plot the data and show the figure using the plot function"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "jrnoo5vfnIVz"
},
"outputs": [],
"source": [
"def plot_data(data):\n",
" plt.errorbar(data['time'], data['mean'], data['SEM'], linestyle='None', marker='o', color='black', capsize=5)\n",
" plt.xlabel('Time (min)')\n",
" plt.ylabel('Phosphorylation (a.u.)')\n",
"\n",
"plot_data(data)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "CdYeLxY1r7_W"
},
"source": [
"### Define, simulate and plot a first model\n",
"\n",
"The experimentalists have an idea of how they think the biological system they have studied works. We will now create a mathematical representation of this hypothesis. \n",
" \n",
"\n",
"## (spoilers) Self-check: Have I implemented the model correctly?

\n",
"\n",
"If you have done everything correctly, you should have a figure which looks like this for the first hypothesis with the given values: \n",
"\n",
"![H1-start-agreement](https://isbgroup.eu/edu/assets/Intro/Figures/scipy-H1-start-param-agreement.png)\n",
"

"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Improving the agreement manually (if there is time)\n",
"\n",
"If you have extra time, try to find a combination of parameter values that improves the agreement to data. \n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"param_manual_M1 = [1, 0.0001, 1000000, 1, 0.01] # = [k1, k2, kfeed, k4, k5]\n",
"\n",
"plot_simulation(M1, param_manual_M1, ic_M1, t_hr, state_to_plot=1) \n",
"plot_data(data)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Checkpoint and reflection\n",
"This marks the end of the first section of the exercise. Show your figure which shows how well the model agrees with the data. \n",
"\n",
"\n",
"**Questions to reflect over**

\n",
"\n",
"* Why do we simulate with a time-resolution higher than the resolution of the data points? \n",
"* When is the agreement good enough? \n",
"* What could be possible reasons why not all states of the model are measured experimentally? \n",
"* (optional) Was it easy finding a better agreement? What if it was a more complex model, or if there were more data series? \n",
"\n",
"

"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "pBGUm-b0Fd6y"
},
"source": [
"***\n",
"\n",
"## Evaluating and improving the agreement to data\n",
"Clearly, the agreement to data (for the given parameter set) is not perfect, but it is still better than the first overly simple model. However, a question now arises: *how good* does the model simulation agree with data? \n",
"\n",
"We typically evaluate the model agreement using a sum or squared residuals, weighted with the uncertainty of the data, often referred to as the *cost* of the model agreement. \n",
"\n",
"$$v\\left(\\theta\\right)=\\ \\sum_{\\forall t}\\left(\\frac{{y_t -\\ {\\hat{y}}_t\\left(\\theta\\right)}}{{SEM}_t}\\right)^2$$\n",
"\n",
"where $v(\\theta)$ is the cost given a set of parameters $\\theta$, $y_t$ and $SEM_t$ is experimental data (mean and SEM) at time $t$, and ${\\hat{y}}_t$ is model simulation at time $t$. Here, $y_t-\\ {\\hat{y}}_t\\left(\\theta\\right)$, is commonly referred to as *residuals*, and represent the difference between the simulation and the data in each timepoint.\n",
"\n",
"The cost function is defined in the code cell below. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "xF7vr97sIAj7"
},
"outputs": [],
"source": [
"def fcost(param, model, ic, data):\n",
" t = data[\"time\"]\n",
" \n",
" sim = odeint(model, ic, t, (param,))\n",
" ysim = sim[:,1]\n",
"\n",
" y = np.array(data[\"mean\"])\n",
" sem = np.array(data[\"SEM\"])\n",
" \n",
" cost = np.sum(np.square((y - ysim) / sem))\n",
" return cost"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "xYaOMuSYJWQL"
},
"source": [
"### Calculating the cost of the initial parameter set\n",
"\n",
"Now, let's calculate the cost for the model, given the initial parameter values, using the cost function defined above (`fcost`). Finally, let's print the cost to the output (by using `print` and an [f-string](https://realpython.com/python-f-strings/#f-strings-a-new-and-improved-way-to-format-strings-in-python))."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "8dibOfnyJThY"
},
"outputs": [],
"source": [
"param0_M1 = [1, 0.0001, 1000000, 1, 0.01] # = [k1, k2, kfeed, k4, k5]\n",
"ic_M1 = [1, 0, 1, 0] # = [R(0), Rp(0), S(0), Sp(0)]\n",
"\n",
"cost_M1 = fcost(param0_M1, M1, ic_M1, data)\n",
"print(f\"Cost of the M1 model: {cost_M1}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"## (spoilers) Self-check: What should the cost for the initial parameter set be if the model is implemented correctly?

\n",
"\n",
"The cost for the initial parameter set should be approximately 701\n",
"\n",
"

"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "8TpaMim_w7sO"
},
"source": [
"### Evaluating if the agreement is good enough\n",
"To get a quantifiable statistical measurement for if the agreement is good enough, or if the model/hypothesis should be rejected, we can use a $\\chi^2$-test. We compare if the cost is exceeding the threshold set by the $\\chi^2$-statistic. We typically do this using p = 0.05, and setting the degrees of freedom to the number of residuals summed (i.e. the number of data points). \n",
"\n",
"If the cost exceeds the $\\chi^2$-limit, we reject the model/hypothesis. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "9aDKCO4dx7TO"
},
"outputs": [],
"source": [
"dgf = len(data[\"mean\"])\n",
"chi2_limit = chi2.ppf(0.95, dgf) # Here, 0.95 corresponds to 0.05 significance level\n",
"print(f\"Chi2 limit: {chi2_limit}\")\n",
"print(f\"Cost > limit (rejected?): {cost_M1>chi2_limit}\")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "HzGoV7BhzY9j"
},
"source": [
"### Improving the agreement (manually)\n",
"As suspected, the agreement to the data was too bad, and the model thus had to be rejected. However, that is only true for the current values of the parameters. Perhaps this *structure* of the model can still explain the data good enough with other parameter values? To determine if the model structure (hypothesis) should be rejected, we must determine if the parameter set with the *best* agreement to data still results in a too bad agreement. If the best parameter set is too bad, no other parameter set can be good enough, and thus we must reject the model/hypothesis.\n",
"\n",
"Try to change some parameter values to see if you can get a better agreement. It is typically a good idea to change one parameter at a time to see the effect of changing one parameter. \n",
"If you did this in the previous section, you can reuse those parameter values. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "_HwMDTNL0KZU"
},
"outputs": [],
"source": [
"param_guess_M1 = [1, 0.0001, 1000000, 1, 0.01] # Change the values here and try to find a better agreement\n",
"# param_guess_M1 = param_manual # If you tried to find a better agreement visually above, you can copy the values here\n",
"ic_M1 = [1, 0, 1, 0] \n",
"t_hr = np.arange(0, 2, 0.01)\n",
"\n",
"cost = fcost(param_guess_M1, M1, ic_M1, data)\n",
"print(f\"Cost of the M1 model: {cost}\")\n",
"\n",
"dgf = len(data[\"mean\"])\n",
"chi2_limit = chi2.ppf(0.95, dgf)\n",
"print(f\"Chi2 limit: {chi2_limit}\")\n",
"print(f\"Cost > limit (rejected?): {cost>chi2_limit}\")\n",
"\n",
"plot_simulation(M1, param_guess_M1, ic_M1, t_hr, state_to_plot=1)\n",
"plot_data(data)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "HioIt0ET_FEN"
},
"source": [
"### Improving the agreement (using optimization methods)\n",
"\n",
"As you have probably noticed, estimating the values of the parameters can be a hard and time-consuming work. To our help, we can use optimization methods. There exists many toolboxes and algorithms that can be used for parameter estimation. In this short example, we will use an algorithm called [dual annealing from the SciPy toolbox](https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.dual_annealing.html). It is called in the following way: `res = dual_annealing(func, bounds, args, x0, callback)`\n",
"\n",
"\n",
"## Description of the dual annealing function arguments

\n",
"\n",
"* `func` - an objective function (we can use the cost function `fcost` defined earlier), \n",
"* `bounds` - bounds of the parameter values, \n",
"* `args` - arguments needed for the objective function, in addition to the parameter values to test (in our case model, initial conditions and data: `fcost(param, model, ic, data)`)\n",
"* `x0` - a start guess of the parameter values (e.g. `param_guess_M1`), \n",
"* `callback` - a \"callback\" function. The callback function is called at each iteration of the optimization, and can for example be used to print the current solution. \n",
"\n",
"

\n",
"\n",
"The dual annealing function returns a dictionary (`res`) with the keys `x` corresponding to the best found parameter values, and `fun` which corresponds to the cost for the values of `x`."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "Zt595-YXpoem"
},
"source": [
"#### Defining the callback function:\n",
"\n",
"This basic callback function prints the result to the output and also saves the current best solution to a temporary file."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "-dwcpwG6pndI"
},
"outputs": [],
"source": [
"def callback_M1(x,f,c):\n",
" global niter\n",
" if niter%1 == 0:\n",
" print(f\"Iter: {niter:4d}, obj:{f:3.6f}\", file=sys.stdout)\n",
" with open(\"./M1-temp.json\",'w') as file:\n",
" out = {\"f\": f, \"x\": list(x)}\n",
" json.dump(out,file)\n",
" niter+=1"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "ysfOv1Tbpx4B"
},
"source": [
"#### Running the optimization\n",
"Now it is time to run the optimization to find better parameter values. \n",
"\n",
"The optimization problem is not guaranteed to give the optimal solution, therefore you might need to restart the optimization a few times (perhaps with different starting guesses) to get a good solution. \n",
"\n",
"\n",
"\n",
"## Additional notes about optimization and saving/loading parameter values (click me)

\n",
"\n",
"##### Problematic simulations\n",
"\n",
"Note that you might sometimes get warnings about `odeint` issues. If this only rarely happens, it is a numerical issue and nothing to worry about, but if it is the *only* thing happens, and you get no progress, something is wrong. \n",
"\n",
"##### Saving/loading parameters\n",
"\n",
"After the optimization, the results is saved into a JSON file. Remember, if you run on Google Colab, the files will not be saved when you disconnect from the server. So, if you want to save a parameter set permanently, download the parameter set. \n",
"\n",
"If you later want to load the parameter set you can do that using the code snippet below. Remember to change `file_name.json` to the right file name. The loaded file will be a dictionary, where the parameter values can be accessed using `results['x']` \n",
"\n",
"```python\n",
"with open('file_name.json','r') as file:\n",
" res = json.load(f)\n",
"```\n",
"\n",
"##### Computational time for different scales of model sizes\n",
"\n",
"Optimizing model parameters in larger and more complex models, with additional sets of data, will take a longer time that in this exercise. In a \"real\" project, it likely takes hours if not days to find the best solution. \n",
"\n",
"

\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"bounds_M1 = np.array([(1.0e-6, 1.0e6)]*(len(param_guess_M1)))\n",
"args_M1 = (M1, ic_M1, data)\n",
"niter = 0\n",
"\n",
"res = dual_annealing(func=fcost, bounds=bounds_M1, args=args_M1, x0=param_guess_M1, callback=callback_M1) # This is the optimization\n",
"\n",
"print(f\"\\nOptimized parameter values: {res['x']}\")\n",
"print(f\"Optimized cost: {res['fun']}\")\n",
"print(f\"chi2-limit: {chi2_limit}\")\n",
"print(f\"Cost > limit (rejected?): {res['fun']>chi2_limit}\")\n",
"\n",
"if res['fun'] < chi2_limit: #Saves the parameter values, if the cost was below the limit. \n",
" file_name = f\"M1 ({res['fun']:.3f}).json\"\n",
" with open(file_name,'w') as file:\n",
" res['x'] = list(res['x']) #We save the file as a .json file, and for that we have to convert the parameter values that is currently stored as a ndarray into a traditional python list\n",
" json.dump(res, file) "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"## (spoilers) Self-check: How good solution should I have found?

\n",
"\n",
"You should have found a cost that is less than 18. If not, run the optimization a few more times. If you still do not find it, make sure that your model is implemented correctly.\n",
"

"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, let's plot the solution you found using the optimization"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"plot_simulation(M1, res['x'], ic_M1, t_hr, state_to_plot=1)\n",
"plot_data(data)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Checkpoint and reflection\n",
"Show the plot with the best agreement to data, and the cost for the same parameter set. Do you reject the model?\n",
"Briefly describe the cost function: What are the residuals? Why do we normalize with the squared uncertainty? \n",
"\n",
"**If running in Google Colab: remember to save the best solution you have found before exiting the session!**\n",
"\n",
"\n",
"**Questions to reflect over**

\n",
"\n",
"* Why do we want to find the best solution? \n",
"* Are we guaranteed to find the best solution? \n",
"* Do you reject the first hypothesis? \n",
"* If you did not reject the model, how can you test the model further? \n",
"\n",
"

"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "G37UdMYLrv70"
},
"source": [
"***\n",
"\n",
"## Using the model to make predictions\n",
"\n",
"If the optimization did not encounter big issues you should have been able to find a solution that passed the $\\chi^2$-test. However, it is still possible that we have not found the right model and/or parameter values. For example, since the data have uncertainty it is possible that we have overfitted to uncertain (incorrect) data. An overfitted model would in general be good at explaining the data used for parameter estimation, but might not be good at explaining new data (i.e. making predictions). \n",
"\n",
"For this reason, we do not only want to find the parameter set with the best agreement to data, but rather *all* parameter sets that agree with the estimation data. Since collecting all sets of parameter values is infeasible from a computational standpoint, we typically try to find the parameter sets that result in the *simulations* which gives the extreme (maximal/minimal) simulations in each time point. \n",
"\n",
"The best way of finding the extremes is to directly optimize the value by reformulating the optimization problem, or to use Markov-Chain Monte-Carlo (MCMC) methods. Here, we will not do this due to time-constraints, and will instead use a simple (inferior) approach that simply stores a bunch of parameter sets that result in a cost below the threshold of rejection. This gives us a quick approximation of the uncertainty, which is good enough for this exercise. In future projects, you should try to find the maximal uncertainty.\n",
"\n",
"To save all parameters, we need to create a function which both calculates the cost, and stores acceptable parameters. We reuse `fcost`, and introduce a new variable `all_params_M1` to save the acceptable parameter to. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "zbZevkTW5Jjf"
},
"outputs": [],
"source": [
"all_params_M1 = [] # Initiate an empty list to store all parameters that are below the chi2 limit\n",
"def fcost_uncertainty_M1(param, model, ic, data):\n",
"\n",
" cost = fcost(param, model, ic, data) # Calculate the cost using fcost\n",
"\n",
" # Save the parameter set if the cost is below the limit\n",
" if cost < chi2.ppf(0.95, dgf):\n",
" all_params_M1.append(param)\n",
" return cost"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "70Snzj-S6ZwC"
},
"source": [
"Then we rerun the optimization a couple of times to get some values saved. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "BEDWzu496mA0"
},
"outputs": [],
"source": [
"niter = 0\n",
"res = dual_annealing(func=fcost_uncertainty_M1, bounds=bounds_M1, args=args_M1, x0=param_guess_M1, callback=callback_M1) # This is the optimization\n",
"print(f\"Number of parameter sets collected: {len(all_params_M1)}\") # Prints how many parameter sets that were accepted"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Make sure that there are at least 250 saved parameter sets. \n",
"\n",
"You can save the found parameter sets by e.g. saving them to a CSV file. This file can then be loaded if you are returning later. \n",
"\n",
"The code cell below contains code for both saving and loading a set of acceptable parameters for reference. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Save the accepted parameters to a csv file\n",
"with open('all_params_M1.csv', 'w', newline='') as csvfile:\n",
" writer = csv.writer(csvfile, delimiter=',')\n",
" writer.writerows(all_params_M1)\n",
"\n",
"# load the accepted parameters from the csv file\n",
"with open('all_params_M1.csv', newline='') as csvfile:\n",
" reader = csv.reader(csvfile, delimiter=',', quoting=csv.QUOTE_NONNUMERIC)\n",
" all_params_M1 = list(reader)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "PiQsm2xm77_Z"
},
"source": [
"Now, we plot the simulation uncertainty. \n",
"\n",
"To save some time, we randomize the order of the saved parameter sets, and only plot the first 200. Obviously this results in an *underestimated* simulation uncertainty, but the uncertainty should still be sufficient for this exercise. In a real project, one would simulate *all* collected parameter sets (or sample using the reformulated optimization problem, or MCMC). "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "EZisNm1k7Hah"
},
"outputs": [],
"source": [
"random.shuffle(all_params_M1)\n",
"for param in all_params_M1[:200]:\n",
" plot_simulation(M1, param, ic_M1, t_hr, state_to_plot=1)\n",
"plot_data(data)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "rLmEkNrYqlrE"
},
"source": [
"### Testing the model further by making predictions\n",
"\n",
"Finally, now that we have estimated the model uncertainty, we can make predictions with uncertainty. A prediction can almost be anything, but in practice it is common to use a different input strength (`A` in the model), or to change the input after a set time. It is also possible to extend the simulated time beyond the measured time points. Finally, it is also possible to use the simulation in between already measured data points. \n",
"\n",
"In this exercise, we suggest that you change the input strength `A` to `2` at time point `t = 2`, and that you simulate 2 additional minutes (to `t = 4`). Firstly, this requires that we update the model to change the input. We have supplied a template with the updated input strength handling. You should copy your model of hypothesis 1 used above. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def M1_pred(state,t, param):\n",
" \n",
" if t >= 2: \n",
" A = 2\n",
" elif t >= 0:\n",
" A= 1\n",
" \n",
" # Update below with your M1 model. \n",
" \n",
" # Define the states\n",
" R = state[0]\n",
" Rp = state[1]\n",
" \n",
" # Define the parameter values \n",
" k1 = param[0]\n",
" k2 = param[1]\n",
"\n",
" # Define the reactions\n",
" r1 = R*A*k1\n",
" r2 = Rp*k2\n",
"\n",
" # Define the ODEs\n",
" ddt_R = -r1\n",
" ddt_Rp = -r2\n",
" ddt_S = 0\n",
" ddt_Sp = 0\n",
"\n",
" return[ddt_R, ddt_Rp, ddt_S, ddt_Sp]"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "kIhGFcSw-UMD"
},
"source": [
"Finally, we need to extend the simulated time vector from ending at `t = 2`, to ending at `t = 4`, and then simulate the prediction using the sampled parameter sets. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "7GGuA4dk-qeY"
},
"outputs": [],
"source": [
"t_hr_4 = np.arange(0, 4, 0.01)\n",
"\n",
"for param in all_params_M1[:200]:\n",
" plot_simulation(M1_pred, param, ic_M1, t_hr_4, state_to_plot=1)\n",
"plot_data(data)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The experimentalists have now measured your prediction (of setting `A = 2` at `t = 2`), and are presenting the following additional prediction data to you."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data_pred = {}\n",
"data_pred['time'] = [2.25, 2.5, 2.75, 3.0, 3.25, 3.5, 3.75, 4.0]\n",
"data_pred['mean'] = [0.130148, 0.070192, 0.059823, 0.038678, 0.034251, 0.032175, 0.029674, 0.026648]\n",
"data_pred['SEM'] = [0.022427, 0.012864, 0.009188, 0.008876, 0.008798, 0.008431, 0.008488, 0.008232]\n",
"\n",
"plot_data(data_pred)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, plot your prediction, the original data and the new data together. Did your model predict the new data correctly? "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"t_hr_4 = np.arange(0, 4, 0.01)\n",
"\n",
"for param in all_params_M1[:200]:\n",
" plot_simulation(M1_pred, param, ic_M1, t_hr_4, state_to_plot=1)\n",
"plot_data(data)\n",
"plot_data(data_pred)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Testing an alternative hypothesis\n",
"\n",
"As you should see, the first hypothesis did not predict the new data well. Therefore, you have gone back to the biologists with whom you come up with an alternative second hypothesis: that the activator A is not constant and will instead be removed over time. Implement this alternative hypothesis (given below), check if it can explain the original data, and if it can, collect the uncertainty and check if it can predict the new data correctly.\n",
"\n",
"\n",
"## The second hypothesis (click me!)

\n",
"\n",
"![Interaction graph 2](https://isbgroup.eu/edu/assets/Intro/Figures/sund_model_2.png)\n",
"\n",
"This hypothesis is similar to the first one, but with some modifications: \n",
"1. The feedback-induced dephosphorylation of **Rp** (reaction 3 in the first hypothesis) is removed. \n",
"2. A new feedback-dependent breakdown of the activator **A** is added. This reaction is dependent on the states **Sp** and **A** with a parameter **kfeed** (same name, but a different mode of action). \n",
"\n",
"These differences yield the following interaction-graph:\n",
"\n",
"![Interaction graph 2](https://isbgroup.eu/edu/assets/Intro/Figures/sund_model_2_interactionGraph.png)\n",
"\n",
"For the alternative hypothesis, the experimentalist have also guessed the values of both the initial conditions and the parameter values.\n",
"\n",
"|Initial values|Parameter values|\n",
"|:------------|:--------------|\n",
"|R(0) = 1 |k1 = 5 |\n",
"|Rp(0) = 0 |k2 = 20 |\n",
"|S(0) = 1 |kfeed = 10 |\n",
"|Sp(0) = 0 |k4 = 15 |\n",
"|A(0) = 1 |k5 = 30 |\n",
"\n",
" \n",
"\n",
"

\n",
"\n",
"Note that `A` is now a state and **not** a constant input. \n",
"\n",
"Start by implementing the new model."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Implement a model of the second hypothesis here\n",
"def M2(state,t, param):\n",
" \n",
" # Update below with your M2 model. \n",
"\n",
" # Define the states\n",
" R = state[0]\n",
" Rp = state[1]\n",
"\n",
" A = state[4] # A should be the last state in the model for the rest of the code to work properly\n",
"\n",
" # Define the parameter values \n",
" k1 = param[0]\n",
" k2 = param[1]\n",
"\n",
" # Define the reactions\n",
" r1 = R*A*k1\n",
" r2 = Rp*k2\n",
"\n",
" # Define the ODEs\n",
" ddt_R = -r1\n",
" ddt_Rp = -r2\n",
" ddt_S = 0\n",
" ddt_Sp = 0\n",
" ddt_A = 0\n",
"\n",
" return[ddt_R, ddt_Rp, ddt_S, ddt_Sp, ddt_A]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then test the supplied parameters, and plot the agreement to data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Test how well the model fits the original data given the good parameter set (is the model rejected?)\n",
"## Calculate the cost of the new model, test with a chi2 test if the model is rejected or not\n",
"## Plot the agreement to data\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"## (spoilers) Self-check: How can I know if M2 have been implemented correctly?

\n",
"\n",
"The cost for the given parameter set should be 905. If it isn't, you might have incorrect equations, or have placed the states or parameters in the incorrect order. It is also possible that you are simulating with parameter values from the wrong model. \n",
"\n",
"

\n",
"\n",
"Implement the callback, and create a new function for saving the parameter values"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Implement a new callback, remember to switch the names from M1 to M2\n",
"\n",
"# Setup for estimating the model uncertainty\n",
"## Initialize an empty list to store all accepted parameter sets\n",
"## Define a new cost function that saves all accepted parameter sets in the list\n",
"## Setup bounds and arguments for the optimization\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Again, collect the parameter sets by optimizing the parameter values a couple of times"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Run the optimization to gather the uncertainty, remember to run for M2 and to update all input arguments to the optimizer. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because `A` is now a state, we cannot set the values based on the time as done in `M1_pred`. Instead, we need to perform two consecutive simulations, where A is set at the start of each simulation. We do this using the `plot_prediction_M2` function defined below. \n",
"\n",
"Note that the time vector should now be split into two parts, `t1` and `t2`, ranging from 0 to 2 and from 2 to 4 respectively. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def plot_prediction_M2(model, param, ic, t1, t2, state_to_plot=1, line_color = 'b'):\n",
" \n",
" sim1 = odeint(model, ic, t1, (param,)) # Simulate the model for the initial parameters in the model file\n",
" \n",
" ic2 = sim1[-1,:] # The initial conditions for the second simulation is the last state of the first simulation\n",
" ic2[4] = 2 # The value of A is set to 2 for the second simulation\n",
" \n",
" sim2 = odeint(model, ic2, t2, (param,)) # Simulate the model for the second stimulation\n",
" \n",
" t = np.concatenate((t1,t2)) # Combine the time points from the two simulations\n",
" sim = np.concatenate((sim1[:,state_to_plot], sim2[:,state_to_plot])) # Combine the two simulations\n",
"\n",
" plt.plot(t, sim, color=line_color) # The data corresponds to Rp which is the second state"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Sample the parameters and plot the prediction with the original data and the new validation data. Does this model agree with the prediction dataset? "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sample the parameter sets, plot the simulations with uncertainty, and plot the prediction data\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Present the results of your predictions for the experimentalists\n",
"Before going to the final meeting with the experimentalists, you should prepare a figure which contain both of the model predictions and both of the data sets in one single figure, so that you can explain clearly which hypotheses should be rejected and which should be investigated further."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sample the parameter sets, plot the simulations with uncertainty for both models, as well as both of the data sets\n",
"\n",
"## Plot model 1\n",
"## Plot model 2 (remember to select a different color by changing the line_color argument)\n",
"## Plot estimation data\n",
"## Plot prediction data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Checkpoint and reflection\n",
"\n",
"Present the figure with the prediction with uncertainty from both of the models, and how well they agree with the data. Explain if you should reject any of the models based on this figure. \n",
"\n",
"\n",
"**Questions to reflect over**

\n",
"\n",
"* Did you reject any of the models? \n",
"* What are the strengths of finding core predictions that are different between models potentially describing the same system?\n",
"* What is the downside to choosing points where there is an overlap between the models to test experimentally, even if the predictions are certain (\"narrow\")? \n",
"* What would be the outcome if it was impossible (experimentally) to measure between 2 and 3 minutes? \n",
"* Is our simulation uncertainty reasonable? Is it over- or underestimated? \n",
"* What could be problems related to overestimating the uncertainty? To underestimating the uncertainty?\n",
"* What should you do when your model can correctly predict new data? \n",
"\n",
"

"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "xAd4xli--3jH"
},
"source": [
"## Summary and final discussion\n",
"This concludes this introductory exercise to systems biology. In the exercise, you should have learned the basics of implementing and simulating models, testing the models using a $\\chi^2$-test, collecting uncertainty and testing the model by predicting independent validation data. \n",
"\n",
"### Checkpoint and reflection\n",
"\n",
"Discuss the questions below with a supervisor. Think if you have any additional questions regarding the lab or systems biology and feel free to discuss this with a supervisor\n",
"\n",
"\n",
"**Some summarizing questions to discuss**

\n",
"\n",
"1. Look back at the lab, can you give a quick recap of what you did?\n",
"2. Why did we square the residuals? And why did we normalize with the SEM?\n",
"3. Why is it important that we find the \"best\" parameter sets when testing the hypothesis? Do we always have to find the *best*?\n",
"4. Can we expect to always find the best solution for our type of problems?\n",
"5. What are the problems with underestimating the model uncertainty? And with overestimating?\n",
"6. If the uncertainty is very big, how could you go about to reduce it?\n",
"7. Why do we want to find core predictions when comparing two (or more) models? What are the possible outcomes if we did the experiment?\n",
"\n",
"

"
]
}
],
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.13"
}
},
"nbformat": 4,
"nbformat_minor": 0
}