Minerva: An Open-Source Library for Bioprocesses

_images/logo.png

Minerva is a relational database with (i) bio-kinetic models; (ii) process and modeling information; (iii) and an expert system.

Key Features & Capabilities

  • Fast and flexible. Minerva have basic building blocks to design and simulate a bioprocess.

  • Understandable data. The data is collected in an open standard file format, called JSON. It uses human-readable text to store and transmit data.

  • Standarized construction of models. The evaluation of thousands of biorefinery designs is streamlined through smart and efficient management of biorefinery parameters to evaluate sets of design decisions and scenarios.

  • Accesible for all and to all. Several first principle models are inside this database. Some of the most importants are: the aerobic growth of Saccharomyces cerevisia or Escherichia coli or the anaerobic growth of Mannheimia succiniciproducens.

Why we do this…

_images/flowering_data.png

How we do this…

_images/Ontology_v1.png

Model library

Elements

Exciting documentation in here. Let’s make a list (empty surrounding lines required):

  • Why

  • Parameters

  • Process conditions

    • Configuration

    • Mode of operation

    • Oxygen conditions

    • Temperature

    • pH

    • Product

    • Limiting substrate

    • Processes involved

    • Type of microorganisms

    • Filamentous growth

  • Type of model

  • Literature reference


Example

Parameter library

Elements

  • Parameter name

  • Standard nomenclature

  • SI Units

  • Type
    • Biotic

    • Abiotic


Example

  • Mass transfer coefficient for oxygen

  • kla

  • 1/h

  • Type
    • Abiotic

Problem library

Elements

  • Description

  • Variable affected

  • Effect

  • Probability

  • Solution


Example

  • Cooling system is not working

  • “CoolingOn”

  • {“CoolingOn”:0}

  • 10%

  • Turn on the cooling water

Problem library

Elements

  • Description

  • Variable affected

  • Effect

  • Probability

  • Solution


Example

  • Cooling system is not working

  • “CoolingOn”

  • {“CoolingOn”:0}

  • 10%

  • Turn on the cooling water

Good Modeling Practices - Matrix Notation

Matrix notation uses the linear relations among net conversion rates in order to describe the conservation relations inside a system and it is is a well established system to mathematically describe complex models. The description of a system through this methodology is done by defining its number of components n, and its number of processes m. This is the foundation for the stoichiometric matrix, S, the process rate vector, ρ, and finally, the component conversion rate vector r.


_images/equation.png
_images/stoichiometric_table.png

The overall component conversion rate vector (r) is then coupled to a general mass balance equation.

Template example

from scipy.integrate import odeint 
import math 
import numpy as np

class name_of_your_model:
    def __init__(self):
        #Fill up with your parameters
        #Example: self.Y_XS = 0.8 #g/g

        #Initialization of state variable
        #Example: self.S0 = 18

    def rxn(self, C, t):
        #number of components (n = )
        #number of processes (m = )

        #initialize the stoichiometric matrix (e.g. s)
        #s = np.zeros((m,n))
        
        #We define each value of the matrix
        #Example: self.s[0,0]=(-1/self.Y_XS)

        #initialize the process vector (e.g. rho)
        #rho = np.zeros((m,1))

        #We define each value of the process rate vector
        ##Example: rho[0,0] = self.mu_max*(C[0]/(C[0]+self.Ks))*C[2]

        #the overall conversion rate is stoichiometric *rates
        ##Example: self.r[0,0]= self.s[0,0]*self.rho[0,0]+self.s[1,0]*self.rho[1,0]+(s[2,0]*rho[2,0])

        #Then, it is solved the mass balances, in this case discontinuous:
        ##Example: dSdt = self.r[0,0]
        return [dSdt,...]
              
    def solve(self):        
        #Create a vector that shows the initial time to solve the model
        #t = np.linspace(t0, tf)
        
        #Vector with the initial conditions of the state variables
        #C0 = [self.S0,.....]

        #The ODE solver
        #C = odeint(self.rxn, C0, t, rtol = 1e-7, mxstep= 500000)
        return t, C

#How to call the model so it is printed. 
t, C = Model().solve()

Example

A (bio)kinetic first-principle model. In this case, for the representation of the aerobic growth of textit{Corynebacterium glutamicum} under product inhibition.

# -*- coding: utf-8 -*-
"""
Created on Wed Jan 23 10:42:13 2019

@author: simoca
"""

from scipy.integrate import odeint 
#Package for plotting 
import math 
#Package for the use of vectors and matrix 
import numpy as np
import pandas as pd
import array as arr
from matplotlib.backends.backend_qt5agg import FigureCanvasQTAgg as FigureCanvas
from matplotlib.figure import Figure
import sys
import os
import matplotlib.pyplot as plt
from matplotlib.ticker import FormatStrFormatter
import glob
from random import sample
import random
import time
import plotly
import plotly.graph_objs as go
import json
from plotly.subplots import make_subplots



class CGlutamicum_aerobic:
    def __init__(self, Control = False):
        self.mu_max= 0.21 #h^-1
        self.Kg=0.8 #g/L
        self.nu=1
        self.Yxs=0.149#g/g
        self.Ypx=3.278 #g/g
        self.Yps=0.48 #g/g
        self.Pcritical=11.529#g/L
        self.kd=0.1


        #Initial concentrations
        self.X0= 0.164 #g/L
        self.P0=0.2#g/L
        self.S0=49.87 #g glucose/L
        self.V0= 2 #L
        
        self.t_end=30
        self.t_start=0
        
        #parameters for control, default every 1/24 hours:
        self.Control = Control
        self.coolingOn = True
        self.Contamination=False
        self.steps = (self.t_end - self.t_start)*24
        self.T0 = 30
        self.K_p = 2.31e+01
        self.K_i = 3.03e-01
        self.K_d = -3.58e-03
        self.Tset = 30
        self.u_max = 150
        self.u_min = 0

    def rxn(self, C,t , u, fc):
        if self.Control == True :
            #Cardinal temperature model with inflection: Salvado et al 2011 "Temperature Adaptation Markedly Determines Evolution within the Genus Saccharomyces"
            #Strain S. cerevisiae PE35 M
            Topt = 30
            Tmax = 45.48
            Tmin = 5.04 
            T = C[5]
            if T < Tmin or T > Tmax:
                 k = 0
            else:
                 D = (T-Tmax)*(T-Tmin)**2
                 E = (Topt-Tmin)*((Topt-Tmin)*(T-Topt)-(Topt-Tmax)*(Topt+Tmin-2*T))
                 k = D/E 
                      
        #number of components
        n=3
        #number of processes
        m=1
        #stoichiometric vector
        s = np.zeros((m, n))
        s[0,0]=-1/self.Yxs
        s[0,1]=self.Ypx
        s[0,2]=1
        ##initialize the overall conversion vector
        r=np.zeros((1,1))
        
        r[0,0]=((self.mu_max*C[0])/(C[0]+self.Kg))*((1-(C[1]/self.Pcritical))**self.nu)

        #Solving the mass balances
        dSdt = s[0,0]*r[0,0]*fc*C[2]
        dPAdt = s[0,1]*r[0,0]*C[2]
        dXdt = s[0,2]*r[0,0]*fc*C[2]
        dVdt = 0
        
        if self.Control == True :
            '''
             dHrxn heat produced by cells estimated by yeast heat combustion coeficcient dhc0 = -21.2 kJ/g
             dHrxn = dGdt*V*dhc0(G)-dEdt*V*dhc0(E)-dXdt*V*dhc0(X)
             (when cooling is working)  Q = - dHrxn -W ,
             dT = V[L] * 1000 g/L / 4.1868 [J/gK]*dE [kJ]*1000 J/KJ
             dhc0(EtOH) = -1366.8 kJ/gmol/46 g/gmol [KJ/g]
             dhc0(Glc) = -2805 kJ/gmol/180g/gmol [KJ/g]
             
            ''' 
            #Metabolic heat: [W]=[J/s], dhc0 from book "Bioprocess Engineering Principles" (Pauline M. Doran) : Appendix Table C.8 
            dHrxndt =   dXdt*C[0]*(-21200) #[J/s]  + dGdt*C[4]*(15580)- dEdt*C[4]*(29710) 
            #Shaft work 1 W/L1
            W = 1*C[0] #[J/S] negative because exothermic  
            #Cooling just an initial value (constant cooling to see what happens)
            #dQdt = -0.03*C[4]*(-21200) #[J/S]   
            #velocity of cooling water: u [m3/h] -->controlled by PID       
            
            #Mass flow cooling water
            M=u/3600*1000 #[kg/s]
            #Define Tin = 5 C, Tout=TReactor
            #heat capacity water = 4190 J/kgK
            Tin = 5
            #Estimate water at outlet same as Temp in reactor
            Tout = C[7]
            cpc = 4190
            #Calculate Q from Eq 9.47
            Q=-M*cpc*(Tout-Tin) # J/s    
            #Calculate Temperature change
            dTdt = -1*(dHrxndt - Q + W)/(C[4]*1000*4.1868) #[K/s]
        else: 
            dTdt = 0
        return [dXdt,dPAdt,dSdt,dVdt, dTdt]
                
    def solve(self):
        #solve normal:
        t = np.linspace(self.t_start, self.t_end, self.steps)
        if self.Control == False :
            u = 0
            fc= 1
            C0 = [self.X0, self.P0, self.S0, self.V0, self.T0]
            C = odeint(self.rxn, C0, t, rtol = 1e-7, mxstep= 500000, args=(u,fc,))
    
        #solve for Control
        else:
            fc=0
            """
            PID Temperature Control:
            """
            # storage for recording values
            C = np.ones([len(t), 5])
            C0 = [self.X0, self.P0, self.S0, self.V0, self.T0]
            self.ctrl_output = np.zeros(len(t)) # controller output
            e = np.zeros(len(t)) # error
            ie = np.zeros(len(t)) # integral of the error
            dpv = np.zeros(len(t)) # derivative of the pv
            P = np.zeros(len(t)) # proportional
            I = np.zeros(len(t)) # integral
            D = np.zeros(len(t)) # derivative
            
            for i in range(len(t)-1):
                #print(t[i])
                #PID control of cooling water
                dt = t[i+1]-t[i]
                #Error
                e[i] = C[i,5] - self.Tset  
                #print(e[i])
                if i >= 1:
                    dpv[i] = (C[i,5]-C[i-1,5])/dt
                    ie[i] = ie[i-1] + e[i]*dt
                P[i]=self.K_p*e[i]
                I[i]=self.K_i*ie[i]
                D[i]=self.K_d*dpv[i]
                
                self.ctrl_output[i]=P[i]+I[i]+D[i]
                u=self.ctrl_output[i]
                if u>self.u_max:
                    u=self.u_max
                    ie[i] = ie[i] - e[i]*dt # anti-reset windup
                if u < self.u_min:
                    u =self.u_min
                    ie[i] = ie[i] - e[i]*dt # anti-reset windup
                #time for solving ODE    
                ts = [t[i],t[i+1]]
                #disturbance
                #if self.t[i] > 5 and self.t[i] < 10:
                #   u = 0                
                #solve ODE from last timepoint to new timepoint with old values              
    
                y =  odeint(self.rxn, C0, ts, rtol = 1e-7, mxstep= 500000, args=(u,fc,))
                #update C0
                C0 = y[-1]
                #merge y to C
                C[i+1]=y[-1]
        return t, C
t, C = CGlutamicum_aerobic().solve()
fig, ax = plt.subplots()
ax.plot(t, C[:,0], label = "Biomass")
ax.plot(t, C[:,1], label = "L-glutamic acid")
ax.plot(t, C[:,2], label = "Glucose")
ax.plot(t, C[:,3], label = "Volume")
ax.plot(t, C[:,4], label = "Temperature")
legend = ax.legend(loc='upper right', shadow=False, fontsize='medium')
plt.xlabel('time (h)')
plt.ylabel('Concentration (g/L)')
plt.show()

MongoDB - Non-relational databases

Definition

Non-relational databases have been established as the favorite option for the Internet of Things paradigm based on their flexibility over their data structure and a faster data retrieve when compared with relational databases. Therefore, non-relational database presents qualities more compatibles with the expected needs of the smart bio-manufacturing, and it is then the data storage strategy selected for our database.

MongoDB is currently the most popular example of non-relational database program. Some examples of software using MongoDB are SEGA, BARCLAYS, or the tax platform for the UK government, etc. MongoDB organizes its data in documents that subsequently are organized in collections, usually with common fields between documents. Those documents characterize by be available to store their data in a JSON data format.

How to…

It is a simple script using Flask in order to create a REST API with a mongo database to get and post data.


from flask import Flask, render_template, request, Response
from flask_pymongo import PyMongo
import json
from werkzeug.security import generate_password_hash, check_password_hash #Cifrar y descifrar contraseñas de usuario
from bson import json_util
from bson.objectid import ObjectId

app= Flask(__name__)
app.config['MONGO_URI'] = 'mongodb://localhost:27017/minerva'

mongo = PyMongo(app)

@app.route('/models', methods=['POST'])
def create_data():
    model_name = request.json['model_name']
    model = request.json['model']

    if model_name and model:
        id = mongo.db.models.insert(
            {'model_name': model_name, 'model': model}
        )
        response = {
            'id' : str(id),
            'model_name' : model_name,
            'model' : model
        }
        return response
    else:
        return message

@app.route('/models', methods=['GET'])
def get_models():
    models = mongo.db.models.find()
    response = json_util.dumps(models)
    return Response(response, mimetype="application/json")

@app.route('/models/<id>', methods = ['GET'])
def get_model(id):
    model = mongo.db.models.find_one({'_id': ObjectId(id)})
    response = json_util.dumps(model)
    return Response(response, mimetype="application/json")


@app.errorhandler(404)
def not_found(error=None):
    message = {
        'message': 'Resource Not Found ' + request.url,
        'status': 404
    }
    return message



if __name__ == "__main__":
    app.run(debug=True)

JSON - Data format

Definition

JSON is a data format with an easily understandable architecture, which cares about their readability and allows basic examination for non-programmers through a reasonable structure that is self-evident. Therefore, it is a perfect data format to store data in a flexible, reusable, and expandable database. Furthermore, it is aligned with the FAIR principles.

FAIR principles work towards the improvement of the findability, accessibility, interoperability, and reuse of digital assets at a global level.

How to…

Example of a json script and how it stores data. This example corresponds to a json script to represent a bioprocess model. The model can be found on https://github.com/simonetacannodelas/BioVL-Library/blob/master/Rx_Fermentation_Scerevisiae_Glucose_Aerobic_Batch.py

model_1=[ {
        "Package": "Rx_Fermentation_Scerevisiae_Glucose_Aerobic_Batch",
        "Process conditions": [
                {"Configuration": "Stirrer tank",
                 "Mode of operation": "Batch",
                 "Oxygen conditions": "Aerobic",
                 "Temperature (ºC)": 30,
                 "pH": 4,
                 "Product":["None","Extracellular"],
                 "Limiting substrate": "Glucose",
                 "Processes involved":
                     ["Oxidation of glucose",
                      "Reduction of glucose",
                      "Oxidation of ethanol",
                      "Oxygen supply"],
                 "Type of microorganism": "Yeast",
                 "Filamentous growth": "No",
                }
                    ],
        "Type of model": "Monod-based model with metabolic switch controlled by the Cabtree Effect/Overflow",
        "Literature reference":"Sonnleitner, B., & Käppeli, O. (1986). Growth of Saccharomyces cerevisiae is controlled by its limited respiratory capacity: formulation and verification of a hypothesis. Biotechnology and bioengineering, 28(6), 927-937.",
    }
]

Python

Brief introduction to the theory



Example

# -*- coding: utf-8 -*-
"""
Created on Wed Jan 23 10:42:13 2019

@author: simoca
"""

from scipy.integrate import odeint 
#Package for plotting 
import math 
#Package for the use of vectors and matrix 
import numpy as np
import pandas as pd
import array as arr
from matplotlib.backends.backend_qt5agg import FigureCanvasQTAgg as FigureCanvas
from matplotlib.figure import Figure
import sys
import os
import matplotlib.pyplot as plt
from matplotlib.ticker import FormatStrFormatter
import glob
from random import sample
import random
import time
import plotly
import plotly.graph_objs as go
import json
from plotly.subplots import make_subplots



class CGlutamicum_aerobic:
    def __init__(self, Control = False):
        self.mu_max= 0.21 #h^-1
        self.Kg=0.8 #g/L
        self.nu=1
        self.Yxs=0.149#g/g
        self.Ypx=3.278 #g/g
        self.Yps=0.48 #g/g
        self.Pcritical=11.529#g/L
        self.kd=0.1


        #Initial concentrations
        self.X0= 0.164 #g/L
        self.P0=0.2#g/L
        self.S0=49.87 #g glucose/L
        self.V0= 2 #L
        
        self.t_end=30
        self.t_start=0
        
        #parameters for control, default every 1/24 hours:
        self.Control = Control
        self.coolingOn = True
        self.Contamination=False
        self.steps = (self.t_end - self.t_start)*24
        self.T0 = 30
        self.K_p = 2.31e+01
        self.K_i = 3.03e-01
        self.K_d = -3.58e-03
        self.Tset = 30
        self.u_max = 150
        self.u_min = 0

    def rxn(self, C,t , u, fc):
        if self.Control == True :
            #Cardinal temperature model with inflection: Salvado et al 2011 "Temperature Adaptation Markedly Determines Evolution within the Genus Saccharomyces"
            #Strain S. cerevisiae PE35 M
            Topt = 30
            Tmax = 45.48
            Tmin = 5.04 
            T = C[5]
            if T < Tmin or T > Tmax:
                 k = 0
            else:
                 D = (T-Tmax)*(T-Tmin)**2
                 E = (Topt-Tmin)*((Topt-Tmin)*(T-Topt)-(Topt-Tmax)*(Topt+Tmin-2*T))
                 k = D/E 
                      
        #number of components
        n=3
        #number of processes
        m=1
        #stoichiometric vector
        s = np.zeros((m, n))
        s[0,0]=-1/self.Yxs
        s[0,1]=self.Ypx
        s[0,2]=1
        ##initialize the overall conversion vector
        r=np.zeros((1,1))
        
        r[0,0]=((self.mu_max*C[0])/(C[0]+self.Kg))*((1-(C[1]/self.Pcritical))**self.nu)

        #Solving the mass balances
        dSdt = s[0,0]*r[0,0]*fc*C[2]
        dPAdt = s[0,1]*r[0,0]*C[2]
        dXdt = s[0,2]*r[0,0]*fc*C[2]
        dVdt = 0
        
        if self.Control == True :
            '''
             dHrxn heat produced by cells estimated by yeast heat combustion coeficcient dhc0 = -21.2 kJ/g
             dHrxn = dGdt*V*dhc0(G)-dEdt*V*dhc0(E)-dXdt*V*dhc0(X)
             (when cooling is working)  Q = - dHrxn -W ,
             dT = V[L] * 1000 g/L / 4.1868 [J/gK]*dE [kJ]*1000 J/KJ
             dhc0(EtOH) = -1366.8 kJ/gmol/46 g/gmol [KJ/g]
             dhc0(Glc) = -2805 kJ/gmol/180g/gmol [KJ/g]
             
            ''' 
            #Metabolic heat: [W]=[J/s], dhc0 from book "Bioprocess Engineering Principles" (Pauline M. Doran) : Appendix Table C.8 
            dHrxndt =   dXdt*C[0]*(-21200) #[J/s]  + dGdt*C[4]*(15580)- dEdt*C[4]*(29710) 
            #Shaft work 1 W/L1
            W = 1*C[0] #[J/S] negative because exothermic  
            #Cooling just an initial value (constant cooling to see what happens)
            #dQdt = -0.03*C[4]*(-21200) #[J/S]   
            #velocity of cooling water: u [m3/h] -->controlled by PID       
            
            #Mass flow cooling water
            M=u/3600*1000 #[kg/s]
            #Define Tin = 5 C, Tout=TReactor
            #heat capacity water = 4190 J/kgK
            Tin = 5
            #Estimate water at outlet same as Temp in reactor
            Tout = C[7]
            cpc = 4190
            #Calculate Q from Eq 9.47
            Q=-M*cpc*(Tout-Tin) # J/s    
            #Calculate Temperature change
            dTdt = -1*(dHrxndt - Q + W)/(C[4]*1000*4.1868) #[K/s]
        else: 
            dTdt = 0
        return [dXdt,dPAdt,dSdt,dVdt, dTdt]
                
    def solve(self):
        #solve normal:
        t = np.linspace(self.t_start, self.t_end, self.steps)
        if self.Control == False :
            u = 0
            fc= 1
            C0 = [self.X0, self.P0, self.S0, self.V0, self.T0]
            C = odeint(self.rxn, C0, t, rtol = 1e-7, mxstep= 500000, args=(u,fc,))
    
        #solve for Control
        else:
            fc=0
            """
            PID Temperature Control:
            """
            # storage for recording values
            C = np.ones([len(t), 5])
            C0 = [self.X0, self.P0, self.S0, self.V0, self.T0]
            self.ctrl_output = np.zeros(len(t)) # controller output
            e = np.zeros(len(t)) # error
            ie = np.zeros(len(t)) # integral of the error
            dpv = np.zeros(len(t)) # derivative of the pv
            P = np.zeros(len(t)) # proportional
            I = np.zeros(len(t)) # integral
            D = np.zeros(len(t)) # derivative
            
            for i in range(len(t)-1):
                #print(t[i])
                #PID control of cooling water
                dt = t[i+1]-t[i]
                #Error
                e[i] = C[i,5] - self.Tset  
                #print(e[i])
                if i >= 1:
                    dpv[i] = (C[i,5]-C[i-1,5])/dt
                    ie[i] = ie[i-1] + e[i]*dt
                P[i]=self.K_p*e[i]
                I[i]=self.K_i*ie[i]
                D[i]=self.K_d*dpv[i]
                
                self.ctrl_output[i]=P[i]+I[i]+D[i]
                u=self.ctrl_output[i]
                if u>self.u_max:
                    u=self.u_max
                    ie[i] = ie[i] - e[i]*dt # anti-reset windup
                if u < self.u_min:
                    u =self.u_min
                    ie[i] = ie[i] - e[i]*dt # anti-reset windup
                #time for solving ODE    
                ts = [t[i],t[i+1]]
                #disturbance
                #if self.t[i] > 5 and self.t[i] < 10:
                #   u = 0                
                #solve ODE from last timepoint to new timepoint with old values              
    
                y =  odeint(self.rxn, C0, ts, rtol = 1e-7, mxstep= 500000, args=(u,fc,))
                #update C0
                C0 = y[-1]
                #merge y to C
                C[i+1]=y[-1]
        return t, C
t, C = CGlutamicum_aerobic().solve()
fig, ax = plt.subplots()
ax.plot(t, C[:,0], label = "Biomass")
ax.plot(t, C[:,1], label = "L-glutamic acid")
ax.plot(t, C[:,2], label = "Glucose")
ax.plot(t, C[:,3], label = "Volume")
ax.plot(t, C[:,4], label = "Temperature")
legend = ax.legend(loc='upper right', shadow=False, fontsize='medium')
plt.xlabel('time (h)')
plt.ylabel('Concentration (g/L)')
plt.show()

A step further

If you want to improve your pure Python programming abilities in a relaxing and challenging way, I recommend you Python Challenge. This webpage has a series of an increasingly difficult challenges to solve thinking as a coder on Python. GOOD LUCK!

Software integration

Se mueren de envidia


_images/Architecture.png

Artificial Intelligence

Matrix notation uses the linear relations among net conversion rates in order to describe the conservation relations inside a system and it is is a well established system to mathematically describe complex models. The description of a system through this methodology is done by defining its number of components n, and its number of processes m. This is the foundation for the stoichiometric matrix, S, the process rate vector, ρ, and finally, the component conversion rate vector r.


_images/equation.png
_images/stoichiometric_table.png

The overall component conversion rate vector (r) is then coupled to a general mass balance equation.

Indices and tables