ResAnalysis

class ResAnalysis(prodnet, solution_input, prod_to_remove_id)

This class provides functions that streamline the analysis of one or more solutions (all designs obtained for a given set of parameters) and designs (a specific set of deletions within a solution from the mop).

ResAnalysis(prodnet, solution_input, prod_to_remove_id)

Parameters:
  • solution_input (cell/string or mop_solution structure) – Indicates the solutions to be analyzed. Contains solution id(s) in the current prodnet problem path OR a mop_solution structure.
  • prod_to_remove_id (cell) – The ids of products to be omitted from the analysis.

consistent_solutions = None

(structure of mop solutions) All mop_solutions have the same number of networks. If some networks are missing, the design variables and objectives are padded with zeros.

default_figure_info = None

(structure) Note: currently unused. Fields: .colors (vector of 5-10 colors with strong contrast), .line_type.

default_plot_lines = None

(structure) Plot formatting. Note: currently unused.

is_wt_points_calc = None

(logical) Bookkeeping.

n_solutions = None

(integer) Number of loaded solutions.

prodnet = None

(Prodnet class) Note this is a copy of prodnet, not a reference.

solution_ids = None

(cell of strings) ids of loaded solutions, order matches the solutions structure.

solutions = None

(structure of mop solutions) Solutions for analysis.

wt_all_growth_rates = None

(matrix) Growth rate points for all wild type production network production envelopes.

wt_all_product_yields = None

(matrix) Product yield points for all wild type production network production envelopes.

calc_flux_table(obj, design_ind, varargin)

Creates a table of flux distributions for the wild type and all mutant production networks. Two states are possible: growth and non-growth. Two flux estimation methods are also possible, pFBA and FVA. In both cases the fluxes of product and biomass are fixed to the optimal state.

Parameters:
  • design_ind (integer) – Index of the specific design to be applied to construct mutant.
  • plot_top_n (integer, optional) – Number of top fluxes to be plotted in the heatmap figure.
  • ng_state (logical, optional) – If false (default), calculates growth state fluxes (the maximum growth rate is imposed as a constraint). If true, calculates non-growth state fluxes (the growth rate is constrained to 0).
  • sol_ind (integer, optional) – Index of the solution to which the design_ind belongs. Defaults to 1.
  • add_FVA (logical, optional) – If true, adds FVA results to the pFBA table. Default is false.
  • calc_cofactor_turnover (logical, optional) – If true, creates a new table with cofactor turnovers from pFBA flux distributions. Default is false.
  • min_obj (double, optional) – Minimum objective value required to include production networks in the final table. Default is 0.
  • write_output (logical, optional) – If true, writes an xls table (two if the cofactor turnover option is enabled) to the problem output folder. Default is true.
  • only_mutant_flux (logical, optional) – If true, the table only contains reactions for mutant flux (no wild type and no difference information). Default is false.
  • sort_by_diff (logical, optional) – If true, sorts the table by the average difference with respect to the wild type flux. Default is true.
  • solver (string, optional) – The solver to use. Default is ‘cplex’; if it is not available, ‘gurobi’ is used, and if neither is available, ‘matlab’ is used.
Returns:

  • flux.headers (cell) – headers for flux table
  • flux.data (double) – data for flux table
  • ct.headers (cell) – headers for cofactor turnover table
  • ct.data (double) – data for cofactor turnover table

Notes

  • Gurobi is the preferred QP solver for pFBA. Parameters have been tweaked to ensure convergence. While the Matlab QP solver (quadprog) is supported, it may not converge.
calc_prod_envelope(obj, model_ind, npoints)

Finds points for 2d projection of the metabolic model by solving a series of LPs.

Parameters:
  • model_ind (int) – index of the target production network.
  • npoints (int) – Number of points to sample.
Returns:

  • growth_rates (vector)
  • product_yields (vector)

Notes

  • This function is a wrapper for the cobratoolbox function productionEnvelope().
calc_prod_envelope_s(model, npoints)
Finds points for 2d projection of the metabolic model by solving a series of LPs.
This is a static version of calc_prod_envelope.
Parameters:
  • model – A cobra model with modcell fields.
  • npoints (int) – Number of points to sample.
Returns:

  • growth_rates (vector)
  • product_rates (vector)
  • product_yields (vector)

Notes

  • This function is a wrapper for the cobratoolbox function productionEnvelope().
compatibility(obj, varargin)

Analyze compatibility of the solutions in obj.solutions

Parameters:cutoff (double, optional) – Compatibility threshold. Default is 0.6.
Returns:
  • comp(i).vals (vector) – compatibility of all solutions
  • comp(i).max (double) – maximum compatibility
  • comp(i).max_inds (vector) – indices of the most compatible solutions
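The returned fields can be illustrated with a small standalone sketch. It assumes that compatibility is the count of products whose objective reaches the cutoff, as described under plot_compatibility; the matrix PF, the variable names, and the use of >= rather than > are illustrative assumptions, not the function's actual code.

```matlab
% Hypothetical objective matrix: rows are designs, columns are products.
PF = [0.80 0.20 0.70;
      0.90 0.65 0.61;
      0.30 0.95 0.10];
cutoff = 0.6;                       % default threshold quoted in the text

vals = sum(PF >= cutoff, 2);        % compatible products per design
max_val = max(vals);                % highest compatibility found
max_inds = find(vals == max_val);   % designs that reach it
```

Here vals, max_val and max_inds play the roles of comp(i).vals, comp(i).max and comp(i).max_inds for one solution.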
create_consistent_solutions(obj)

Create consistent solutions, i.e. all mop_solutions index to the same production networks. The first solution is used as a reference, and missing production networks are included in other solutions.

Notes

  • This is relevant for analysis functions like plot_yield_vs_growth and
    plot_design_tradeoff.
escher_input_from_pFBA_table(obj, pfba_t_filename, column_id)

Keeps two columns of the pFBA table and writes a csv file which may be used for analysis with Escher.

Parameters:
  • pfba_t_filename (string) – name of the file resulting from src.@ResAnalysis.calc_flux_table()
  • column_id (string) – id indicating the production network to be kept in the output.
fcm_get_table(obj, n_clusters, sol_ind)

Performs fuzzy c-means clustering and returns the output in a table form

Parameters:
  • n_clusters (integer) – Number of clusters for fuzzy c-means.
  • sol_ind (integer, optional) – Index of the solution to be analyzed, defaults to 1.
Returns:

  • table_id (cell of strings) – contains cluster id (headers)
  • table_val (doubles) – table of membership values

fcm_scan_c(obj, sol_ind)

Plot the objective value of fuzzy c-means results versus the number of clusters

Parameters:sol_ind (integer, optional) – Index of the solution to be plotted, defaults to 1.
get_CPF(PF, cutoff)

Computes categorical pareto front.
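The source does not spell out the computation. One plausible reading, suggested by the cpf_cutoffs and save_cpf options of plot_pareto_front, is that the Pareto front is thresholded at the cutoff and reduced to its distinct 0/1 rows. The sketch below shows only that reading; the data and variable names are hypothetical, and the actual get_CPF may apply further filtering.

```matlab
% Hypothetical Pareto front: rows are designs, columns are products.
PF = [0.80 0.20 0.70;
      0.90 0.10 0.65;
      0.30 0.90 0.10];
cutoff = 0.6;

B = PF >= cutoff;                  % 1 where a product is considered compatible
CPF = unique(double(B), 'rows');   % distinct categorical rows (sorted)
```

Designs 1 and 2 differ in objective values but share the same categorical row, so they collapse into a single entry.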

graph_deletion_frequency(obj, solution_ind)

This graph is defined by a square matrix of reaction deletion frequencies, and helps identify clusters of reactions that are often deleted together. Aij = P(reaction j in design | reaction i in design).

Notes

  • The graph has to be directed, because the probability of needing one deletion depends on which one has been done first.
  • A simpler graph could use the overall probability (based on the number of designs).
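A minimal sketch of the matrix Aij, assuming the designs are available as a binary matrix D (rows are designs, columns are reactions). D and the variable names are hypothetical, and this is not the function's actual code.

```matlab
% Hypothetical designs: D(d, r) is 1 if reaction r is deleted in design d.
D = [1 1 0;
     1 0 1;
     1 1 1;
     0 1 0];

co = D' * D;                      % co(i,j): designs containing both i and j
n_i = diag(co);                   % n_i(i): designs containing reaction i
A = bsxfun(@rdivide, co, n_i);    % A(i,j) = P(j in design | i in design)
A(n_i == 0, :) = 0;               % reactions never deleted get an empty row
```

In this example A(1,3) is 2/3 while A(3,1) is 1, which shows why the matrix is not symmetric and the graph is directed.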
graph_sequential_implementation(obj)

Wrapper for graph_sequential_implementation_s()

graph_sequential_implementation_s(mop_solutions, solution_id, prodnet, prod_id, varargin)

Sequential implementation directed k-partite graph. Each partition corresponds to a set of parameters (e.g. wGCP-5-0) and each node to a specific design. E.g. wGCP-5-0-1 points to wGCP-6-0-1 if the deletions in the former are contained in the latter.

Notes

  • The output files are meant to be analyzed with Cytoscape.
  • The implementation uses sets of deletion IDs instead of faster logical indices, to allow for the case where indices are not consistent (e.g. wGCP vs NGP).
Parameters:
  • mop_solutions (cell array of mop_solution) –
  • solution_id (cell) –
  • prodnet (Prodnet object instance) –
  • prod_id (cell) –
  • write_nstep_graph (logical, optional) – Default false.
  • base_path (string, optional) – Default = [inputs.prodnet.problem_path, filesep, ‘output’, filesep]
Returns:

  • sequential_implem_graph_edge.csv (csv file) – headers correspond to: source | target | additional_rxns |is_same.
  • sequential_implem_graph_edge_nstep.csv (csv file) – The same as sequential_implem_graph_edge.csv, but the graph is complete in the sense that a node from one parameter set can have edges to any other downstream parameter set. (In sequential_implem_graph_edge.csv, nodes from one parameter set can only be connected with the next parameter set.)
  • sequential_implem_graph_node.csv (csv file) – A node attribute table with: node_id | short_name | design_param(i.e. partition)
  • Warning
    • This method assumes that mop_solutions are ordered in terms of
      increasing deletions.
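The containment rule that defines an edge can be sketched as follows. The struct layout, reaction ids, and the ';' separator are hypothetical choices for this example; only the rule itself (a design points to a design in the next parameter set when its deletions are a subset) comes from the description above.

```matlab
% Hypothetical parameter sets, ordered by increasing number of deletions.
sets(1).id = 'wGCP-5-0';
sets(1).designs = {{'R1','R2'}, {'R3','R4'}};
sets(2).id = 'wGCP-6-0';
sets(2).designs = {{'R1','R2','R5'}, {'R2','R6','R7'}};

edges = {};   % columns: source | target | additional_rxns
for k = 1:numel(sets) - 1
    for a = 1:numel(sets(k).designs)
        for b = 1:numel(sets(k+1).designs)
            src = sets(k).designs{a};
            tgt = sets(k+1).designs{b};
            if all(ismember(src, tgt))          % src deletions contained in tgt
                extra = setdiff(tgt, src);      % deletions still to implement
                edges(end+1, :) = {sprintf('%s-%d', sets(k).id, a), ...
                                   sprintf('%s-%d', sets(k+1).id, b), ...
                                   strjoin(extra, ';')};
            end
        end
    end
end
```

Comparing sets of ids rather than logical index vectors mirrors the note above about inconsistent indices.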

identify_deletion_role(model, base_deletion_set, compare_deletion_set, varargin)

Identifies the role of one or more reaction deletions in the model flux distribution.

Parameters:
  • model (cobra model) –
  • base_deletion_set (cell array of reaction ids) – Used as a reference.
  • compare_deletion_set (cell array of reaction ids) – Usually contains one deletion less than the base_deletion_set, to identify the role of that reaction.
  • growth_state (string, optional) –
    • ‘all’ (default), growth state is not constrained.
    • ‘max’, growth rate is fixed to the maximum attainable by the model with the base_deletion_set applied.
    • ‘none’, growth rate is fixed to zero.

  • difference_type (string, optional) –
    • fva_range_l1 (default), computes fva for both models, the flux
      range, and then sorts by the absolute difference of ranges.
    • fva_range_jaccard, computes fva for both models, the flux
      range, and then sorts by the Jaccard distance of ranges.
    • sample_distance, perform sampling in both models, then sort by the
      distance specified by the parameter ‘distribution_metric’
    • pfba, compares pfba solution when maximizing growth rate for both models, and sorts by L1 norm.
  • distribution_metric (string, optional) – If using the ‘sample_distance’ difference_type, determines which metric to use when comparing distributions.
    • ‘kolmogorov-smirnov-p’ (default)

Warning

  • Only fva methods have been tested.
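The two fva-based difference types can be sketched on hypothetical FVA bounds. The interval form of the Jaccard distance used here (1 minus intersection length over union length) is an assumption, since the source does not define it, and all variable names are illustrative.

```matlab
% Hypothetical FVA bounds for three reactions (base vs. compare model).
min_base = [0; -5; 2];   max_base = [10; 5; 3];
min_comp = [0; -1; 2];   max_comp = [4; 1; 3];

range_base = max_base - min_base;
range_comp = max_comp - min_comp;

% fva_range_l1: absolute difference of the flux ranges.
l1 = abs(range_base - range_comp);

% fva_range_jaccard (assumed definition): 1 - |intersection| / |union|.
inter = max(0, min(max_base, max_comp) - max(min_base, min_comp));
uni = range_base + range_comp - inter;
ok = uni > 0;
jac = zeros(size(uni));
jac(ok) = 1 - inter(ok) ./ uni(ok);
% Both ranges are single points: distance 0 if equal, 1 otherwise.
jac(~ok) = double(min_base(~ok) ~= min_comp(~ok));

% Reactions whose ranges changed the most come first.
[~, order_l1] = sort(l1, 'descend');
[~, order_jac] = sort(jac, 'descend');
```

In this example both orderings rank the second reaction first and the unchanged third reaction last.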
load_solutions(obj, solution_ids)

Loads mop_solutions from problem output folders.

Parameters:solution_ids (cell or string) – ids of mop_solution files.
piechart_deletion_frequency(deletions, categories_in, varargin)

Pie chart of deletion distribution with category information.

Args
  • deletions(index).id (cell array): deletion ids
  • categories (containers.Map()): maps ids to their respective category (e.g. subsystem).
  • top (double, optional)
  • figure_handle (object, optional)

Notes

  • While containers.Map() seemed like a useful data structure, it does
    not integrate well with the rest of matlab.
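A minimal sketch of how the two inputs combine into the counts a pie chart would display. The reaction ids, category names, and the counting approach are assumptions made for illustration; the function's own handling of top and figure_handle is not shown.

```matlab
% Hypothetical inputs in the shapes described above.
categories = containers.Map({'R1','R2','R3'}, ...
                            {'Glycolysis','TCA','Glycolysis'});
deletions(1).id = {'R1','R2'};
deletions(2).id = {'R1','R3'};

all_ids = [deletions.id];    % pool the deletion ids of every design
cats = cellfun(@(x) categories(x), all_ids, 'UniformOutput', false);

[ucats, ~, ic] = unique(cats);     % distinct categories and membership index
counts = accumarray(ic(:), 1);     % number of deletions per category
```

The vector counts, labeled by ucats, could then be passed to pie().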
piechart_deletion_frequency_w(obj, sol_inds, varargin)

Wrapper for piechart_deletion_frequency()

plot_2d_pf(obj, n_clusters, solution_ind)

A 2-dimensional representation of the pareto front. Representative solutions are determined through k-medoids clustering.

Parameters:
  • n_clusters (integer) – Number of clusters for k-medoids.
  • solution_ind (integer, optional) – Index of the solution to be plotted, defaults to 1.
plot_compatibility(obj, varargin)

Box plot of compatibility distributions across parameter sets. Compatibility of a solution is the number of products with design objective above a certain threshold.

Parameters:
  • categorical_cutoff_wGCP_NGP (-) – build the categorical pareto front of solutions using the wGCP and NGP objectives, the default is 0.6.
  • categorical_cutoff_sGCP (-) – build the categorical pareto front of solutions using the sGCP objective, the default is 0.36.
  • plot_type (-) – ‘default’ (default), ‘no-ndesigns’, ‘inplot-ndesigns’.
  • y_n_loc (-) – If the option ‘inplot-ndesigns’ is used, this controls the y position. Default is 0.4.
plot_design_tradeoff(obj, design_ind, varargin)

Plots the objective value of the selected design(s), specified by design_ind, with respect to the maximum objective value for each objective.

Usage:
For two wGCP solutions, wGCP-10-0 and wGCP-10-3 loaded in obj, in that order, to plot wGCP-10-0-5 and wGCP-10-3-10 run: obj.plot_design_tradeoff([5,10],[1,2]).
Parameters:
  • design_ind (vector) – Indices of the designs to be plotted. Each design corresponds to a solution specified in inputs.solution_ind.
  • solution_ind (integer, optional) – Defaults to 1:length(design_ind).
  • sort_solution (logical, optional) – Defaults to true, in which case the objectives are sorted to improve readability. If false, the order of ra.consistent_solutions is used.
  • plot_type (string, optional) – ‘overlap’(default), all solutions in one plot. ‘split’(one plot per solution); ‘split-bar’, one plot per solution using bar plot for maximum objective.
plot_fva_range(fva_results, varargin)

Creates a visual plot of fva ranges comparing two models

Args
  • fva_results.rxns (cell array): reaction ids
  • fva_results.maxflux_base (vector): vector of maximum fluxes for base model
  • fva_results.minflux_base (vector)
  • fva_results.maxflux_compare (vector)
  • fva_results.minflux_compare (vector)
  • sort_ind (vector, optional): A sorting vector for reactions. Default
    is sort by range.
  • top (integer, optional): plots the top reactions from sort_ind.
    Default is the first 20.
Credits
  • Adapted from cobra toolbox tutorial
plot_pareto_front(obj, varargin)

Plots pareto front clustergram and related figures.

Parameters:
  • solution_ind_or_id (integer or string, optional) –
  • plot_cpf (logical, optional) –
  • save_cpf (string, optional) – ‘no’ (default), ‘heatmap’ (saves a table of 0s and 1s which can be used for a heatmap), ‘names’ (a table version which lists the networks in each cpf).
  • cpf_cutoffs (vector, optional) – A vector of objective values used to generate a categorical pareto front. Note that a scalar can also be provided. Default is 0.6.
  • plot_pareto_set (logical, optional) – If true, the pareto set is plotted as a clustergram. Default is false.
  • plot_hetmap (logical, optional) – If true, heatmap plots are used instead of clustergrams. Defaults to false.
  • save_to_emf (logical, optional) – If true, saves the output to an enhanced metafile. Defaults to true.
  • figure_size (vector of figure size, optional) – Used in Matlab to specify figure size and location. By default Matlab will determine this.

Notes

  • This instruction can be used to suppress figure display: set(0,'DefaultFigureVisible','off')
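Because this setting is global, it may be worth restoring the previous value once plotting is done. A minimal sketch of that pattern, using only standard get/set calls on the root object (the variable name prev is arbitrary):

```matlab
prev = get(0, 'DefaultFigureVisible');   % remember the current setting
set(0, 'DefaultFigureVisible', 'off');   % new figures are created hidden

% ... call the plotting functions here ...

set(0, 'DefaultFigureVisible', prev);    % restore the previous setting
```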
plot_pca()

WIP

plot_yield_vs_growth(obj, design_inds, varargin)

Generates a multiple plot of production envelopes (aka convex hull). Many aspects of the plot can be customized.

Usage:
Note that optional parameters must be entered as a string-value pair, e.g. obj.plot_yield_vs_growth([5,1],'min_obj_val', 0.1).
Parameters:
  • design_inds (vector) – The length matches the number of solutions loaded in obj.solutions. Each entry corresponds to the index of a design to be plotted for the solution in the same position. E.g. if wGCP-5-0 and wGCP-6-0 are loaded, design_inds = [5,1] plots wGCP-5-0-5 and wGCP-6-0-1.
  • plot_type (string, optional) – Two options, ‘matrix’(default) where each row is a design, or ‘overlap’ where designs share the same phenotypic space.
  • min_obj_val (double, optional) – Products below this objective value are not plotted. Default is 0.
  • yticks_values (vector, optional) – Default is chosen by Matlab.
  • n_rows (string, optional) – For overlap plot. # rows
  • n_cols (string, optional) – For overlap plot. # columns
  • use_prod_name (logical, optional) – If true, product names are used for plot titles. By default(false) product ids are used.
  • npoints (integer, optional) – Points used to sample the convex hull
  • convHullLineWidthWT (double, optional) –
  • convHullLineWidthMut (double, optional) –
  • fill_space (logical, optional) – Default is true, which colors the space inside the convex hull.
  • wt_color (rgb triplet scaled from 0-1, optional) – e.g. [0,255,0]./255
  • wt_line_color (rgb triplet scaled from 0-1, optional) –
  • mut_color (rgb triplet scaled from 0-1, optional) –
  • mut_line_color (rgb triplet scaled from 0-1, optional) –
  • set_axis_front (logical, optional) – Default is true, which puts the axes on top of the drawn figure content.
  • TODO
    • Document additional options.
    • Consolidate with static method to avoid code duplication.
plot_yield_vs_growth_s(model_array, ko_array, prod_id, solution_id, varargin)
Generates a multiple plot of production envelopes (aka convex hull).
Many aspects of the plot can be customized.
Parameters:
  • model_array (structure array) – A structure of cobra models.
  • ko_array (structure array) – The indices of this structure match those of the models. The only field is ko_array(i).designs(j).del, which is a cell array containing either reaction deletions or gene deletions; i corresponds to the model index, and j to the design. All models must have the same number of designs. If the deletion_type is ‘other’, ko_array(i).designs(j).ub and ko_array(i).designs(j).lb must be provided.
  • prod_id (cell of strings) – Names of the models in model_array.
  • solution_id (cell of strings) – Names of the solutions in model index.
  • deletion_type (string, optional) –
    Type of deletion. Default is ‘reaction’. Alternatives are ‘gene’ for gene deletions,
    and ‘other’, which will enforce ko_array(i).designs(j).ub and .lb.
  • use_rates (logical, optional) –
    If true, the product rate is plotted vs growth rate;
    if false (default), the product yield is plotted vs growth rate.
  • plot_type (string, optional) – Two options, ‘matrix’(default) where each row is a design, or ‘overlap’ where designs share the same phenotypic space.
  • yticks_values (vector, optional) – Default is chosen by Matlab.
  • n_rows (string, optional) – For overlap plot. # rows
  • n_cols (string, optional) – For overlap plot. # columns
  • npoints (integer, optional) – Points used to sample the convex hull
  • convHullLineWidthWT (double, optional) –
  • convHullLineWidthMut (double, optional) –
  • fill_space (logical, optional) – Default is true, which colors the space inside the convex hull.
  • wt_color (rgb triplet scaled from 0-1, optional) – e.g. [0,255,0]./255
  • wt_line_color (rgb triplet scaled from 0-1, optional) –
  • mut_color (rgb triplet scaled from 0-1, optional) –
  • mut_line_color (rgb triplet scaled from 0-1, optional) –
  • set_axis_front (logical, optional) – Default is true, which puts the axes on top of the drawn figure content.
  • Usage
    • Optional parameters must be entered as a string-value pair, e.g. obj.plot_yield_vs_growth([5,1],'min_obj_val', 0.1).
    • Module variables must be applied prior to input in the function (i.e. ko_array(i).designs(j).del should contain the deletions of the original design minus the module variables).

  • Warning
    • The state of the model_array (i.e. deletions or lack thereof) will be
      considered as the wild type state. Thus it is
  • Notes
    • This function is based on plot_yield_vs_growth.m but it is meant to be
      more flexible.
print_design(obj, design_ind, varargin)

Displays deleted reactions for a certain solution. Also indicates if the deletion does not apply to a certain production network (module reaction).

Parameters:
  • design_ind (int) – Index of the design to be printed.
  • sol_ind (int, optional) – Index of the solution from which to draw design, default is 1.
  • geneid2name (dict_path, optional) – The path to a two-column csv file, where the first column contains gene ids and the second column contains gene names. Default is ‘’.
  • extra_rxns (cell array of reaction ids) – Additional reactions not in the design which will be included in the table. (useful for alternative solutions)
  • is_alternative (logical, optional) – If true, an alternative solution of design_ind, specified by alternative_ind will be considered. Default is false.
  • alternative_ind (double, optional) – Only relevant for alternative solutions (see is alternative). Index of the alternative solution, default is 1.
  • verbose (logical, optional) – Whether or not to display the design. Default is true.
Returns:

  • T (table) – Design information
  • deleted_reactions (cell array)

Notes

  • Currently only supports reaction deletions.
remove_always_zero_prod(obj, solution_ind)

Removes products (so that they are ignored in the analysis) which are always 0 in the feasible solutions corresponding to solution_ind.

Args: solution_ind (int): Index of the solution to be deleted

remove_products(obj, prod_to_remove_id)
Deletes production networks from obj.prodnet, so that they are ignored for the analysis.

Args: prod_to_remove_id (cell of strings)

Prodnet already has a method for this

set_solution_state(obj, sol_ind)

Returns a mop_solution and sets the obj.prodnet to the same state, in order to avoid side effects when analyzing that solution.

Parameters:sol_ind (integer) – Index of the solution to be retrieved.
similarity_plots(obj, type, output_graph_name, corr_cutoff, correl_type, solution_ind_or_id)

Render correlation graphs for pareto front matrix.

Parameters:type (str) – Correlation coefficient ‘pearson’ or ‘spearman’.

Notes

Because Matlab's graph rendering features are terrible, the graph can be output to a csv file for better drawing with high-quality free software like Cytoscape.

stepwise_implementation(obj, design_ind, varargin)

Explores all possible subsets of a solution and generates a report and a tree (cytoscape input) with the most promising candidates to achieve the target design with useful designs on the way.

Parameters:
  • design_ind (integer) – Index of the design in the solution given by sol_ind to be analyzed.
  • sol_ind (integer, optional) – Solution index, default 1.
  • compatibiliy_cutoff (double, optional) – Value used to determine the compatibility of a design. Default is 0.6.
  • max_level (integer, optional) – Number of levels in the implementation tree to explore. The default is up to the number of deletions in the given design minus 1 (e.g. if the given design has 5 deletions, subset designs with 1, 2, 3, and 4 deletions will be explored).
  • write_tables (logical, optional) – If true (default is false), a table of designs, sorted by compatibility, is written for each level to output_base_path + levelX.csv.
  • write_graph (logical, optional) – If true (default), the output graph is written.
  • output_base_path (string, optional) – For output files. Default is obj.prodnet.problem_path/output/<design-objective>-<design-deletions>-<design-ind>
  • min_obj_val (double, optional) – For output graph, products with objectives below min_obj_val in all designs will be removed. Default is 0.1.
  • alt_sol_ind (integer, optional) – Index of an alternative_solution to design_ind. Default is 0 which indicates that no alternative solution is considered.
  • only_nondominated (logical, optional) – If true, only non-dominated solutions at each step are kept. Default is false.
Notes
  • Module reactions are those of the final solution, so only one module
    needs to be constructed for all strains.
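The level-by-level subset enumeration implied by max_level can be sketched with nchoosek. The reaction ids and variable names are hypothetical, and the sketch only lists the candidate subsets; it does not evaluate their objectives or compatibility as the method does.

```matlab
% Hypothetical target design with three deletions.
deletions = {'R1','R2','R3'};
max_level = numel(deletions) - 1;   % default: explore up to n - 1 deletions

levels = cell(max_level, 1);
for lvl = 1:max_level
    idx = nchoosek(1:numel(deletions), lvl);   % one row per subset of size lvl
    levels{lvl} = cell(size(idx, 1), 1);
    for s = 1:size(idx, 1)
        levels{lvl}{s} = deletions(idx(s, :)); % deletion ids of this subset
    end
end
```

The number of subsets grows combinatorially with the design size, which is one reason to cap max_level on large designs.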
write_result_tables(obj, varargin)

Generates a report for a given set of solutions

Parameters:
  • file_name (string, optional) – Name of the output file. Defaults to problem-name-report.
  • skip_log (logical, optional) – Indicates if the sheet containing a log of the solutions should be skipped. Defaults to false.

Notes

  • Gene deletion report not supported yet.
  • All designs have to be growth type (i.e. wGCP,sGCP) or non-growth (NGP).
  • Currently csv output does not include superheaders
write_to_xls(obj, file_name, skip_log)

Generates a report for a given set of solutions

Parameters:
  • file_name (string, optional) – Name of the output file. Defaults to problem-name-report.
  • skip_log (logical, optional) – Indicates if the sheet containing a log of the solutions should be skipped. Defaults to false.

Notes

  • Gene deletion report not supported yet.
  • All designs have to be growth type (i.e. wGCP,sGCP) or non-growth (NGP).

Warning

  • Deprecated function, use ResAnalysis.write_result_tables instead.