ResAnalysis¶
-
class
ResAnalysis
(prodnet, solution_input, prod_to_remove_id)¶ This class features multiple functions that streamline the analysis of one or more solutions (all designs in a given set of parameters) and designs (a specific set of deletions with a solution from the mop).
-
ResAnalysis
(prodnet, solution_input, prod_to_remove_id)¶ Args: solution_input (cell/string or mop_solution structure): Indicates the solutions to be analyzed. Contains solution id(s) in the
current prodnet problem path OR a mop_solution structure. prod_to_remove_id (cell): The ids of products to be omitted from the analysis.
-
consistent_solutions
= None¶ (structure of mop solutions) All mop_solutions have the same number of networks; if some networks are missing, the design variables and objectives are padded with zeros.
-
default_figure_info
= None¶ (structure) Note: currently unused. Fields: .colors (vector of 5-10 colors with strong contrast), .line_type
-
default_plot_lines
= None¶ (structure) Plot formatting. Note: currently unused.
-
is_wt_points_calc
= None¶ (logical) Bookkeeping.
-
n_solutions
= None¶ (integer) Number of loaded solutions.
-
prodnet
= None¶ (Prodnet class) Note this is a copy of prodnet, not a reference.
-
solution_ids
= None¶ (cell of strings) ids of loaded solutions, order matches the solutions structure.
-
solutions
= None¶ (structure of mop solutions) Solutions for analysis.
-
wt_all_growth_rates
= None¶ (matrix) Growth rate points for all wild type production network production envelopes.
-
wt_all_product_yields
= None¶ (matrix) Product yield points for all wild type production network production envelopes.
-
-
calc_flux_table
(obj, design_ind, varargin)¶ Create a table of flux distributions for wt and all mutant production networks. Two states are possible: growth and non-growth. Two flux estimation methods are also possible, pFBA and FVA; in both cases the fluxes of product and biomass are fixed to the optimal state.
Parameters: - design_ind (integer) – Index of the specific design to be applied to construct mutant.
- plot_top_n (integer, optional) – Top fluxes to be plotted in the heatmap figure.
- ng_state (logical, optional) – False (default): calculate growth state fluxes (the maximum growth rate is imposed as a constraint). True: calculate non-growth state fluxes (growth rate is constrained to 0).
- sol_ind (integer, optional) – Index of the solution to which the design_ind belongs. Defaults to 1.
- add_FVA (logical, optional) – If true adds FVA results to the pFBA table. Default is false.
- calc_cofactor_turnover (logical, optional) – If true, creates a new table with cofactor turnovers from pFBA flux distributions. Default is false.
- min_obj (double, optional) – Minimum objective value required to include production networks in the final table. Default is 0.
- write_output (logical, optional) – If true, writes an xls table (two if the cofactor turnover option is enabled) to the problem output folder. Default is true.
- only_mutant_flux (logical, optional) – If true the table only contains reactions for mutant flux (no wild type, and no difference information). Default is false.
- sort_by_diff (logical, optional) – If true, sorts the table by the average difference with respect to the wild type flux. Default is true.
- solver (string, optional) – The solver to use. Default is ‘cplex’; if it is not available, ‘gurobi’ is used, and if neither is available, ‘matlab’ is used.
Returns: - flux.headers (cell) – headers for flux table
- flux.data (double) – data for flux table
- ct.headers (cell) – headers for cofactor turnover table
- ct.data (double) – data for cofactor turnover table
Notes
- Gurobi is the preferred QP solver for pFBA. Parameters have been tweaked to ensure convergence. While the Matlab QP solver (quadprog) is supported, it may not converge.
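The effect of sort_by_diff can be pictured with a small sketch. It is written in Python as a language-neutral illustration, not as the class's MATLAB implementation. It assumes that "average difference" means the mean absolute difference between each mutant flux and the wild type flux; the reaction ids, flux values, and the helper name sort_by_average_difference are hypothetical.

```python
def sort_by_average_difference(rxn_ids, wt_flux, mutant_fluxes):
    """Return reaction ids sorted by decreasing mean |mutant - wild type| flux.

    mutant_fluxes[k][i] is the flux of reaction i in mutant production network k.
    Assumption: 'average difference' is the mean absolute difference.
    """
    scores = []
    for i, rxn in enumerate(rxn_ids):
        diffs = [abs(mut[i] - wt_flux[i]) for mut in mutant_fluxes]
        scores.append((sum(diffs) / len(diffs), rxn))
    # Largest average change first; ties are broken alphabetically by id.
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [rxn for _, rxn in scores]


# Hypothetical toy data: three reactions, two mutant production networks.
rxn_ids = ["PFK", "PYK", "LDH"]
wt_flux = [10.0, 8.0, 0.0]
mutant_fluxes = [[10.0, 2.0, 5.0], [9.0, 0.0, 7.0]]
ranked = sort_by_average_difference(rxn_ids, wt_flux, mutant_fluxes)
print(ranked)
```

Reactions whose flux changes most across the mutants end up at the top of the table, which is the natural place to look when interpreting a design.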
-
calc_prod_envelope
(obj, model_ind, npoints)¶ Finds points for the 2d projection of the metabolic model by solving a series of LPs.
Parameters: - model_ind (int) – index of the target production network.
- npoints (int) – Number of points to sample.
Returns: - growth_rates (vector)
- product_yields (vector)
Notes
- This function is a wrapper for the cobratoolbox function productionEnvelope().
-
calc_prod_envelope_s
(model, npoints)¶ - Finds points for 2d projection of the metabolic model by solving a series of LPs.
- This is a static version of calc_prod_envelope.
Parameters: - model – A cobra model with modcell fields.
- npoints (int) – Number of points to sample.
Returns: - growth_rates (vector)
- product_rates (vector)
- product_yields (vector)
Notes
- This function is a wrapper for the cobratoolbox function productionEnvelope().
-
compatibility
(obj, varargin)¶ Analyze compatibility of the solutions in obj.solutions
- Args
- cutoff (double): Compatibility threshold, default is 0.6.
Returns: - comp(i).vals (vector) – compatibility of all solutions
- comp(i).max (double) – maximum compatibility
- comp(i).max_inds (vector) – indices of the most compatible solutions
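As an illustration of the returned fields, the following Python sketch counts, for each design, the products whose design objective reaches the cutoff. It is a language-neutral sketch rather than the MATLAB implementation; it assumes "above the threshold" means greater than or equal to the cutoff, and the objective values are hypothetical.

```python
def compatibility(objectives, cutoff=0.6):
    """Count, for each design, the products whose objective meets the cutoff.

    objectives[d][p] is the design objective of product p under design d.
    Returns (vals, max_val, max_inds), mirroring .vals, .max and .max_inds.
    Assumption: an objective counts when it is >= cutoff.
    """
    vals = [sum(1 for v in row if v >= cutoff) for row in objectives]
    max_val = max(vals)
    # Indices are 0-based here; MATLAB indices would be 1-based.
    max_inds = [i for i, v in enumerate(vals) if v == max_val]
    return vals, max_val, max_inds


# Hypothetical objective matrix: three designs (rows), three products (columns).
objectives = [
    [0.9, 0.1, 0.7],
    [0.6, 0.8, 0.65],
    [0.2, 0.3, 0.95],
]
vals, max_val, max_inds = compatibility(objectives)
print(vals, max_val, max_inds)
```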
-
create_consistent_solutions
(obj)¶ Create consistent solutions, i.e. all mop_solutions index to the same production networks. The first solution is used as a reference, and missing production networks are included in other solutions.
Notes
- This is relevant for analysis functions like plot_yield_vs_growth and plot_design_tradeoff.
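A minimal Python sketch of the padding idea follows. It is not the MATLAB implementation; the dictionary layout, the product ids, and the helper name make_consistent are hypothetical. Only objectives are shown, although the description of consistent_solutions states that design variables are padded as well.

```python
def make_consistent(solutions):
    """Pad every solution so it covers the production networks of the first one.

    Each solution is a dict {'prod_id': [...], 'objectives': [[...], ...]} with
    one row per design and one column per production network.
    Networks missing from a solution get a column of zeros.
    """
    reference_ids = solutions[0]["prod_id"]
    consistent = []
    for sol in solutions:
        col_of = {pid: j for j, pid in enumerate(sol["prod_id"])}
        padded = [
            [row[col_of[pid]] if pid in col_of else 0.0 for pid in reference_ids]
            for row in sol["objectives"]
        ]
        consistent.append({"prod_id": list(reference_ids), "objectives": padded})
    return consistent


# Hypothetical data: the second solution lacks the 'but' production network.
solutions = [
    {"prod_id": ["etoh", "but", "prop"], "objectives": [[0.8, 0.5, 0.3]]},
    {"prod_id": ["etoh", "prop"], "objectives": [[0.7, 0.4], [0.6, 0.2]]},
]
consistent = make_consistent(solutions)
print(consistent[1]["objectives"])
```

After this step, column j refers to the same production network in every solution, so plots that compare solutions can index them uniformly.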
-
escher_input_from_pFBA_table
(obj, pfba_t_filename, column_id)¶ Keeps two columns from the pFBA table and writes a csv which may be used for analysis with escher.
Parameters: - pfba_t_filename (string) – name of the file resulting from src.@ResAnalysis.calc_flux_table()
- column_id (string) – id indicating the production network to be kept in the output.
-
fcm_get_table
(obj, n_clusters, sol_ind)¶ Performs fuzzy c-means clustering and returns the output in table form.
Parameters: - n_clusters (integer) – Number of clusters for fuzzy c-means.
- sol_ind (integer, optional) – Index of the solution to be used, defaults to 1.
Returns: - table_id (cell of strings) – contains cluster id (headers)
- table_val (doubles) – table of membership values
-
fcm_scan_c
(obj, sol_ind)¶ Plot the objective value of fuzzy c-means results versus the number of clusters.
Parameters: sol_ind (integer, optional) – Index of the solution to be plotted, defaults to 1.
-
get_CPF
(PF, cutoff)¶ Computes categorical pareto front.
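One plausible reading of a categorical pareto front is sketched below in Python, as a language-neutral illustration rather than the MATLAB implementation. It assumes that each objective is mapped to 1 when it is greater than or equal to the cutoff and to 0 otherwise, and that duplicate rows are dropped; whether dominated rows are also removed is not stated here, so the sketch does not do so. The matrix values are hypothetical.

```python
def categorical_pareto_front(pf, cutoff=0.6):
    """Binarize a Pareto front matrix and drop duplicate rows.

    pf[d][p] is the objective of product p in design d.
    Assumption: an entry becomes 1 when the objective is >= cutoff.
    """
    seen = set()
    cpf = []
    for row in pf:
        category = tuple(1 if v >= cutoff else 0 for v in row)
        if category not in seen:  # keep the first occurrence of each category
            seen.add(category)
            cpf.append(list(category))
    return cpf


# Hypothetical Pareto front: the first two designs fall in the same category.
pf = [[0.9, 0.1, 0.7], [0.8, 0.2, 0.65], [0.2, 0.7, 0.95]]
cpf = categorical_pareto_front(pf)
print(cpf)
```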
-
graph_deletion_frequency
(obj, solution_ind)¶ Defined by a square matrix of reaction deletion frequencies, this graph helps identify clusters of reactions that are often deleted together. Aij = P(reaction j in design | reaction i in design).
Notes
- The graph has to be directed, because the probability of needing one deletion depends on which one has been done first.
- A simpler graph could use the overall probability (based on the number of designs).
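The matrix definition above can be sketched directly from a list of designs. The Python code below is a language-neutral illustration, not the MATLAB implementation; the reaction ids and the helper name conditional_deletion_frequency are hypothetical.

```python
def conditional_deletion_frequency(designs):
    """Return (rxn_ids, A) with A[i][j] = P(reaction j in design | reaction i in design).

    designs is a list of sets of deleted reaction ids.
    """
    rxn_ids = sorted(set().union(*designs))
    n = len(rxn_ids)
    A = [[0.0] * n for _ in range(n)]
    for i, ri in enumerate(rxn_ids):
        # Designs containing reaction i; never empty because ri comes from the union.
        with_i = [d for d in designs if ri in d]
        for j, rj in enumerate(rxn_ids):
            both = sum(1 for d in with_i if rj in d)
            A[i][j] = both / len(with_i)
    return rxn_ids, A


# Hypothetical designs (sets of deleted reaction ids).
designs = [{"ACKr", "LDH"}, {"ACKr", "LDH", "PTAr"}, {"ACKr"}, {"PTAr", "ACKr"}]
rxn_ids, A = conditional_deletion_frequency(designs)
for rxn, row in zip(rxn_ids, A):
    print(rxn, row)
```

In this toy data A is not symmetric: LDH always appears together with ACKr, but ACKr appears with LDH in only half of its designs, which is why the resulting graph is directed.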
-
graph_sequential_implementation
(obj)¶ Wrapper for
graph_sequential_implementation_s()
-
graph_sequential_implementation_s
(mop_solutions, solution_id, prodnet, prod_id, varargin)¶ Sequential implementation directed k-partite graph. Each partition corresponds to a set of parameters (e.g. wGCP-5-0) and each node to a specific design. E.g. wGCP-5-0-1 points to wGCP-6-0-1 if the deletions in the former are contained in the latter.
Notes
- The output files are meant to be analyzed with Cytoscape.
- The implementation uses sets of deletion IDs instead of faster logical indices, to allow for the case where indices are not consistent (e.g. wGCP vs NGP).
Parameters: - mop_solutions (cell array of mop_solution) –
- solution_id (cell) –
- prodnet (Prodnet object instance) –
- prod_id (cell) –
- write_nstep_graph (logical, optional) – Default false.
- base_path (string, optional) – Default = [inputs.prodnet.problem_path, filesep, ‘output’, filesep]
Returns: - sequential_implem_graph_edge.csv (csv file) – headers correspond to: source | target | additional_rxns |is_same.
- sequential_implem_graph_edge_nstep.csv (csv file) – The same as sequential_implem_graph_edge.csv, but the graph is complete in the sense that a node from one parameter set can have edges to any other downstream parameter set. (In sequential_implem_graph_edge.csv, nodes from one parameter set can only be connected with the next parameter set.)
- sequential_implem_graph_node.csv (csv file) – A node attribute table with: node_id | short_name | design_param(i.e. partition)
- Warning –
- This method assumes that mop_solutions are ordered in terms of increasing deletions.
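The containment rule behind the one-step edge list can be sketched with plain sets. The Python code below is a language-neutral illustration, not the MATLAB implementation; the reaction ids are hypothetical, and the meaning given to is_same (source and target contain exactly the same deletions) is an assumption.

```python
def sequential_edges(partitions):
    """Build edges between designs of consecutive parameter sets.

    partitions is a list of (solution_id, designs) ordered by increasing
    number of deletions; designs is a list of sets of deletion ids.
    An edge source -> target exists when the source deletions are a subset
    of the target deletions.
    """
    edges = []
    for (src_id, src_designs), (tgt_id, tgt_designs) in zip(partitions, partitions[1:]):
        for i, src in enumerate(src_designs, start=1):
            for j, tgt in enumerate(tgt_designs, start=1):
                if src <= tgt:
                    edges.append({
                        "source": f"{src_id}-{i}",
                        "target": f"{tgt_id}-{j}",
                        "additional_rxns": sorted(tgt - src),
                        "is_same": src == tgt,  # assumed meaning of is_same
                    })
    return edges


# Hypothetical designs for two parameter sets.
partitions = [
    ("wGCP-5-0", [{"A", "B"}, {"C", "D"}]),
    ("wGCP-6-0", [{"A", "B", "E"}, {"A", "C", "F"}]),
]
edges = sequential_edges(partitions)
print(edges)
```

Comparing sets of ids rather than index vectors mirrors the note above: it keeps working when two solutions do not share the same reaction indexing.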
-
identify_deletion_role
(model, base_deletion_set, compare_deletion_set, varargin)¶ Identifies the role of one or more reaction deletions in the model flux distribution.
Parameters: - model (cobra model) –
- base_deletion_set (cell array of reaction ids) – Used as a reference.
- compare_deletion_set (cell array of reaction ids) – Usually contains one deletion less than the base_deletion_set, to identify the role of that reaction.
- growth_state (string, optional) –
- ‘all’ (default), growth rate is not constrained.
- ‘max’, growth rate is fixed to the maximum attainable by the model with the base_deletion_set applied.
- ‘none’, growth rate is fixed to zero.
- difference_type (string, optional) –
- fva_range_l1 (default), computes FVA for both models and the flux range, then sorts by the absolute difference of ranges.
- fva_range_jaccard, computes FVA for both models and the flux range, then sorts by the Jaccard distance of ranges.
- sample_distance, performs sampling in both models, then sorts by the distance specified by the parameter ‘distribution_metric’.
- pfba, compares the pFBA solutions when maximizing growth rate for both models, and sorts by L1 norm.
- distribution_metric (string, optional) – If using the ‘sampling_distance’ difference_type, determines what metric to use when comparing distributions.
- ‘kolmogorov-smirnov-p’ (default)
Warning
- Only fva methods have been tested
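The two FVA-based difference types can be illustrated on single flux ranges. The Python sketch below is language-neutral and not the MATLAB implementation. It assumes that the L1 variant compares range widths and that the Jaccard variant is one minus the overlap length divided by the combined length of the two intervals; the reaction ids and ranges are hypothetical.

```python
def range_l1(base, compare):
    """Absolute difference between the widths of two flux ranges (min, max)."""
    return abs((base[1] - base[0]) - (compare[1] - compare[0]))


def range_jaccard_distance(base, compare):
    """1 - |intersection| / |union| for two closed intervals (min, max)."""
    inter = max(0.0, min(base[1], compare[1]) - max(base[0], compare[0]))
    union = (base[1] - base[0]) + (compare[1] - compare[0]) - inter
    if union == 0:  # both ranges are single points
        return 0.0 if base == compare else 1.0
    return 1.0 - inter / union


# Hypothetical FVA ranges: reaction id -> (base range, compare range).
fva = {
    "PFK": ((0.0, 10.0), (0.0, 10.0)),
    "LDH": ((0.0, 10.0), (0.0, 2.0)),
    "PYK": ((0.0, 4.0), (6.0, 10.0)),
}
by_l1 = sorted(fva, key=lambda r: -range_l1(*fva[r]))
by_jaccard = sorted(fva, key=lambda r: -range_jaccard_distance(*fva[r]))
print(by_l1, by_jaccard)
```

Under these assumptions the two metrics can rank reactions differently: PYK keeps the same range width but shifts to a disjoint interval, so only the Jaccard distance flags it.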
-
load_solutions
(obj, solution_ids)¶ Loads mop_solutions from problem output folders.
Parameters: solution_ids (cell or string) – ids of mop_solution files.
-
piechart_deletion_frequency
(deletions, categories_in, varargin)¶ Pie chart of deletion distribution with category information.
- Args
- deletions(index).id (cell array): deletion ids
- categories (containers.Map()): maps ids to their respective category (e.g. subsystem).
- top (double, optional)
- figure_handle (object, optional)
Notes
- While containers.Map() seemed like a useful data structure, it does not integrate well with the rest of Matlab.
-
piechart_deletion_frequency_w
(obj, sol_inds, varargin)¶ Wrapper for
piechart_deletion_frequency()
-
plot_2d_pf
(obj, n_clusters, solution_ind)¶ A 2 dimensional representation of the pareto front, representative solutions are determined through k-medoids clustering.
Parameters: - n_clusters (integer) – Number of clusters for k-medoids.
- solution_ind (integer, optional) – Index of the solution to be plotted, defaults to 1.
-
plot_compatibility
(obj, varargin)¶ Box plot of compatibility distributions across parameter sets. Compatibility of a solution is the number of products with design objective above a certain threshold.
Parameters: - categorical_cutoff_wGCP_NGP (-) – build the categorical pareto front of solutions using the wGCP and NGP objectives, the default is 0.6.
- categorical_cutoff_sGCP (-) – build the categorical pareto front of solutions using the sGCP objective, the default is 0.36.
- plot_type (-) – ‘default’ (default), ‘no-ndesigns’, ‘inplot-ndesigns’.
- y_n_loc (-) – If the option ‘inplot-ndesigns’ is used, this controls the y position. Default is 0.4.
-
plot_design_tradeoff
(obj, design_ind, varargin)¶ Plots the objective value of the selected design(s), specified by design_ind, with respect to the maximum objective value for each objective.
- Usage:
- For two wGCP solutions, wGCP-10-0 and wGCP-10-3 loaded in obj, in that order, to plot wGCP-10-0-5 and wGCP-10-3-10 run: obj.plot_design_tradeoff([5,10],[1,2]).
Parameters: - design_ind (vector) – Indices of the designs to be plotted; each design corresponds to a solution specified in solution_ind.
- solution_ind (integer, optional) – Defaults to 1:length(design_ind).
- sort_solution (logical, optional) – Defaults to true, in which case the objectives are sorted to improve readability. If false, the order of consistent_solutions is used.
- plot_type (string, optional) – ‘overlap’(default), all solutions in one plot. ‘split’(one plot per solution); ‘split-bar’, one plot per solution using bar plot for maximum objective.
-
plot_fva_range
(fva_results, varargin)¶ Creates a visual plot of fva ranges comparing two models
- Args
- fva_results.rxns (cell array): reaction ids
- fva_results.maxflux_base (vector): vector of maximum fluxes for base model
- fva_results.minflux_base (vector)
- fva_results.maxflux_compare (vector)
- fva_results.minflux_compare (vector)
- sort_ind (vector, optional): A sorting vector for reactions. Default is sort by range.
- top (integer, optional): plots the top reactions from sort_ind, default is the first 20.
- Credits
- Adapted from cobra toolbox tutorial
-
plot_pareto_front
(obj, varargin)¶ Plots pareto front clustergram and related figures.
Parameters: - solution_ind_or_id (integer or string, optional) –
- plot_cpf (logical, optional) –
- save_cpf (string, optional) – ‘no’ (default), ‘heatmap’ (saves a table of 0s and 1s which can be used for a heatmap), ‘names’ (a table which lists the networks in each cpf).
- cpf_cutoffs (vector, optional) – A vector of objective values used to generate a categorical pareto front. Note that a scalar can also be provided. Default is 0.6.
- plot_pareto_set (logical, optional) – If true, the pareto set is plotted as a clustergram. Default is false.
- plot_hetmap (logical, optional) – If true, heatmap plots are used instead of clustergrams. Defaults to false.
- save_to_emf (logical, optional) – If true, saves the output to an enhanced metafile. Defaults to true.
- figure_size (vector of figure size, optional) – Used in Matlab to specify figure size and location. By default Matlab will determine this.
Notes
- This instruction can be used to suppress figure display: set(0,'DefaultFigureVisible','off')
-
plot_pca
()¶ WIP
-
plot_yield_vs_growth
(obj, design_inds, varargin)¶ Generates a multiple plot of production envelopes (aka convex hull). Many aspects of the plot can be customized.
- Usage:
- Note that optional parameters must be entered as a string-value pair, e.g. obj.plot_yield_vs_growth([5,1],'min_obj_val', 0.1).
Parameters: - design_inds (vector) – The length matches the number of solutions loaded in obj.solutions. Each entry corresponds to the index of a design to be plotted for the solution in the same position. E.g. if wGCP-5-0 and wGCP-6-0 are loaded, design_inds = [5,1] plots wGCP-5-0-5 and wGCP-6-0-1.
- plot_type (string, optional) – Two options, ‘matrix’(default) where each row is a design, or ‘overlap’ where designs share the same phenotypic space.
- min_obj_val (double, optional) – Products below this objective value are not plotted. Default is 0.
- yticks_values (vector, optional) – Default is chosen by Matlab.
- n_rows (string, optional) – For overlap plot. # rows
- n_cols (string, optional) – For overlap plot. # columns
- use_prod_name (logical, optional) – If true, product names are used for plot titles. By default(false) product ids are used.
- npoints (integer, optional) – Points used to sample the convex hull
- convHullLineWidthWT (double, optional) –
- convHullLineWidthMut (double, optional) –
- fill_space (logical, optional) – Default true, colors the space inside the convex hull.
- wt_color (rgb triplet scaled from 0-1, optional) – e.g. [0,255,0]./255
- wt_line_color (rgb triplet scaled from 0-1, optional) –
- mut_color (rgb triplet scaled from 0-1, optional) –
- mut_line_color (rgb triplet scaled from 0-1, optional) –
- set_axis_front (logical, optional) – Default true, puts the axes on top of the drawn figure content.
- TODO –
- Document additional options.
- Consolidate with static method to avoid code duplication.
-
plot_yield_vs_growth_s
(model_array, ko_array, prod_id, solution_id, varargin)¶ - Generates a multiple plot of production envelopes (aka convex hull).
- Many aspects of the plot can be customized.
Parameters: - model_array (structure array) – A structure of cobra models.
- ko_array (structure array) –
The indices of this structure match those of the models. The only field is ko_array(i).designs(j).del, which is a cell array containing either reaction deletions or gene deletions. i corresponds to the model index, and j to the design. All models must have the same number of designs. If the deletion_type is ‘other’, ko_array(i).designs(j).ub and ko_array(i).designs(j).lb must be provided.
- prod_id (cell of strings) – Names of the models in model_array.
- solution_id (cell of strings) – Names of the solutions in model index.
- deletion_type (string, optional) –
- Type of deletion. Default is ‘reaction’. Alternatives are ‘gene’ for gene deletions,
- and ‘other’, which will enforce ko_array(i).designs(j).ub and .lb.
- use_rates (logical, optional) –
- If true, the product rate is plotted vs. growth rate; if false (default), the product yield is plotted vs. growth rate.
- plot_type (string, optional) – Two options, ‘matrix’(default) where each row is a design, or ‘overlap’ where designs share the same phenotypic space.
- yticks_values (vector, optional) – Default is chosen by Matlab.
- n_rows (string, optional) – For overlap plot. # rows
- n_cols (string, optional) – For overlap plot. # columns
- npoints (integer, optional) – Points used to sample the convex hull
- convHullLineWidthWT (double, optional) –
- convHullLineWidthMut (double, optional) –
- fill_space (logical, optional) – Default true, colors the space inside the convex hull.
- wt_color (rgb triplet scaled from 0-1, optional) – e.g. [0,255,0]./255
- wt_line_color (rgb triplet scaled from 0-1, optional) –
- mut_color (rgb triplet scaled from 0-1, optional) –
- mut_line_color (rgb triplet scaled from 0-1, optional) –
- set_axis_front (logical, optional) – Default true, puts the axes on top of the drawn figure content.
- Usage –
- Optional parameters must be entered as a string-value pair, e.g. obj.plot_yield_vs_growth([5,1],'min_obj_val', 0.1).
- Module variables must be applied prior to input in the function.
(i.e. ko_array(i).designs(j).del should contain the deletions of the original design minus the module variables).
- Warning –
- The state of the model_array (i.e. deletions or lack thereof) will be
- considered as the wild type state. Thus it is
- Notes –
- This function is based on plot_yield_vs_growth.m but it is meant to be more flexible.
-
print_design
(obj, design_ind, varargin)¶ Displays deleted reactions for a certain solution. Also indicates if the deletion does not apply to a certain production network (module reaction).
Parameters: - design_ind (int) – Index of the design to be printed.
- sol_ind (int, optional) – Index of the solution from which to draw design, default is 1.
- geneid2name (dict_path, optional) – The path to a two-column csv file, where the first column contains gene ids and the second column contains gene names. Default is ‘’.
- extra_rxns (cell array of reaction ids) – Additional reactions not in the design which will be included in the table. (useful for alternative solutions)
- is_alternative (logical, optional) – If true, an alternative solution of design_ind, specified by alternative_ind will be considered. Default is false.
- alternative_ind (double, optional) – Only relevant for alternative solutions (see is alternative). Index of the alternative solution, default is 1.
- verbose (logical, optional) – Whether or not to display the design. Default is true.
Returns: - T (table) – Design information
- deleted_reactions (cell array)
Notes
- Currently only supports reaction deletions.
-
remove_always_zero_prod
(obj, solution_ind)¶ Removes products (ignore for the analysis) which are always 0 in the feasible solutions corresponding to solution_ind.
Args: solution_ind (int): Index of the solution to be deleted
-
remove_products
(obj, prod_to_remove_id)¶ - Deletes production networks from obj.prodnet, so that they are ignored
for the analysis.
- Args:
- prod_to_remove_id (cell of strings)
Prodnet already has a method for this
-
set_solution_state
(obj, sol_ind)¶ Returns a mop_solution and sets the obj.prodnet to the same state, in order to avoid side effects when analyzing that solution.
Parameters: sol_ind (integer) – Index of the solution to be retrieved.
-
similarity_plots
(obj, type, output_graph_name, corr_cutoff, correl_type, solution_ind_or_id)¶ Render correlation graphs for pareto front matrix.
Parameters: type (str) – Correlation coefficient ‘pearson’ or ‘spearman’.
Notes
Because Matlab's graph rendering features are limited, the graph can be output to a csv for better drawing with free, high-quality software like Cytoscape.
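A correlation graph of this kind can be sketched as an edge list written to csv. The Python code below is a language-neutral illustration, not the MATLAB implementation. It assumes that correlations are computed between product columns of the pareto front matrix and that an edge is kept when the Pearson coefficient is greater than or equal to corr_cutoff; the product ids, the csv headers, and the helper names are hypothetical.

```python
import csv
import io
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences with non-zero variance."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_edges(prod_ids, pf, corr_cutoff):
    """Edges between products whose objective columns correlate at or above the cutoff."""
    cols = list(zip(*pf))  # one column of objectives per product
    edges = []
    for i in range(len(prod_ids)):
        for j in range(i + 1, len(prod_ids)):
            r = pearson(cols[i], cols[j])
            if r >= corr_cutoff:
                edges.append((prod_ids[i], prod_ids[j], round(r, 3)))
    return edges


# Hypothetical pareto front: rows are designs, columns are products.
prod_ids = ["etoh", "buoh", "lac"]
pf = [[0.1, 0.2, 0.9], [0.5, 0.6, 0.5], [0.9, 1.0, 0.1]]
edges = correlation_edges(prod_ids, pf, corr_cutoff=0.8)

# Write an edge table that a graph tool could import.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["source", "target", "correlation"])
writer.writerows(edges)
csv_text = buffer.getvalue()
print(csv_text)
```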
-
stepwise_implementation
(obj, design_ind, varargin)¶ Explores all possible subsets of a solution and generates a report and a tree (Cytoscape input) with the most promising candidates to achieve the target design with useful designs on the way.
Parameters: - design_ind (integer) – Index of the design in the solution given by sol_ind to be analyzed.
- sol_ind (integer, optional) – Solution index, default 1.
- compatibiliy_cutoff (double, optional) – Value used to determine the compatibility of a design, default is 0.6.
- max_level (integer, optional) – Number of levels in the implementation tree to explore. The default is up to the number of deletions in the given design minus 1 (e.g. if the given design has 5 deletions, subset designs with 1, 2, 3, and 4 deletions will be explored).
- write_tables (logical, optional) – If true (default is false), a table of designs, sorted by compatibility, is written for each level to output_base_path + levelX.csv.
- write_graph (logical, optional) – If true (default), the output graph is written.
- output_base_path (string, optional) – For output files, default is obj.prodnet.problem_path/output/<design-objective>-<design-deletions>-<design-ind>
- min_obj_val (double, optional) – For output graph, products with objectives below min_obj_val in all designs will be removed. Default is 0.1.
- alt_sol_ind (integer, optional) – Index of an alternative_solution to design_ind. Default is 0 which indicates that no alternative solution is considered.
- only_nondominated (logical, optional) – If true, only non-dominated solutions at each step are kept. Default is false.
- Notes
- Module reactions are those of the final solution, so only one module needs to be constructed for all strains.
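The levels explored by default can be illustrated by enumerating the proper subsets of a design. The Python sketch below is language-neutral and not the MATLAB implementation; the reaction ids and the helper name subset_designs are hypothetical, and scoring each subset for compatibility is left out.

```python
from itertools import combinations


def subset_designs(deletions, max_level=None):
    """Group the subsets of a design by number of deletions (level).

    By default, levels run from 1 to len(deletions) - 1.
    """
    deletions = sorted(deletions)
    if max_level is None:
        max_level = len(deletions) - 1
    return {
        level: [set(c) for c in combinations(deletions, level)]
        for level in range(1, max_level + 1)
    }


# Hypothetical design with 5 deletions: levels 1 to 4 are explored.
levels = subset_designs(["ACKr", "LDH", "PTAr", "ADH", "PFL"])
counts = {level: len(subsets) for level, subsets in levels.items()}
print(counts)
```

The number of subsets grows combinatorially with the number of deletions, which is one reason to cap the search with max_level or to keep only non-dominated designs at each step.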
-
write_result_tables
(obj, varargin)¶ Generates a report for a given set of solutions
Parameters: - file_name (string, optional) – Name of the output file. Defaults to problem-name-report.
- skip_log (logical, optional) – Indicates if the sheet containing a log of the solutions should be skipped. Defaults to false.
Notes
- Gene deletion report not supported yet.
- All designs have to be growth type (i.e. wGCP,sGCP) or non-growth (NGP).
- Currently csv output does not include superheaders
-
write_to_xls
(obj, file_name, skip_log)¶ Generates a report for a given set of solutions
Parameters: - file_name (string, optional) – Name of the output file. Defaults to problem-name-report.
- skip_log (logical, optional) – Indicates if the sheet containing a log of the solutions should be skipped. Defaults to false.
Notes
- Gene deletion report not supported yet.
- All designs have to be growth type (i.e. wGCP,sGCP) or non-growth (NGP).
Warning
- Deprecated function, use ResAnalysis.write_result_tables instead.