fnGO
Functions for Gene Ontology (GO) overrepresentation analysis using the communicated set of proteins.
- CoRe.fnGO.MinMaxGOsets(GO_embedding, GO_container, GO_BP_names)
Identifies the minimum gene sets that do not contain any other gene sets and the maximum gene sets that are not contained in any other gene sets.
- Parameters
GO_embedding (dict) – Dict with the index of gene ontology set name as keys and the list of indices of the gene sets that contain the key gene set.
GO_container (dict) – Dict with the index of gene ontology set name as keys and the list of indices of the gene sets that are contained in the key gene set.
GO_BP_names (list) – Names of gene ontology gene sets.
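A minimal sketch of the selection described above, not the library's own code. The function name `min_max_sets` and the tuple return value are hypothetical, since the return type is not documented here. A set is treated as minimum when its container list is empty and as maximum when its embedding list is empty.

```python
def min_max_sets(embedding, container, names):
    """Return (minimum, maximum) gene set names.

    embedding[i] lists the sets that contain set i;
    container[i] lists the sets contained in set i.
    """
    # Minimum sets contain no other gene set.
    minimum = [names[i] for i, inner in container.items() if not inner]
    # Maximum sets are contained in no other gene set.
    maximum = [names[i] for i, outer in embedding.items() if not outer]
    return minimum, maximum


# Toy hierarchy (hypothetical data): C is inside B, and B is inside A.
names = ["A", "B", "C"]
embedding = {0: [], 1: [0], 2: [0, 1]}
container = {0: [1, 2], 1: [2], 2: []}
minimum, maximum = min_max_sets(embedding, container, names)
```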
- CoRe.fnGO.compute_p_values(sources, GO_BPs, interaction_set, total_genes, minimum_GOBP=False, size_threshold=inf, full=False)
Computes Fisher’s exact test p-values for Gene Ontology over-representation of the genes receiving information from the sources.
- Parameters
sources (list) – Names of factors that are causing the information transfer in the network.
minimum_GOBP (list) – Names of gene sets for biological processes at the lowest level, i.e. these sets do not contain other gene sets.
GO_BPs (array_like) – Gene sets for Gene Ontology Biological Processes.
interaction_set (dict) – Set of genes receiving information from the sources.
total_genes (int) – Total number of unique genes across all gene sets.
- Returns
go_names (dict) – Gene Ontology Biological Processes that are over-represented by the sources.
p_values (dict) – Fisher’s exact test p-value for Gene Ontology over-representation analysis.
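As an illustration of the test involved, the sketch below computes a one-sided Fisher's exact test p-value for a single gene set as the upper tail of the hypergeometric distribution. It uses only the standard library. The name `overrepresentation_p` is hypothetical, and whether the library uses a one-sided or a two-sided test is an assumption.

```python
from math import comb


def overrepresentation_p(gene_set, hits, total_genes):
    """One-sided Fisher's exact test p-value (hypergeometric upper tail).

    Assumes every gene in `hits` belongs to the universe of
    `total_genes` genes.
    """
    gene_set, hits = set(gene_set), set(hits)
    K = len(gene_set)         # genes annotated to the gene set
    n = len(hits)             # genes receiving information
    k = len(gene_set & hits)  # overlap between the two
    # Probability of drawing k or more annotated genes in n draws.
    tail = sum(
        comb(K, i) * comb(total_genes - K, n - i)
        for i in range(k, min(K, n) + 1)
    )
    return tail / comb(total_genes, n)


# Toy example (hypothetical genes): universe of 10 genes.
p = overrepresentation_p({"g1", "g2", "g3"}, {"g1", "g2", "g4", "g5"}, 10)
```

In this toy case the overlap is 2 of 3 annotated genes, and the tail sums to 70/210.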
- CoRe.fnGO.compute_q_values(p_values, go_names, go_tags, alpha=0.01, return_all=False)
Benjamini-Hochberg p-value correction for multiple hypothesis testing.
- Parameters
p_values (array_like) – A list or array of p-values, from Fisher’s exact test, for multiple hypotheses.
- Returns
q_values (array_like) – Sorted positive False Discovery Rate corrected for multiple hypothesis testing.
p_values_go (array_like) – Names of the associated gene ontology biological processes.
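The sorting and thresholding step can be sketched as follows. This is an assumption about the behaviour, not the library's implementation: `select_significant` is a hypothetical helper that takes already-corrected q-values, sorts them in ascending order, and keeps those strictly below `alpha`.

```python
def select_significant(q_values, names, alpha=0.01):
    """Return (name, q) pairs with q < alpha, sorted by ascending q."""
    ranked = sorted(zip(names, q_values), key=lambda pair: pair[1])
    return [(name, q) for name, q in ranked if q < alpha]


# Hypothetical q-values for three gene sets.
hits = select_significant(
    [0.2, 0.001, 0.008], ["GOBP_A", "GOBP_B", "GOBP_C"]
)
```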
- CoRe.fnGO.findGOcontainer(GO_embedding, outputfile)
Identifies the gene sets that contain other gene sets. The gene sets that do not contain other gene sets are returned as dictionary keys with an empty list as the entry.
- Parameters
GO_embedding (dict) – Dictionary with gene set index as keys and the list of gene sets that contain it as entries.
outputfile (string) – Name of the file to store the output.
- Returns
GO_container – Dict with the index of gene ontology set name as keys and the list of indices of the gene sets that are contained in the key gene set.
- Return type
dict
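One way to read this step is as inverting the embedding dictionary. The sketch below does this under stated assumptions: `invert_embedding` is a hypothetical name, every index that appears in a list is also a key, and writing to the output file is left out.

```python
def invert_embedding(embedding):
    """Map each gene set index to the indices of the sets it contains."""
    # Start with an empty list so sets that contain nothing still appear.
    container = {index: [] for index in embedding}
    for inner, outers in embedding.items():
        for outer in outers:
            container[outer].append(inner)
    return container


# Toy hierarchy (hypothetical data): set 2 is inside set 1, set 1 inside set 0.
container = invert_embedding({0: [], 1: [0], 2: [0, 1]})
```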
- CoRe.fnGO.findGOembedding(GO_sets, outputfile)
Identifies the gene sets that are embedded within other gene sets. The gene sets that are not embedded in any other gene set are returned as dictionary keys with an empty list as the entry.
- Parameters
GO_sets (dict) – Dictionary with gene set name as keys and the list of associated genes as entries.
outputfile (string) – Name of the file to store the output.
- Returns
GO_embedding – Dict with the index of gene ontology set name as keys and the list of indices of the gene sets that contain the key gene set.
- Return type
dict
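A minimal sketch of the embedding search, assuming that "embedded" means a strict subset relation and that indices follow the dictionary's insertion order. The name `find_embedding` is hypothetical and the output file is not written.

```python
def find_embedding(go_sets):
    """Map each gene set index to the indices of the sets that contain it."""
    members = [set(genes) for genes in go_sets.values()]
    embedding = {}
    for i, inner in enumerate(members):
        # `<` is the strict-subset test, so identical sets do not embed each other.
        embedding[i] = [
            j for j, outer in enumerate(members) if j != i and inner < outer
        ]
    return embedding


# Toy gene sets (hypothetical names and genes).
go_sets = {"A": ["g1", "g2", "g3"], "B": ["g1", "g2"], "C": ["g1"]}
embedding = find_embedding(go_sets)
```

The pairwise comparison is quadratic in the number of gene sets, which is acceptable for a sketch but may be slow for a full ontology.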
- CoRe.fnGO.p_adjust_bh(p)
Benjamini-Hochberg p-value correction for multiple hypothesis testing.
- Parameters
p (array_like) – A list or array of p-values, from Fisher’s exact test, for multiple hypotheses.
- Returns
q – Adjusted positive False Discovery Rate (pFDR).
- Return type
array_like
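For reference, the standard Benjamini-Hochberg adjustment can be sketched in pure Python as below. This is the textbook procedure and not necessarily the library's exact code. The name `bh_adjust` is hypothetical, and results are returned in the input order.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, in the order of the input."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


q = bh_adjust([0.01, 0.04, 0.03, 0.005])
```

Each p-value is scaled by m / rank, and the running minimum ensures a smaller p-value never receives a larger adjusted value than a larger one.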
- CoRe.fnGO.readGOBPs(GO_directory)
Reads the gene sets associated with Gene Ontology Biological Processes.
- Parameters
GO_directory – Directory containing the GO data files from the Molecular Signatures Database. This directory contains a set of .csv files with GOBP names as filenames containing the list of associated genes.
- Returns
GO_BPs – Dict with GOBP names as keys and the list of associated genes as the dictionary entry.
- Return type
dict
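A hedged sketch of reading such a directory with the standard library. The layout inside each .csv file is not specified here, so the code assumes that every non-empty cell is a gene name. `read_gene_sets` is a hypothetical name.

```python
import csv
from pathlib import Path


def read_gene_sets(directory):
    """Build {file stem: [genes]} from every .csv file in `directory`."""
    gene_sets = {}
    for path in sorted(Path(directory).glob("*.csv")):
        with path.open(newline="") as handle:
            # Assumption: each non-empty cell holds one gene name.
            genes = [
                cell.strip()
                for row in csv.reader(handle)
                for cell in row
                if cell.strip()
            ]
        gene_sets[path.stem] = genes
    return gene_sets
```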
- CoRe.fnGO.readGOsets(GO_file, GO_category)
Reads the gene ontology data set as a Python dictionary.
- Parameters
GO_file (string) – Name of the file containing Gene Ontology gene sets.
GO_category (string) – Name of the gene ontology category, one among the three, ‘GOBP’, ‘GOCC’, or ‘GOMF’.
- Returns
GO_BPs (dict) – Dict with gene ontology gene set as keys and the list of associated genes as dictionary entries.
total_unique_genes (list) – Names of unique genes in the total gene ontology data set.
- CoRe.fnGO.total_genes(GOBPs)
Determines the list of unique genes across all the GO gene sets.
- Parameters
GOBPs (dict) – Dict with GOBP names as keys and the list of associated genes as the dictionary entry.
- Returns
unique_genes – Names of genes present in the gene set database.
- Return type
list
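Collecting the unique genes amounts to a set union over all gene lists. A minimal sketch follows, with the hypothetical name `unique_genes`. Sorting the result is an assumption made here for reproducibility, since the order of the returned list is not documented.

```python
def unique_genes(go_sets):
    """Return the sorted list of genes that appear in any gene set."""
    return sorted({gene for genes in go_sets.values() for gene in genes})


# Toy gene sets (hypothetical names and genes).
genes = unique_genes({"GOBP_X": ["TP53", "BRCA1"], "GOBP_Y": ["TP53", "EGFR"]})
```

The length of this list is the kind of count that the `total_genes` parameter of `compute_p_values` describes.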