In this tutorial, we explore how to leverage the PyBEL ecosystem to construct and analyze rich biological knowledge graphs directly within Google Colab. We begin by installing all necessary packages, including PyBEL, NetworkX, Matplotlib, Seaborn, and Pandas. We then demonstrate how to define proteins, processes, and modifications using the PyBEL DSL. From there, we guide you through the creation of an Alzheimer’s disease-related pathway, showcasing how to encode causal relationships, protein–protein interactions, and phosphorylation events. Alongside graph construction, we introduce advanced network analyses, including centrality measures, node classification, and subgraph extraction, as well as techniques for extracting citation and evidence data. By the end of this section, you’ll have a fully annotated BEL graph ready for downstream visualization and enrichment analyses, laying a solid foundation for interactive biological knowledge exploration.
!pip install pybel pybel-tools networkx matplotlib seaborn pandas -q
import pybel
import pybel.dsl as dsl
from pybel import BELGraph
from pybel.io import to_pickle, from_pickle
import networkx as nx
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from collections import Counter
import warnings
warnings.filterwarnings('ignore')
print("PyBEL Advanced Tutorial: Biological Expression Language Ecosystem")
print("=" * 65)
We begin by installing PyBEL and its dependencies directly in Colab, ensuring that all necessary libraries (NetworkX, Matplotlib, Seaborn, and Pandas) are available for our analysis. Once installed, we import the core modules and suppress warnings to keep our notebook clean and focused on the results.
print("\n1. Building a Biological Knowledge Graph")
print("-" * 40)
# Create an empty BEL graph with descriptive metadata
graph = BELGraph(
    name="Alzheimer's Disease Pathway",
    version="1.0.0",
    description="Example pathway showing protein interactions in AD",
    authors="PyBEL Tutorial"
)
# Define the biological entities with the PyBEL DSL
app = dsl.Protein(name="APP", namespace="HGNC")
abeta = dsl.Protein(name="Abeta", namespace="CHEBI")
tau = dsl.Protein(name="MAPT", namespace="HGNC")
gsk3b = dsl.Protein(name="GSK3B", namespace="HGNC")
inflammation = dsl.BiologicalProcess(name="inflammatory response", namespace="GO")
apoptosis = dsl.BiologicalProcess(name="apoptotic process", namespace="GO")
# Add causal edges; the PMIDs are placeholder citations
graph.add_increases(app, abeta, citation="PMID:12345678", evidence="APP cleavage produces Abeta")
graph.add_increases(abeta, inflammation, citation="PMID:87654321", evidence="Abeta triggers neuroinflammation")
# Phosphorylated tau is modeled as MAPT carrying a "Ph" protein modification
tau_phosphorylated = dsl.Protein(name="MAPT", namespace="HGNC",
                                 variants=[dsl.ProteinModification("Ph")])
graph.add_increases(gsk3b, tau_phosphorylated, citation="PMID:11111111", evidence="GSK3B phosphorylates tau")
graph.add_increases(tau_phosphorylated, apoptosis, citation="PMID:22222222", evidence="Hyperphosphorylated tau causes cell death")
graph.add_increases(inflammation, apoptosis, citation="PMID:33333333", evidence="Inflammation promotes apoptosis")
graph.add_association(abeta, tau, citation="PMID:44444444", evidence="Abeta and tau interact synergistically")
print(f"Created BEL graph with {graph.number_of_nodes()} nodes and {graph.number_of_edges()} edges")
We initialize a BELGraph with metadata for an Alzheimer’s disease pathway and define proteins and processes using the PyBEL DSL. By adding causal relationships, protein modifications, and associations, we construct a comprehensive network that captures key molecular interactions.
print("\n2. Advanced Network Analysis")
print("-" * 30)
degree_centrality = nx.degree_centrality(graph)
betweenness_centrality = nx.betweenness_centrality(graph)
closeness_centrality = nx.closeness_centrality(graph)
# The node with the highest degree centrality is the most connected one
most_central = max(degree_centrality, key=degree_centrality.get)
print(f"Most connected node: {most_central}")
print(f"Degree centrality: {degree_centrality[most_central]:.3f}")
We compute degree, betweenness, and closeness centralities to quantify each node’s importance within the graph. By identifying the most connected nodes, we gain insight into potential hubs that may drive disease mechanisms.
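To make the degree centrality figure concrete, here is a minimal plain-Python sketch of the same normalization NetworkX applies (a node's degree divided by n − 1). The edge list and the string node names are simplified stand-ins for the tutorial's statements, not what PyBEL stores internally. A real BELGraph may contain additional nodes or edges, so its numbers can differ.

```python
from collections import defaultdict

# Hypothetical, simplified edge list mirroring the statements in this tutorial.
# Plain strings stand in for PyBEL DSL nodes.
EDGES = [
    ("APP", "Abeta"),
    ("Abeta", "inflammatory response"),
    ("GSK3B", "MAPT(ph)"),
    ("MAPT(ph)", "apoptotic process"),
    ("inflammatory response", "apoptotic process"),
    ("Abeta", "MAPT"),
]


def degree_centrality(edges):
    """Return degree / (n - 1) per node, counting incoming and outgoing edges."""
    degree = defaultdict(int)
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    n = len(degree)
    if n <= 1:
        return {node: 0.0 for node in degree}
    return {node: d / (n - 1) for node, d in degree.items()}


centrality = degree_centrality(EDGES)
top_node = max(centrality, key=centrality.get)
print(f"Most connected node: {top_node} ({centrality[top_node]:.3f})")
```

In this simplified list, Abeta touches three of the six edges, which is why it comes out as the hub.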
print("\n3. Biological Entity Classification")
print("-" * 35)
# Tally nodes by their BEL function (e.g., Protein, BiologicalProcess)
node_types = Counter()
for node in graph.nodes():
    node_types[node.function] += 1
print("Node distribution:")
for func, count in node_types.items():
    print(f"  {func}: {count}")
We classify each node by its function, such as Protein or BiologicalProcess, and tally their counts. This breakdown helps us understand the composition of our network at a glance.
print("\n4. Pathway Analysis")
print("-" * 20)
proteins = [node for node in graph.nodes() if node.function == 'Protein']
processes = [node for node in graph.nodes() if node.function == 'BiologicalProcess']
print(f"Proteins in pathway: {len(proteins)}")
print(f"Biological processes: {len(processes)}")
# Count how often each relation type appears on the edges
edge_types = Counter()
for u, v, data in graph.edges(data=True):
    edge_types[data.get('relation')] += 1
print("\nRelationship types:")
for rel, count in edge_types.items():
    print(f"  {rel}: {count}")
We separate all proteins and processes to measure the pathway’s scope and complexity. Counting the different relationship types further reveals which interactions, like increases or associations, dominate our model.
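As a rough illustration of how a dominant relation type can be read off such a tally, the sketch below counts relations over hypothetical edge records and reports the most frequent one with its share. The records and the relation strings are illustrative assumptions and do not come from a real BELGraph.

```python
from collections import Counter

# Hypothetical edge records: (source, target, attribute dict).
edges = [
    ("APP", "Abeta", {"relation": "increases"}),
    ("Abeta", "inflammatory response", {"relation": "increases"}),
    ("GSK3B", "MAPT(ph)", {"relation": "increases"}),
    ("MAPT(ph)", "apoptotic process", {"relation": "increases"}),
    ("inflammatory response", "apoptotic process", {"relation": "increases"}),
    ("Abeta", "MAPT", {"relation": "association"}),
]

# Tally relation types, then pick the most frequent one
relation_counts = Counter(data.get("relation") for _, _, data in edges)
dominant, dominant_count = relation_counts.most_common(1)[0]
share = dominant_count / sum(relation_counts.values())

print(f"Dominant relation: {dominant} ({share:.0%} of edges)")
```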
print("\n5. Literature Evidence Analysis")
print("-" * 32)
citations = []
evidences = []
# Collect the citation and evidence attached to each edge, where present
for _, _, data in graph.edges(data=True):
    if 'citation' in data:
        citations.append(data['citation'])
    if 'evidence' in data:
        evidences.append(data['evidence'])
print(f"Total citations: {len(citations)}")
print(f"Unique citations: {len(set(citations))}")
print(f"Evidence statements: {len(evidences)}")
We extract citation identifiers and evidence strings from each edge to evaluate our graph’s grounding in published research. Summarizing total and unique citations allows us to assess the breadth of supporting literature.
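One way to go a step beyond raw counts is to group evidence strings by their source and flag statements that lack a citation. The sketch below does this over hypothetical edge attribute dictionaries. It assumes citations are plain strings, which may not match how a given PyBEL version stores them, and the PMIDs are the tutorial's placeholder identifiers.

```python
from collections import defaultdict

# Hypothetical edge attribute dicts; the PMIDs are placeholder identifiers.
edge_records = [
    {"citation": "PMID:12345678", "evidence": "APP cleavage produces Abeta"},
    {"citation": "PMID:87654321", "evidence": "Abeta triggers neuroinflammation"},
    {"citation": "PMID:12345678", "evidence": "Hypothetical second statement from the same source"},
    {"evidence": "Hypothetical statement with no citation attached"},
]


def group_evidence(records):
    """Group evidence strings by citation and collect uncited evidence separately."""
    grouped = defaultdict(list)
    missing = []
    for record in records:
        evidence = record.get("evidence")
        citation = record.get("citation")
        if citation is None:
            missing.append(evidence)
        else:
            grouped[citation].append(evidence)
    return dict(grouped), missing


by_citation, uncited = group_evidence(edge_records)
for citation, statements in by_citation.items():
    print(f"{citation}: {len(statements)} statement(s)")
print(f"Statements without a citation: {len(uncited)}")
```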
print("\n6. Subgraph Analysis")
print("-" * 22)
# Build the neighborhood of the inflammation node: itself plus direct neighbors
inflammation_nodes = [inflammation]
inflammation_neighbors = list(graph.predecessors(inflammation)) + list(graph.successors(inflammation))
inflammation_subgraph = graph.subgraph(inflammation_nodes + inflammation_neighbors)
print(f"Inflammation subgraph: {inflammation_subgraph.number_of_nodes()} nodes, {inflammation_subgraph.number_of_edges()} edges")
We isolate the inflammation subgraph by collecting its direct neighbors, yielding a focused view of inflammatory crosstalk. This targeted subnetwork highlights how inflammation interfaces with other disease processes.
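The neighborhood extraction can be sketched without any graph library: collect a node's predecessors and successors, then keep only the edges whose two endpoints both fall inside that node set (an induced subgraph). The edge list and the helper name `neighborhood` are hypothetical simplifications of the tutorial's pathway.

```python
# Hypothetical, simplified edge list; plain strings stand in for PyBEL nodes.
EDGES = [
    ("APP", "Abeta"),
    ("Abeta", "inflammatory response"),
    ("GSK3B", "MAPT(ph)"),
    ("MAPT(ph)", "apoptotic process"),
    ("inflammatory response", "apoptotic process"),
    ("Abeta", "MAPT"),
]


def neighborhood(edges, center):
    """Return the center node plus direct neighbors, and the edges induced on them."""
    predecessors = {source for source, target in edges if target == center}
    successors = {target for source, target in edges if source == center}
    nodes = {center} | predecessors | successors
    induced = [(s, t) for s, t in edges if s in nodes and t in nodes]
    return nodes, induced


sub_nodes, sub_edges = neighborhood(EDGES, "inflammatory response")
print(f"Subgraph: {len(sub_nodes)} nodes, {len(sub_edges)} edges")
```

Note that the edge from phosphorylated tau to apoptosis is excluded here, because phosphorylated tau is not a direct neighbor of the inflammation node.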
print("\n7. Advanced Graph Querying")
print("-" * 28)
try:
    paths = list(nx.all_simple_paths(graph, app, apoptosis, cutoff=3))
    print(f"Paths from APP to apoptosis: {len(paths)}")
    if paths:
        # Take the minimum explicitly rather than relying on path order
        print(f"Shortest path length: {min(len(path) for path in paths) - 1}")
except nx.NetworkXNoPath:
    print("No paths found between APP and apoptosis")
apoptosis_inducers = list(graph.predecessors(apoptosis))
print(f"Factors that increase apoptosis: {len(apoptosis_inducers)}")
We enumerate simple paths between APP and apoptosis to explore mechanistic routes and identify key intermediates. Listing all predecessors of apoptosis also shows us which factors may trigger cell death.
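For readers curious about what simple-path enumeration with a cutoff involves, here is a minimal depth-first sketch in plain Python. It is not NetworkX's implementation. The adjacency mapping and the function name `simple_paths` are hypothetical, and the cutoff is interpreted as a maximum number of edges.

```python
# Hypothetical adjacency mapping for a simplified version of the pathway.
ADJACENCY = {
    "APP": ["Abeta"],
    "Abeta": ["inflammatory response", "MAPT"],
    "inflammatory response": ["apoptotic process"],
    "GSK3B": ["MAPT(ph)"],
    "MAPT(ph)": ["apoptotic process"],
}


def simple_paths(adjacency, source, target, cutoff):
    """Yield node paths from source to target that use at most `cutoff` edges."""
    stack = [(source, [source])]
    while stack:
        node, path = stack.pop()
        if node == target:
            yield path
            continue
        if len(path) - 1 >= cutoff:
            continue  # extending this path would exceed the edge limit
        for neighbor in adjacency.get(node, []):
            if neighbor not in path:  # a simple path never revisits a node
                stack.append((neighbor, path + [neighbor]))


paths = list(simple_paths(ADJACENCY, "APP", "apoptotic process", cutoff=3))
for path in paths:
    print(" -> ".join(path))
```

With a cutoff of 2, the same query yields no paths, since the only route from APP to apoptosis in this simplified mapping needs three edges.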
print("\n8. Data Export and Visualization")
print("-" * 35)
adj_matrix = nx.adjacency_matrix(graph)
node_labels = [str(node) for node in graph.nodes()]
plt.figure(figsize=(12, 8))
# Panel 1: network layout
plt.subplot(2, 2, 1)
pos = nx.spring_layout(graph, k=2, iterations=50)
nx.draw(graph, pos, with_labels=False, node_color="lightblue",
        node_size=1000, font_size=8, font_weight="bold")
plt.title("BEL Network Graph")
# Panel 2: degree centrality histogram
plt.subplot(2, 2, 2)
centralities = list(degree_centrality.values())
plt.hist(centralities, bins=10, alpha=0.7, color="green")
plt.title("Degree Centrality Distribution")
plt.xlabel("Centrality")
plt.ylabel("Frequency")
# Panel 3: node types
plt.subplot(2, 2, 3)
functions = list(node_types.keys())
counts = list(node_types.values())
plt.pie(counts, labels=functions, autopct="%1.1f%%", startangle=90)
plt.title("Node Type Distribution")
# Panel 4: relation types
plt.subplot(2, 2, 4)
relations = list(edge_types.keys())
rel_counts = list(edge_types.values())
plt.bar(relations, rel_counts, color="orange", alpha=0.7)
plt.title("Relationship Types")
plt.xlabel("Relation")
plt.ylabel("Count")
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
We prepare adjacency matrices and node labels for downstream use and generate a multi-panel figure showing the network structure, centrality distributions, node-type proportions, and edge-type counts. These visualizations bring our BEL graph to life, supporting a deeper biological interpretation.
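To show what an adjacency matrix paired with node labels looks like for downstream use, the following plain-Python sketch builds a dense matrix from a simplified edge list. The edge list, the sorted label order, and the helper name `adjacency_matrix` are illustrative assumptions. NetworkX returns a sparse matrix ordered by the graph's own node order instead.

```python
# Hypothetical, simplified edge list; plain strings stand in for PyBEL nodes.
EDGES = [
    ("APP", "Abeta"),
    ("Abeta", "inflammatory response"),
    ("GSK3B", "MAPT(ph)"),
    ("MAPT(ph)", "apoptotic process"),
    ("inflammatory response", "apoptotic process"),
    ("Abeta", "MAPT"),
]


def adjacency_matrix(edges):
    """Return (labels, matrix), where matrix[i][j] counts edges labels[i] -> labels[j]."""
    labels = sorted({node for edge in edges for node in edge})
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for source, target in edges:
        matrix[index[source]][index[target]] += 1
    return labels, matrix


labels, matrix = adjacency_matrix(EDGES)
position = {label: i for i, label in enumerate(labels)}
for label, row in zip(labels, matrix):
    print(f"{label:>22}: {row}")
```

Keeping the label list alongside the matrix is what lets row and column indices be mapped back to biological entities later.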
In this tutorial, we have demonstrated the power and flexibility of PyBEL for modeling complex biological systems. We showed how easily one can construct a curated white-box graph of Alzheimer’s disease interactions, perform network-level analyses to identify key hub nodes, and extract biologically meaningful subgraphs for focused study. We also covered essential practices for literature evidence mining and prepared data structures for compelling visualizations. As a next step, we encourage you to extend this framework to your own pathways, integrating additional omics data, running enrichment tests, or coupling the graph with machine-learning workflows.

Sana Hassan, a consulting intern at Marktechpost and dual-degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. With a keen interest in solving practical problems, he brings a fresh perspective to the intersection of AI and real-life solutions.
