iORbase 2.0 - Help Documentation
A comprehensive guide to using the iORbase 2.0 platform for insect olfactory receptor analysis.
Contents
Sequence Analysis
Tools for analyzing olfactory receptor gene sequences, including identification, annotation, and comparative analysis.
iGene - Gene Analysis Module
A specialized framework for annotating olfactory receptors (OR) from insect genomes, based on the published insectOR database functionality. It serves as an all-in-one automated OR gene annotation tool to address inefficiencies of traditional manual analysis.
Overview
iGene is a bioinformatics tool designed to systematically identify and annotate olfactory receptor (OR) genes from insect genome sequences. The module integrates multiple computational approaches to provide reliable OR gene annotations, leveraging the comprehensive insectOR database as a reference resource. It supports input of custom user ID, task ID, and genome file (FASTA format), automatically executing three core processes:
- Extraction of OR candidate regions via homology alignment
- High-confidence OR gene annotation (including exon-intron structure prediction)
- Result correction (3D structure prediction + transmembrane validation, where sequences with 6-8 transmembrane domains are considered complete ORs)
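The completeness rule in the last step (6-8 transmembrane domains) can be illustrated with a minimal sketch. The function names and the example counts are hypothetical; in practice the counts would come from whichever transmembrane predictor the pipeline runs, and this is not the pipeline's own code.

```python
def is_complete_or(tm_count, min_tm=6, max_tm=8):
    """Return True when the predicted TM-domain count lies in the 6-8 range."""
    return min_tm <= tm_count <= max_tm


def filter_complete(tm_counts):
    """Return the sorted sequence IDs whose TM count passes the rule."""
    return sorted(seq_id for seq_id, n in tm_counts.items() if is_complete_or(n))


# Hypothetical predictor output: sequence ID -> number of predicted TM domains.
example_counts = {"OR_cand_1": 7, "OR_cand_2": 4, "OR_cand_3": 6, "OR_cand_4": 9}
complete_ids = filter_complete(example_counts)
print(complete_ids)  # ['OR_cand_1', 'OR_cand_3']
```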
Core Advantages:
- Full Automation: Executes the entire pipeline with one click, requiring no manual intervention
- Resume from Interruption: Automatically detects completed steps, supports skipping or re-running
- Transparent Logging: Real-time recording of execution status, time consumption, and error messages
- Reliable Results: Integrates structure prediction and transmembrane validation to filter complete OR sequences
Key Features
- Homology-based OR gene identification
- Motif detection for OR-specific domains
- Phylogenetic classification of identified ORs
- Comprehensive annotation reports
- Automated pipeline with error recovery
- Transmembrane domain validation for result filtering
Installation
Prerequisites
System: Linux (Recommended: Ubuntu 20.04+ / CentOS 8+)
Installation Steps
Step 1: Change to the Project Directory
cd insector_IORBase
Step 2: Install Dependencies
Optional - Install screen (for running processes in background):
# For Ubuntu/Debian systems:
sudo apt install -y screen build-essential libncurses5-dev libbz2-dev liblzma-dev libssl-dev
Install Conda (for environment management):
wget https://repo.anaconda.com/miniconda/Miniconda2-latest-Linux-x86_64.sh -O miniconda2.sh
bash miniconda2.sh -b -p $HOME/miniconda2
echo 'export PATH="$HOME/miniconda2/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc
Create dedicated Conda environment and install core tools:
conda create -n iorbase-py2 python=2.7 -y
conda activate iorbase-py2
conda install -c bioconda blast=2.12.0 -y
conda install -c bioconda samtools=1.15 -y
conda install -c salilab modeller -y
Install Python 2.7 compatible libraries:
pip install biopython==1.76
pip install numpy==1.16.6
Additional Configuration (Modeller License):
echo 'export MODELLERKEY=your_license_key_here' >> ~/.bashrc
source ~/.bashrc
Step 3: Verify Installation
echo "Python version:" && python --version
echo "BLAST+ version:" && tblastn -version | head -1
echo "Modeller version:" && modeller -v
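The same check can be scripted. The sketch below only tests whether the executables installed above can be found on PATH; it does not check versions. It assumes Python 3.3 or later (shutil.which is not available in the Python 2.7 environment created above), and the helper name is hypothetical.

```python
import shutil


def find_missing_tools(tools):
    """Return the tool names that cannot be located on PATH."""
    return [name for name in tools if shutil.which(name) is None]


# Executables taken from the installation steps above.
required = ["tblastn", "samtools"]
missing = find_missing_tools(required)
if missing:
    print("Missing from PATH: " + ", ".join(missing))
else:
    print("All required tools were found on PATH.")
```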
Usage Guide
Step 1: Prepare Input Files
Genome file requirements: FASTA format (e.g., GCA_XXXXXX.X_genomic.fna / GCF_XXXXXX.X_genomic.fna)
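Before submitting a large genome, a quick format check can save a failed run. The sketch below is a minimal FASTA sanity check written with the standard library only; the function name and the example records are hypothetical, and the pipeline itself may apply stricter or different checks.

```python
def summarize_fasta(lines):
    """Count records and sequence characters; raise ValueError on malformed input."""
    n_records = 0
    n_bases = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            n_records += 1
        elif n_records == 0:
            raise ValueError("sequence data found before the first '>' header")
        else:
            n_bases += len(line)
    if n_records == 0:
        raise ValueError("no FASTA records found")
    return n_records, n_bases


# Hypothetical two-record example; a real call would pass an open file handle.
example = [">scaffold_1", "ACGTACGT", "ACGT", ">scaffold_2", "GGCC"]
print(summarize_fasta(example))  # (2, 16)
```

To check a real file, pass the file object directly, for example `with open(path) as fh: summarize_fasta(fh)`.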
Step 2: Set Script Execution Permissions
chmod +x automate_insectOR_final.sh
Step 3: Core Parameters
-u <user_id>: Required. Custom user ID
-t <task_id>: Required. Custom task ID
-g <genome_file>: Required. Full path to input genome file
-s: Optional. Force skip completed steps
Step 4: Running the Pipeline
First run (with interactive prompt):
./automate_insectOR_final.sh -u entomology_lab -t honeybee_OR_2025 -g /data/honeybee_genome.fasta
Resume after interruption (force skip):
./automate_insectOR_final.sh -u entomology_lab -t honeybee_OR_2025 -g /data/honeybee_genome.fasta -s
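When many genomes need to be processed, the command line can be assembled programmatically. The sketch below builds the argument list from the four documented options; the helper names are hypothetical, and run_pipeline assumes the script is present in the working directory.

```python
import subprocess


def build_command(user_id, task_id, genome_file, skip_completed=False,
                  script="./automate_insectOR_final.sh"):
    """Assemble the argument list using the documented -u, -t, -g and -s options."""
    argv = [script, "-u", user_id, "-t", task_id, "-g", genome_file]
    if skip_completed:
        argv.append("-s")
    return argv


def run_pipeline(argv):
    """Run the pipeline and return its exit code (not called in this example)."""
    return subprocess.call(argv)


cmd = build_command("entomology_lab", "honeybee_OR_2025",
                    "/data/honeybee_genome.fasta", skip_completed=True)
print(" ".join(cmd))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when paths contain spaces.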
Step 5: Locating Results
Results are organized in user_tasks/<user_id>/<task_id>/ with subdirectories:
1-extract_candidates/: OR candidate regions (FASTA + BED files)
2-annotation/: OR gene annotation results
3-correction/: Corrected results (transmembrane validation)
4-final_results/: Integrated final results + statistics
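A short script can report how far a task has progressed by inspecting these subdirectories. The sketch below treats a step as finished when its directory exists and is non-empty; that criterion is an assumption for illustration, since the way the pipeline itself detects completed steps is not documented here.

```python
import os

# Subdirectory names as listed above, in execution order.
STEP_DIRS = ["1-extract_candidates", "2-annotation", "3-correction", "4-final_results"]


def task_dir(user_id, task_id, root="user_tasks"):
    """Return the result directory for one task."""
    return os.path.join(root, user_id, task_id)


def step_status(base):
    """Map each step directory to True when it exists and has at least one entry."""
    status = {}
    for name in STEP_DIRS:
        path = os.path.join(base, name)
        status[name] = os.path.isdir(path) and len(os.listdir(path)) > 0
    return status


def first_incomplete(status):
    """Return the first step that looks unfinished, or None when all look done."""
    for name in STEP_DIRS:
        if not status[name]:
            return name
    return None


base = task_dir("entomology_lab", "honeybee_OR_2025")
print(first_incomplete(step_status(base)))
```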
Troubleshooting
Python 2.7 not found:
conda activate iorbase-py2
conda install python=2.7 -y
Modeller "Invalid license" error:
echo $MODELLERKEY
source ~/.bashrc
Acknowledgments
iGene integrates several open-source tools:
- Sequence Analysis: BLAST+, HMMER, exonerate
- Structure Prediction: Modeller, TMHMM, Phobius, HMMTOP
- Sequence Processing: samtools, WISE
- Environment Management: Conda
Structure Analysis
Structural models of insect ORs were predicted at large scale using AlphaFold, and the gene annotation results were validated by checking the structural integrity of the predicted models.
iORPDB - OR Structure Database
An OR structure database containing information on insect species.
Overview
iORPDB provides insect OR structures at large scale, together with structure visualization and per-entry information such as species, sequence, and receptor type (OR / ORco).
Key Features
- BLAST-based search capabilities
- Structure visualization tools
iModelTM - Transmembrane Modeling
A specialized tool for modeling the transmembrane domains of olfactory receptors, focusing on the unique structural features of this protein family.
Overview
iModelTM uses template-based algorithms to predict the transmembrane helix arrangement of olfactory receptors.
Key Features
- Transmembrane domain prediction
- Helix arrangement modeling
- Sequence similarity assessment
Function Analysis
Tools for studying the functional properties of olfactory receptors, including online docking, odor response prediction, and molecular dynamics result analysis.
OdorSets - Odor Ligand Database
A database of odor compounds collected from a variety of sources.
Overview
Compound information from various sources such as insect pheromones, environmental odors and plant volatiles was collected, including SMILES, CID and various physical and chemical properties.
Key Features
- Representing different ecological environments
- A wide chemical space coverage
DockingSets - Docking Results Database
A large-scale iOR-VOC interaction prediction database based on molecular docking methods.
Overview
DockingSets contains millions of pre-computed docking poses between olfactory receptors and odor ligands, allowing users to quickly explore potential binding interactions without performing computationally intensive docking simulations.
Key Features
- Large-scale virtual screening data statistics
- The binding profile of a single receptor or ligand
iDock - Molecular Docking Tool
A user-friendly interface for performing molecular docking simulations between olfactory receptors and odor ligands.
Overview
iDock provides an intuitive platform for setting up and running molecular docking simulations, with pre-configured parameters optimized for olfactory receptor-ligand interactions.
Key Features
- User-friendly docking setup
- Optimized parameters for OR-ligand interactions
- Interactive result analysis and visualization
iORPred - Odor Response Prediction
A machine learning-based tool for predicting the odor response profile of olfactory receptors.
Overview
iORPred uses machine learning models to predict how olfactory receptors respond to different odor ligands, based on sequence features.
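As an illustration of what a sequence feature can look like, the sketch below computes amino-acid composition, a 20-dimensional vector of residue frequencies. This is only one common choice, shown as an assumption; the features iORPred actually uses are not specified here, and the input fragment is hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(sequence):
    """Return the fraction of each standard amino acid, in AMINO_ACIDS order."""
    counted = [c for c in sequence.upper() if c in AMINO_ACIDS]
    total = len(counted)
    if total == 0:
        raise ValueError("no standard amino acids in sequence")
    return [counted.count(aa) / float(total) for aa in AMINO_ACIDS]


# Hypothetical 10-residue fragment, used only to show the shape of the output.
features = aa_composition("MKLLAAGGWY")
print(len(features))  # 20
```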
Key Features
- Odor response prediction
- Fast batch prediction
Applications - Integrated Analysis Workflows
Large-scale virtual screening workflows built for specific problems in iOR function research.
Overview
Two virtual screening workflows are provided, one for each problem: reverse target finding (finding receptors for a known ligand) and agonist screening (finding ligands for a known receptor).
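The two directions can be pictured as two ways of sorting the same receptor-ligand score table. The sketch below assumes a docking score is the ranking criterion, with more negative values being more favorable; the receptor names, ligand names, and scores are made up for illustration and are not platform data.

```python
def rank_receptors_for_ligand(scores, ligand):
    """Reverse target finding: receptors for one ligand, most favorable first."""
    hits = [(rec, s) for (rec, lig), s in scores.items() if lig == ligand]
    return sorted(hits, key=lambda pair: pair[1])


def rank_ligands_for_receptor(scores, receptor):
    """Agonist screening: ligands for one receptor, most favorable first."""
    hits = [(lig, s) for (rec, lig), s in scores.items() if rec == receptor]
    return sorted(hits, key=lambda pair: pair[1])


# Hypothetical scores keyed by (receptor, ligand); values are invented.
toy_scores = {
    ("OR_A", "ligand_1"): -6.2,
    ("OR_A", "ligand_2"): -4.8,
    ("OR_B", "ligand_1"): -7.1,
    ("OR_B", "ligand_2"): -5.5,
}
print(rank_receptors_for_ligand(toy_scores, "ligand_1"))
print(rank_ligands_for_receptor(toy_scores, "OR_A"))
```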
Key Features
- Virtual screening systems
- Step-by-step guidance
- Automated data processing
iMDanalysis - Molecular Dynamics Analysis
A tool for analyzing molecular dynamics simulation results of iORs, focusing on structural dynamics and flexibility.
Overview
This tutorial uses the molecular dynamics (MD) simulation analysis of Drosophila melanogaster OR85b and the ligand methyl acetate as an example to illustrate the analysis workflow.
This document is intended for users who have completed MD simulations and have used GROMACS and the GetContacts tool. It provides a standardized guide for RMSD, RMSF, and residue–ligand contact-frequency analyses. For each of the three functions, the required input files, GROMACS commands, Python script configuration, and expected outputs are described below.
I. RMSD Analysis (rmsd.py)
Overview
RMSD (Root Mean Square Deviation) measures how much the system’s structure deviates from a reference conformation over time. This function reads the RMSD time series from a GROMACS-generated .xvg file and outputs a time-series table and a statistical summary, which are convenient for plotting and report writing.
Prerequisites
Prepare the following GROMACS output files in advance:
- (1) pbc.xtc: trajectory file after periodic boundary condition (PBC) processing
- (2) md.tpr: topology/run input file for RMSD calculation (contains reference structure information)
GROMACS Command
Run the following in a terminal:
gmx rms -f pbc.xtc -s md.tpr -o rmsd_protein.xvg
Parameters:
-f pbc.xtc: specify the input trajectory file
-s md.tpr: specify the reference topology/structure file
-o rmsd_protein.xvg: output the RMSD-vs-time curve
After the command finishes, the following file will be generated:
rmsd_protein.xvg
Python Script Configuration
./RMSD/rmsd.py
Modify the path-related configurations in the script according to your actual file locations:
Line 9: set the RMSD input .xvg path
RMSD_XVG_PATH="./rmsd_protein.xvg"
Line 12: set the output directory
OUTPUT_DIR="./RMSD/output"
Notes:
- Please ensure RMSD_XVG_PATH points to the actual generated .xvg file;
- OUTPUT_DIR should already exist, or the script should contain logic to create it automatically (please verify based on your script).
Run command:
python ./RMSD/rmsd.py
*Run this command under the iORbase_MDanalysis directory.
Script Outputs
After running rmsd.py, the following outputs are expected (under OUTPUT_DIR):
rmsd_summary.txt
rmsd_timeseries.tsv
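For readers who want to see what such a script does in outline, the sketch below parses .xvg text and computes summary statistics. It is not the shipped rmsd.py. It assumes the usual .xvg layout, where lines starting with '#' or '@' are comments or plot directives and data rows hold time in the first column and RMSD in the second; the example rows are hypothetical.

```python
import math


def parse_xvg(lines):
    """Return (x, y) float pairs, skipping '#' comments and '@' directives."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line or line[0] in "#@":
            continue
        fields = line.split()
        rows.append((float(fields[0]), float(fields[1])))
    return rows


def summarize(values):
    """Return mean, population standard deviation, minimum and maximum."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return {"mean": mean, "std": math.sqrt(var), "min": min(values), "max": max(values)}


# Hypothetical excerpt; a real call would pass an open rmsd_protein.xvg handle.
example = [
    "# hypothetical gmx rms output",
    '@    title "RMSD"',
    "0.0  0.00",
    "10.0 0.10",
    "20.0 0.20",
]
series = parse_xvg(example)
stats = summarize([y for _, y in series])
print(stats)
```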
II. RMSF Analysis (rmsf.py)
Overview
RMSF (Root Mean Square Fluctuation) measures the fluctuation amplitude of each residue over time. This function reads per-residue RMSF values from a GROMACS .xvg file and outputs an RMSF profile and a statistical summary to help identify flexible regions and key structural regions.
Prerequisites
Prepare the following GROMACS output files in advance:
- (1) pbc.xtc: trajectory file after periodic boundary condition (PBC) processing
- (2) md.tpr: topology/run input file for RMSF calculation (contains reference structure information)
GROMACS Command
Run the following in a terminal:
gmx rmsf -f pbc.xtc -s md.tpr -o rmsf_protein.xvg -oq bfac.pdb -res
Parameters:
-o rmsf_protein.xvg: output RMSF values for each residue
-oq bfac.pdb: output a PDB file with B-factor information for RMSF visualization
-res: calculate RMSF per residue rather than per atom
After the command finishes, the following files will be generated:
rmsf_protein.xvg
bfac.pdb
Python Script Configuration
./RMSD/rmsf.py
Modify the path-related configurations in the script according to your actual file locations:
Line 9: set the RMSF input .xvg path
RMSF_XVG_PATH="./rmsf_protein.xvg"
Line 12: set the output directory
OUTPUT_DIR="./RMSF/output"
Run command:
python ./RMSD/rmsf.py
*Run this command under the iORbase_MDanalysis directory.
Script Outputs
After running rmsf.py, the following outputs are expected (under OUTPUT_DIR):
rmsf_profile.tsv
rmsf_summary.txt
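One way to use a per-residue RMSF profile is to flag residues that fluctuate well above the average. The sketch below marks residues whose RMSF exceeds the mean plus one standard deviation. That cutoff is an assumption chosen for illustration, not a rule taken from rmsf.py, and the example values are hypothetical.

```python
def read_profile(lines):
    """Parse .xvg text into (residue_number, rmsf) pairs, skipping '#'/'@' lines."""
    profile = []
    for line in lines:
        line = line.strip()
        if line and line[0] not in "#@":
            res, value = line.split()[:2]
            profile.append((int(float(res)), float(value)))
    return profile


def flexible_residues(profile, n_std=1.0):
    """Return residue numbers whose RMSF exceeds mean + n_std * std."""
    values = [v for _, v in profile]
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    cutoff = mean + n_std * std
    return [res for res, v in profile if v > cutoff]


# Hypothetical five-residue profile with one clearly flexible position.
example = ["# hypothetical gmx rmsf output", "1 0.05", "2 0.06", "3 0.30", "4 0.05", "5 0.04"]
print(flexible_residues(read_profile(example)))  # [3]
```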
III. Residue Contact Frequency Statistics (frequency.py)
Overview
This function uses the GetContacts tool to compute residue–ligand contact frequencies, summarizes contact patterns between different ORs (olfactory receptors) and ligands, and generates detailed per-OR contact-frequency tables and an overall summary. This analysis helps identify key residues closely involved in ligand binding.
Prerequisites
Prepare the following files in advance:
- (1) md.gro: topology/structure file of the system
- (2) pbc.xtc: trajectory file after PBC processing
GetContacts Installation
Install Python dependencies with Conda:
conda create -n getcontacts python=3.10 -y
conda activate getcontacts
Install dependencies (prefer conda-forge):
conda install -c conda-forge vmd-python -y
conda install -c conda-forge tk=8.6 netcdf4 numpy scipy expat matplotlib scikit-learn pytest pandas seaborn cython -y
Install system-level dependencies
sudo apt install -y libnetcdf-dev
Install GetContacts
Use the local installation package:
unzip getcontacts.zip
mv getcontacts-master getcontacts
cd getcontacts
python get_dynamic_contacts.py --help
Add the script to PATH:
echo 'export PATH="$PATH:iORbase_MDanalysis/getcontacts"' >> ~/.bashrc
source ~/.bashrc
Grant execution permission:
chmod +x get_dynamic_contacts.py
Verify installation
conda activate getcontacts
get_dynamic_contacts.py --help
If the help message prints normally, the installation is successful.
GetContacts Command (Generate Contact File)
Before running GetContacts, change to the GetContacts directory:
cd ./getcontacts
Run in a terminal:
python get_dynamic_contacts.py \
  --topology ../md.gro \
  --trajectory ../pbc.xtc \
  --output ../Frequency/loop-ligand_all.tsv \
  --cores 10 \
  --sele "protein" \
  --sele2 "resname 59b" \
  --itypes all
Parameter description:
--topology ../md.gro: specify the topology/structure file;
--trajectory ../pbc.xtc: specify the trajectory file;
--output ../Frequency/loop-ligand_all.tsv: output contact-frequency data file (here it is written directly under the Frequency folder; adjust as needed);
--cores 10: number of threads; adjust based on your machine;
--sele "protein": select the first interacting object (protein);
--sele2 "resname 59b": select the second interacting object (ligand residue name).
You must modify this according to the actual ligand residue name.
For example, if the file name is 59b3orco, the ligand residue name is 59b, then set:
--sele2 "resname 59b"
--itypes all: count all interaction types (hydrogen bonds, hydrophobic interactions, salt bridges, etc.).
After the command finishes, the following file will be generated in the corresponding directory (../Frequency):
loop-ligand_all.tsv
Python Script Configuration (frequency.py)
Before use, modify the following path settings according to your actual directory structure:
Line 9: input file path
INPUT_TSV_PATH = "./loop-ligand_all.tsv"
Line 10: output file path
OUTPUT_TSV_PATH = "./residue_ligand_contacts.tsv"
Run command:
python frequency.py
Script Outputs
After running frequency.py, the following output will be produced:
residue_ligand_contacts.tsv
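The idea behind a contact frequency is the fraction of frames in which a residue touches the ligand. The sketch below is not the shipped frequency.py. It assumes the GetContacts output layout of a "# total_frames:N ..." header followed by tab-separated rows of frame, interaction type, and atoms written as chain:resname:resid:atomname; the example rows are hypothetical.

```python
from collections import defaultdict


def contact_frequencies(lines, ligand_resname):
    """Return {(chain, resname, resid): fraction of frames in contact with the ligand}."""
    total_frames = None
    frames_by_residue = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Header comment: look for the total_frames:N token.
            for token in line.lstrip("#").split():
                if token.startswith("total_frames:"):
                    total_frames = int(token.split(":")[1])
            continue
        fields = line.split("\t")
        frame, atom1, atom2 = int(fields[0]), fields[2], fields[3]
        # The ligand may appear in either atom column, so check both orders.
        for atom, partner in ((atom1, atom2), (atom2, atom1)):
            chain, resname, resid = atom.split(":")[:3]
            if partner.split(":")[1] == ligand_resname and resname != ligand_resname:
                frames_by_residue[(chain, resname, int(resid))].add(frame)
    if not total_frames:
        raise ValueError("total_frames header not found")
    return {res: len(frames) / float(total_frames)
            for res, frames in frames_by_residue.items()}


# Hypothetical excerpt in the assumed layout; 59b is the ligand residue name.
example = [
    "# total_frames:4 beg:0 end:3 stride:1 interaction_types:hbss,vdw",
    "# Columns: frame, interaction_type, atom_1, atom_2[, atom_3[, atom_4]]",
    "0\tvdw\tA:PHE:120:CZ\tA:59b:500:C1",
    "0\thbss\tA:PHE:120:N\tA:59b:500:O1",
    "1\tvdw\tA:59b:500:C2\tA:TRP:150:CH2",
    "2\tvdw\tA:PHE:120:CZ\tA:59b:500:C1",
]
freqs = contact_frequencies(example, "59b")
print(sorted(freqs.items()))
```

Storing frame numbers in a set means several interactions between the same residue and the ligand within one frame are counted once.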
About
Information about the iORbase 2.0 platform, including statistics, citation guidelines, and contact information.
Statistics
Under construction...
Citation
Under construction...