iORbase 2.0 - Help Documentation

A comprehensive guide to using the iORbase 2.0 platform for insect olfactory receptor analysis.

Contents

Sequence Analysis

Tools for analyzing olfactory receptor gene sequences, including identification, annotation, and comparative analysis.

iGene - Gene Analysis Module

A specialized framework for annotating olfactory receptors (ORs) from insect genomes, based on the functionality of the published insectOR database. It is an all-in-one automated OR gene annotation tool that addresses the inefficiencies of traditional manual analysis.

Overview

iGene is a bioinformatics tool designed to systematically identify and annotate olfactory receptor (OR) genes from insect genome sequences. The module integrates multiple computational approaches to provide reliable OR gene annotations, leveraging the comprehensive insectOR database as a reference resource. It supports input of custom user ID, task ID, and genome file (FASTA format), automatically executing three core processes:

  • Extraction of OR candidate regions via homology alignment
  • High-confidence OR gene annotation (including exon-intron structure prediction)
  • Result correction (3D structure prediction + transmembrane validation, where sequences with 6-8 transmembrane domains are considered complete ORs)
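
The transmembrane criterion in the last step can be pictured as a simple filter. The sketch below is illustrative only and is not the pipeline's actual code: the function names and the input mapping of predicted helix counts are assumptions, and only the 6-8 range comes from the description above.

```python
# Illustrative sketch only; iGene's real implementation is not shown in this guide.
MIN_TM, MAX_TM = 6, 8  # range stated above for a complete OR


def is_complete_or(tm_count):
    """Return True when the predicted helix count falls in the 6-8 range."""
    return MIN_TM <= tm_count <= MAX_TM


def filter_complete(tm_counts):
    """Keep the IDs whose predicted transmembrane helix count passes the check.

    tm_counts is a hypothetical mapping of sequence ID -> helix count, for
    example parsed from a transmembrane predictor's output.
    """
    return sorted(seq_id for seq_id, n in tm_counts.items() if is_complete_or(n))


if __name__ == "__main__":
    # Placeholder IDs and counts, not real predictions.
    example = {"cand_1": 7, "cand_2": 4, "cand_3": 6, "cand_4": 9}
    print(filter_complete(example))
```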

Core Advantages:

  • Full Automation: Executes the entire pipeline with one click, requiring no manual intervention
  • Resume from Interruption: Automatically detects completed steps, supports skipping or re-running
  • Transparent Logging: Real-time recording of execution status, time consumption, and error messages
  • Reliable Results: Integrates structure prediction and transmembrane validation to filter complete OR sequences

Key Features

  • Homology-based OR gene identification
  • Motif detection for OR-specific domains
  • Phylogenetic classification of identified ORs
  • Comprehensive annotation reports
  • Automated pipeline with error recovery
  • Transmembrane domain validation for result filtering

Installation

Prerequisites

System: Linux (Recommended: Ubuntu 20.04+ / CentOS 8+)

Installation Steps

Step 1: Change to the Project Directory

cd insector_IORBase

Step 2: Install Dependencies

Optional - Install screen (for running processes in background):

# For Ubuntu/Debian systems:
sudo apt install -y screen build-essential libncurses5-dev libbz2-dev liblzma-dev libssl-dev

Install Conda (for environment management):

wget https://repo.anaconda.com/miniconda/Miniconda2-latest-Linux-x86_64.sh -O miniconda2.sh
bash miniconda2.sh -b -p $HOME/miniconda2
echo 'export PATH="$HOME/miniconda2/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc

Create dedicated Conda environment and install core tools:

conda create -n iorbase-py2 python=2.7 -y
conda activate iorbase-py2
conda install -c bioconda blast=2.12.0 -y
conda install -c bioconda samtools=1.15 -y
conda install -c salilab modeller -y

Install Python 2.7 compatible libraries:

pip install biopython==1.76
pip install numpy==1.16.6

Additional Configuration (Modeller License):

echo 'export MODELLERKEY=your_license_key_here' >> ~/.bashrc
source ~/.bashrc

Step 3: Verify Installation

echo "Python version:" && python --version
echo "BLAST+ version:" && tblastn -version | head -1
echo "Modeller version:" && modeller -v
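
The same check can be scripted. The following is a minimal sketch, assuming a Python 3 interpreter is available for the check itself (shutil.which does not exist in Python 2.7); the helper names are hypothetical, and the tool list covers only the command-line programs installed in Step 2.

```python
import os
import shutil

# Command-line tools installed in Step 2 (BLAST+ and samtools).
REQUIRED_TOOLS = ["tblastn", "samtools"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def modeller_key_set(env=os.environ):
    """Return True when MODELLERKEY is defined and non-empty."""
    return bool(env.get("MODELLERKEY", "").strip())


if __name__ == "__main__":
    print("Missing tools:", missing_tools() or "none")
    print("MODELLERKEY set:", modeller_key_set())
```

Run it with the Conda environment activated so that PATH reflects the installed tools.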

Usage Guide

Step 1: Prepare Input Files

Genome file requirements: FASTA format (e.g., GCA_XXXXXX.X_genomic.fna / GCF_XXXXXX.X_genomic.fna)
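
Before starting a long run, it can help to confirm that the genome file really is FASTA. The sketch below is a hypothetical pre-check, not part of the pipeline; it only verifies that the file begins with a '>' header and counts the records.

```python
def count_fasta_records(path):
    """Count '>' header lines; raise ValueError if the file does not look like FASTA."""
    n_records = 0
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # blank lines are ignored
            if line.startswith(">"):
                n_records += 1
            elif n_records == 0:
                # Sequence text before any header means the file is malformed.
                raise ValueError("sequence data found before the first '>' header")
    if n_records == 0:
        raise ValueError("no FASTA records found")
    return n_records


if __name__ == "__main__":
    import sys
    print(count_fasta_records(sys.argv[1]), "records")
```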

Step 2: Set Script Execution Permissions

chmod +x automate_insectOR_final.sh

Step 3: Core Parameters

  • -u <user_id>: Required. Custom user ID
  • -t <task_id>: Required. Custom task ID
  • -g <genome_file>: Required. Full path to input genome file
  • -s: Optional. Force skip completed steps

Step 4: Running the Pipeline

First run (with interactive prompt):

./automate_insectOR_final.sh -u entomology_lab -t honeybee_OR_2025 -g /data/honeybee_genome.fasta

Resume after interruption (force skip):

./automate_insectOR_final.sh -u entomology_lab -t honeybee_OR_2025 -g /data/honeybee_genome.fasta -s
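
When several genomes need to be processed, the command line can be assembled programmatically. This is a minimal sketch using only the flags documented in Step 3; the wrapper functions are hypothetical, and the assumption that the script returns a non-zero exit code on failure has not been verified. Standard input is left attached to the terminal so that the interactive prompt of a first run still works.

```python
import subprocess


def build_command(user_id, task_id, genome_file, skip_completed=False,
                  script="./automate_insectOR_final.sh"):
    """Assemble the argument list using only the documented flags."""
    cmd = [script, "-u", user_id, "-t", task_id, "-g", genome_file]
    if skip_completed:
        cmd.append("-s")  # force skip of completed steps
    return cmd


def run_pipeline(*args, **kwargs):
    """Run the pipeline; raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_command(*args, **kwargs), check=True)


if __name__ == "__main__":
    # Print the command instead of running it.
    print(" ".join(build_command("entomology_lab", "honeybee_OR_2025",
                                 "/data/honeybee_genome.fasta",
                                 skip_completed=True)))
```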

Step 5: Locating Results

Results are organized in user_tasks/<user_id>/<task_id>/ with subdirectories:

  • 1-extract_candidates/: OR candidate regions (FASTA + BED files)
  • 2-annotation/: OR gene annotation results
  • 3-correction/: Corrected results (transmembrane validation)
  • 4-final_results/: Integrated final results + statistics
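
The layout above makes it easy to check how far a task has progressed. The sketch below is a rough heuristic, not the pipeline's own completion check: it treats a step directory as populated if it exists and contains at least one entry. The function names are hypothetical.

```python
from pathlib import Path

# Subdirectories listed above, in pipeline order.
STEP_DIRS = ["1-extract_candidates", "2-annotation", "3-correction", "4-final_results"]


def task_dir(user_id, task_id, root="user_tasks"):
    """Return the result directory for one user/task pair."""
    return Path(root) / user_id / task_id


def step_status(user_id, task_id, root="user_tasks"):
    """Map each step directory to True if it exists and is not empty."""
    base = task_dir(user_id, task_id, root)
    status = {}
    for name in STEP_DIRS:
        step = base / name
        status[name] = step.is_dir() and any(step.iterdir())
    return status


if __name__ == "__main__":
    for step, populated in step_status("entomology_lab", "honeybee_OR_2025").items():
        print(step, "populated" if populated else "empty or missing")
```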

Troubleshooting

Python 2.7 not found:

conda activate iorbase-py2
conda install python=2.7 -y

Modeller "Invalid license" error:

echo $MODELLERKEY
source ~/.bashrc

Acknowledgments

iGene integrates several open-source tools:

  • Sequence Analysis: BLAST+, HMMER, exonerate
  • Structure Prediction: Modeller, TMHMM, Phobius, HMMTOP
  • Sequence Processing: samtools, WISE
  • Environment Management: Conda

Structure Analysis

Structural models of insect ORs were built at large scale using AlphaFold, and the gene annotation results were checked for structural integrity.

iORPDB - OR Structure Database

An OR structure database containing information on insect species.

Overview

iORPDB provides insect OR structures at large scale, together with structure visualization and per-receptor information such as species, sequence, and type (OR / ORco).

Key Features

  • BLAST-based search capabilities
  • Structure visualization tools

iModelTM - Transmembrane Modeling

A specialized tool for modeling the transmembrane domains of olfactory receptors, focusing on the unique structural features of this protein family.

Overview

iModelTM uses template-based algorithms to predict the transmembrane helix arrangement of olfactory receptors.

Key Features

  • Transmembrane domain prediction
  • Helix arrangement modeling
  • Sequence similarity assessment

Function Analysis

Tools for studying the functional properties of olfactory receptors, including online docking, odor response prediction, and molecular dynamics result analysis.

OdorSets - Odor Ligand Database

A database of odor compounds drawn from a variety of sources.

Overview

OdorSets collects compound information from sources such as insect pheromones, environmental odors, and plant volatiles, including SMILES, CID, and various physical and chemical properties.

Key Features

  • Compounds representing different ecological environments
  • Wide coverage of chemical space

DockingSets - Docking Results Database

A large-scale iOR-VOC interaction prediction database based on molecular docking methods.

Overview

DockingSets contains millions of pre-computed docking poses between olfactory receptors and odor ligands, allowing users to quickly explore potential binding interactions without performing computationally intensive docking simulations.

Key Features

  • Large-scale virtual screening data statistics
  • The binding profile of a single receptor or ligand

iDock - Molecular Docking Tool

A user-friendly interface for performing molecular docking simulations between olfactory receptors and odor ligands.

Overview

iDock provides an intuitive platform for setting up and running molecular docking simulations, with pre-configured parameters optimized for olfactory receptor-ligand interactions.

Key Features

  • User-friendly docking setup
  • Optimized parameters for OR-ligand interactions
  • Interactive result analysis and visualization

iORPred - Odor Response Prediction

A machine learning-based tool for predicting the odor response profile of olfactory receptors.

Overview

iORPred uses advanced machine learning algorithms to predict how olfactory receptors will respond to different odor ligands, based on sequence features.

Key Features

  • Odor response prediction
  • Fast batch prediction

Applications - Integrated Analysis Workflows

Large-scale virtual screening systems built to address specific problems in iOR function research.

Overview

Virtual screening systems are provided for two problems: reverse target finding (finding receptors for a known ligand) and agonist screening (finding ligands for a known receptor).
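
The two screening directions amount to reading the same receptor-ligand score table along different axes. The sketch below illustrates the idea under stated assumptions: the score table, its placeholder names and values, and the convention that a lower (more negative) docking score means stronger predicted binding are all hypothetical and do not describe the platform's internal data.

```python
def rank_receptors_for_ligand(scores, ligand, top_n=3):
    """Reverse target finding: best-scoring receptors for one known ligand."""
    hits = [(rec, s) for (rec, lig), s in scores.items() if lig == ligand]
    return sorted(hits, key=lambda item: item[1])[:top_n]


def rank_ligands_for_receptor(scores, receptor, top_n=3):
    """Agonist screening: best-scoring ligands for one known receptor."""
    hits = [(lig, s) for (rec, lig), s in scores.items() if rec == receptor]
    return sorted(hits, key=lambda item: item[1])[:top_n]


if __name__ == "__main__":
    # Placeholder values for illustration only, not real docking results.
    scores = {
        ("OR_A", "ligand_1"): -6.0,
        ("OR_B", "ligand_1"): -7.5,
        ("OR_A", "ligand_2"): -5.0,
    }
    print(rank_receptors_for_ligand(scores, "ligand_1"))
    print(rank_ligands_for_receptor(scores, "OR_A"))
```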

Key Features

  • Virtual screening systems
  • Step-by-step guidance
  • Automated data processing

iMDanalysis - Molecular Dynamics Analysis

A tool for analyzing molecular dynamics simulation results of iORs, focusing on structural dynamics and flexibility.

Overview

This tutorial uses the molecular dynamics (MD) simulation analysis of Drosophila melanogaster OR85b and the ligand methyl acetate as an example to illustrate the analysis workflow.

This document is intended for users who have completed MD simulations and have used GROMACS and the GetContacts tool. It provides a standardized guide for RMSD, RMSF, and residue–ligand contact-frequency analyses. For each of the three functions, the required input files, GROMACS commands, Python script configuration, and expected outputs are described below.


I. RMSD Analysis (rmsd.py)

Overview

RMSD (Root Mean Square Deviation) measures how much the system’s structure deviates from a reference conformation over time. This function reads the RMSD time series from a GROMACS-generated .xvg file and outputs a time-series table and a statistical summary, which are convenient for plotting and report writing.


Prerequisites

Prepare the following GROMACS output files in advance:

  • (1) pbc.xtc: trajectory file after periodic boundary condition (PBC) processing
  • (2) md.tpr: topology/run input file for RMSD calculation (contains reference structure information)

GROMACS Command

Run the following in a terminal:

gmx rms -f pbc.xtc -s md.tpr -o rmsd_protein.xvg

Parameters:

  • -f pbc.xtc: specify the input trajectory file
  • -s md.tpr: specify the reference topology/structure file
  • -o rmsd_protein.xvg: output the RMSD-vs-time curve

After the command finishes, the following file will be generated:

rmsd_protein.xvg

Python Script Configuration

./RMSD/rmsd.py

Modify the path-related configurations in the script according to your actual file locations:

Line 9: set the RMSD input .xvg path

RMSD_XVG_PATH="./rmsd_protein.xvg"

Line 12: set the output directory

OUTPUT_DIR="./RMSD/output"

Notes:

  • Please ensure RMSD_XVG_PATH points to the actual generated .xvg file;
  • OUTPUT_DIR should already exist, or the script should contain logic to create it automatically (please verify based on your script).

Run command:

python ./RMSD/rmsd.py

*Run this command under the iORbase_MDanalysis directory.


Script Outputs

After running rmsd.py, the following outputs are expected (under OUTPUT_DIR):

rmsd_timeseries.tsv
rmsd_summary.txt
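
rmsd.py itself is not reproduced in this guide. As a rough illustration of the kind of processing involved, the sketch below parses a two-column .xvg file and computes basic statistics. The function names and the chosen statistics are assumptions; with default GROMACS settings the columns are typically time (ps) and RMSD (nm).

```python
import statistics


def read_xvg(path):
    """Return (times, values) from a two-column GROMACS .xvg file.

    Lines starting with '#' (comments) or '@' (plot directives) are skipped.
    """
    times, values = [], []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line or line[0] in "#@":
                continue
            cols = line.split()
            times.append(float(cols[0]))
            values.append(float(cols[1]))
    return times, values


def summarize(values):
    """Basic statistics of the kind a summary file might report."""
    return {
        "n_frames": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.pstdev(values),  # population standard deviation
        "min": min(values),
        "max": max(values),
    }


if __name__ == "__main__":
    _, rmsd = read_xvg("./rmsd_protein.xvg")
    for key, value in summarize(rmsd).items():
        print(key, value)
```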

II. RMSF Analysis (rmsf.py)

Overview

RMSF (Root Mean Square Fluctuation) measures the fluctuation amplitude of each residue over time. This function reads per-residue RMSF values from a GROMACS .xvg file and outputs an RMSF profile and a statistical summary to help identify flexible regions and key structural regions.


Prerequisites

Prepare the following GROMACS output files in advance:

  • (1) pbc.xtc: trajectory file after periodic boundary condition (PBC) processing
  • (2) md.tpr: topology/run input file for RMSF calculation (contains reference structure information)

GROMACS Command

Run the following in a terminal:

gmx rmsf -f pbc.xtc -s md.tpr -o rmsf_protein.xvg -oq bfac.pdb -res

Parameters:

  • -o rmsf_protein.xvg: output RMSF values for each residue
  • -oq bfac.pdb: output a PDB file with B-factor information for RMSF visualization
  • -res: calculate RMSF per residue rather than per atom

After the command finishes, the following files will be generated:

rmsf_protein.xvg
bfac.pdb

Python Script Configuration

./RMSD/rmsf.py

Modify the path-related configurations in the script according to your actual file locations:

Line 9: set the RMSF input .xvg path

RMSF_XVG_PATH="./rmsf_protein.xvg"

Line 12: set the output directory

OUTPUT_DIR="./RMSF/output"

Run command:

python ./RMSD/rmsf.py

*Run this command under the iORbase_MDanalysis directory.


Script Outputs

After running rmsf.py, the following outputs are expected (under OUTPUT_DIR):

rmsf_profile.tsv
rmsf_summary.txt
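
One common use of a per-residue RMSF profile is to flag unusually flexible residues. The sketch below is a hypothetical illustration, not the logic of rmsf.py: it marks residues whose RMSF exceeds the mean by more than a chosen number of standard deviations. The cutoff of one standard deviation is an arbitrary assumption and should be tuned for each system.

```python
import statistics


def read_rmsf_xvg(path):
    """Return a list of (residue_number, rmsf) pairs from a gmx rmsf .xvg file."""
    pairs = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line or line[0] in "#@":
                continue  # skip comments and plot directives
            resid, value = line.split()[:2]
            pairs.append((int(float(resid)), float(value)))
    return pairs


def flexible_residues(pairs, n_sd=1.0):
    """Return residue numbers whose RMSF is above mean + n_sd * stdev."""
    values = [value for _, value in pairs]
    cutoff = statistics.mean(values) + n_sd * statistics.pstdev(values)
    return [resid for resid, value in pairs if value > cutoff]


if __name__ == "__main__":
    profile = read_rmsf_xvg("./rmsf_protein.xvg")
    print("Flexible residues:", flexible_residues(profile))
```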

III. Residue Contact Frequency Statistics (frequency.py)

Overview

This function uses the GetContacts tool to compute residue–ligand contact frequencies, summarizes contact patterns between different ORs (olfactory receptors) and ligands, and generates detailed per-OR contact-frequency tables and an overall summary. This analysis helps identify key residues closely involved in ligand binding.


Prerequisites

Prepare the following files in advance:

  • (1) md.gro: topology/structure file of the system
  • (2) pbc.xtc: trajectory file after PBC processing

GetContacts Installation

Install Python dependencies with Conda:

conda create -n getcontacts python=3.10 -y
conda activate getcontacts

Install dependencies (prefer conda-forge):

conda install -c conda-forge vmd-python -y
conda install -c conda-forge tk=8.6 netcdf4 numpy scipy expat matplotlib scikit-learn pytest pandas seaborn cython -y

Install system-level dependencies

sudo apt install -y libnetcdf-dev

Install GetContacts

Use the local installation package:

unzip getcontacts.zip
mv getcontacts-master getcontacts
cd getcontacts
python get_dynamic_contacts.py --help

Add the script to PATH:

echo 'export PATH="$PATH:iORbase_MDanalysis/getcontacts"' >> ~/.bashrc
source ~/.bashrc

Grant execution permission:

chmod +x get_dynamic_contacts.py

Verify installation

conda activate getcontacts
get_dynamic_contacts.py --help

If the help message prints normally, the installation is successful.

GetContacts Command (Generate Contact File)

Before running GetContacts, change to the GetContacts directory:

cd ./getcontacts

Run in a terminal:

python get_dynamic_contacts.py \
--topology ../md.gro \
--trajectory ../pbc.xtc \
--output ../Frequency/loop-ligand_all.tsv \
--cores 10 \
--sele "protein" \
--sele2 "resname 59b" \
--itypes all

Parameter description:

  • --topology ../md.gro: specify the topology/structure file
  • --trajectory ../pbc.xtc: specify the trajectory file
  • --output ../Frequency/loop-ligand_all.tsv: output contact-frequency data file (here it is written directly under the Frequency folder; adjust as needed)
  • --cores 10: number of threads; adjust based on your machine
  • --sele "protein": select the first interacting object (protein)
  • --sele2 "resname 59b": select the second interacting object (ligand residue name). You must change this to match the actual ligand residue name. For example, if the file name is 59b3orco and the ligand residue name is 59b, set --sele2 "resname 59b"
  • --itypes all: count all interaction types (hydrogen bonds, hydrophobic interactions, salt bridges, etc.)

After the command finishes, the following file will be generated in the corresponding directory (../Frequency):

loop-ligand_all.tsv

Python Script Configuration (frequency.py)

Before use, modify the following path settings according to your actual directory structure:

Line 9: input file path

INPUT_TSV_PATH = "./loop-ligand_all.tsv"

Line 10: output file path

OUTPUT_TSV_PATH = "./residue_ligand_contacts.tsv"

Run command:

python frequency.py

Script Outputs

After running frequency.py, the following output will be produced:

residue_ligand_contacts.tsv
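
frequency.py itself is not reproduced here. The sketch below shows one way a per-residue contact frequency could be derived from a GetContacts contact file, defined as the fraction of frames in which a residue has at least one contact with the ligand. It rests on assumptions that should be checked against your own file: a header comment containing total_frames:N, tab-separated columns of frame, interaction type, atom_1 and atom_2, and atom labels of the form chain:resname:resid:atomname. The function name is hypothetical.

```python
from collections import defaultdict


def contact_frequencies(path, ligand_resname):
    """Return {"chain:resname:resid": fraction of frames in contact with the ligand}."""
    total_frames = None
    seen_frames = set()
    frames_by_residue = defaultdict(set)
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith("#"):
                # Assumed header token, e.g. "# total_frames:1000 beg:0 ..."
                for token in line.lstrip("# ").split():
                    if token.startswith("total_frames:"):
                        total_frames = int(token.split(":", 1)[1])
                continue
            cols = line.split("\t")
            frame = int(cols[0])
            seen_frames.add(frame)
            # Only the two interaction endpoints are used; extra columns are ignored.
            atoms = [label.split(":") for label in cols[2:4]]
            if not any(atom[1] == ligand_resname for atom in atoms):
                continue
            for atom in atoms:
                if atom[1] != ligand_resname:
                    frames_by_residue[tuple(atom[:3])].add(frame)
    n_frames = total_frames or len(seen_frames)
    return {":".join(key): len(frames) / n_frames
            for key, frames in frames_by_residue.items()}


if __name__ == "__main__":
    freqs = contact_frequencies("./loop-ligand_all.tsv", "59b")
    for residue, freq in sorted(freqs.items(), key=lambda kv: -kv[1]):
        print(residue, round(freq, 3))
```

Counting each frame once per residue, rather than counting every atom pair, keeps the frequency between 0 and 1 regardless of how many interaction types are reported.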

About

Information about the iORbase 2.0 platform, including statistics, citation guidelines, and contact information.

Statistics

Under construction...

Citation

Under construction...