AD-scRNA2QSAR

🌐 Overview

AD-scRNA2QSAR is a computational pipeline for Alzheimer’s Disease (AD) research. It combines bioinformatics and cheminformatics in a single workflow that runs from raw single-cell RNA sequencing (scRNA-seq) data to predictive Quantitative Structure-Activity Relationship (QSAR) modeling.

Our mission is to democratize access to powerful predictive tools, lowering the barrier to entry for researchers in the neurodegenerative disease space. This repository provides a comprehensive toolkit for data integration, cellular analysis, and machine learning-based bioactivity prediction.

You can access and use the live application at: https://QSARify.com

🔗 Project Workflow

[Figure: project workflow]

✨ Features

This pipeline is organized into three core modules, each providing a distinct set of functionalities.

🔬 Single-Cell Analysis

🧪 QSAR Modeling & Bioactivity Prediction

🌐 Interactive Web API

📂 Project Structure

AlzheimerDisease_FromSingleCell/
├── SingleCell/                # 🧬 scRNA-seq preprocessing (R)
│   ├── Merge_Data.R
│   └── SingleCell_Main.R
├── MiloR/                     # 🧬 Differential-abundance analysis (R)
│   └── MiloR_CellAbundance.R
├── CellChat/                  # 🧬 Cell–cell communication analysis (R)
│   └── CellChat.R
├── QSAR/                      # 🧠 QSAR modeling & web app (Python)
│   ├── figures/
│   │   └── roc_curves_comparison.png
│   ├── Data/
│   │   ├── chembl_results_P_27338_MAO-B_IC50_classified.csv
│   │   ├── chembl_results_P_35354_COX2_IC50_classified.csv
│   │   ├── chembl_results_P_43490_VISFATIN_IC50_classified.csv
│   │   ├── chembl_results_P_56817_BACE1_IC50_classified.csv
│   │   └── chembl_results_Q_04844_ACHE_IC50_classified.csv
│   ├── Model/
│   │   └── final_tuned_model.pkl
│   ├── templates/
│   │   └── index.html
│   ├── Target_Collection.ipynb
│   ├── Ligand_Final.ipynb
│   ├── app.py
│   └── requirements.txt
├── Data/                      # A large-scale analysis of over 500,000 cells was performed. A 25,000-cell subset (5,000 from each study) is provided on GitHub for convenience.
│   └── 25K_Sample.rds
└── README.md
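
The 25,000-cell sample is described as 5,000 cells drawn from each study. The repository does not show how the draw was made (and the pipeline itself is in R), so the following is only a minimal Python sketch of per-study subsampling under assumed inputs: `cell_to_study`, `subsample_per_study`, and the toy study names are all hypothetical.

```python
import random
from collections import defaultdict

def subsample_per_study(cell_to_study, n_per_study, seed=0):
    """Pick up to n_per_study cell IDs from each study, reproducibly."""
    by_study = defaultdict(list)
    for cell_id, study in cell_to_study.items():
        by_study[study].append(cell_id)

    rng = random.Random(seed)  # fixed seed so the same subset is drawn each run
    picked = []
    for study in sorted(by_study):
        cells = sorted(by_study[study])  # sort so the draw ignores dict order
        k = min(n_per_study, len(cells))
        picked.extend(rng.sample(cells, k))
    return picked

# Toy input: 3 hypothetical studies with 10 cells each.
toy = {f"study{s}_cell{i}": f"study{s}" for s in range(3) for i in range(10)}
subset = subsample_per_study(toy, n_per_study=4)
print(len(subset))  # 12
```

Sampling the same number of cells from every study keeps any single large dataset from dominating the subset.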

📝 Detailed Script Information

| Script 🖥️ | Purpose 🎯 | Key Libraries 🛠️ | Output 📄 |
| --- | --- | --- | --- |
| Merge_Data.R | Integrates raw scRNA-seq count matrices from multiple GSE studies. | Seurat, batchelor, SingleCellExperiment | A unified Seurat object containing all datasets. |
| SingleCell_Main.R | Performs QC, normalization, clustering, and cell type annotation. | Seurat, harmony, DoubletFinder, SingleR | A processed Seurat object with UMAPs and cell annotations. |
| MiloR_CellAbundance.R | Conducts differential abundance testing on cell neighborhoods. | miloR, SingleCellExperiment, ggplot2 | Differential abundance statistics and visualizations. |
| CellChat.R | Infers and analyzes cell-cell communication pathways. | CellChat, Seurat, dplyr | Communication network data and plots (bubble plots, heatmaps). |
| Target_Collection.ipynb | Retrieves and preprocesses bioactivity & ADME data from ChEMBL. | pandas, chembl_webresource_client, rdkit | A cleaned DataFrame and exploratory data visualizations. |
| Ligand_Final.ipynb | Trains, tunes, and evaluates the QSAR machine learning model. | scikit-learn, imbalanced-learn, rdkit, pandas | A serialized model (.pkl) and performance plots. |
| app.py | Serves a Flask-based web API for on-demand bioactivity predictions. | flask, flask-cors, joblib, rdkit | JSON responses with predictions and confidence scores. |
| index.html | Provides the interactive front-end UI for the QSAR prediction tool. | HTML, CSS, JavaScript | An interactive web interface rendered in the browser. |

🛠️ Installation

To set up the project environment, please follow these steps.

Step 1: Install R Dependencies 📦

(Required for SingleCell, MiloR, and CellChat analysis)

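# Install core packages from CRAN
install.packages(c("Seurat", "dplyr", "ggplot2", "patchwork", "harmony"))

# Install Bioconductor packages
# (scater, scran, batchelor, and SingleR are distributed through Bioconductor, not CRAN)
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(c("SingleCellExperiment", "scater", "scran", "batchelor", "SingleR", "miloR", "glmGamPoi"))

# DoubletFinder and CellChat (listed under Key Libraries) are on neither CRAN nor Bioconductor;
# install them from their GitHub repositories.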

Step 2: Install Python Dependencies 🐍

(Required for QSAR modeling and the Flask API)

# Clone the repository
git clone https://github.com/xhammady/AD-scRNA2QSAR.git
cd AD-scRNA2QSAR

# Install Python packages from requirements.txt
pip install -r QSAR/requirements.txt

Note: Key Python libraries include chembl-webresource-client, rdkit, scikit-learn, imbalanced-learn, pandas, flask, and flask-cors.
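
After installing, it can help to confirm that the libraries named above are importable before opening the notebooks. The following is a small sketch using only the standard library; the list contains the usual import names for those packages (for example `sklearn` for scikit-learn), and `missing_modules` is a hypothetical helper, not part of this repository.

```python
from importlib.util import find_spec

# Import names (not pip names) for the libraries listed in the note above.
REQUIRED_MODULES = [
    "chembl_webresource_client",
    "rdkit",
    "sklearn",     # scikit-learn
    "imblearn",    # imbalanced-learn
    "pandas",
    "flask",
    "flask_cors",  # flask-cors
]

def missing_modules(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]

if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```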

🚀 Usage

Follow this sequence to run the full analysis pipeline.

➑️ Step 1: Run the Single-Cell Bioinformatics Pipeline

  1. 🧬 Merge Data: Execute SingleCell/Merge_Data.R to combine the raw count matrices.
  2. πŸ”¬ Preprocess & Cluster: Run SingleCell/SingleCell_Main.R to perform QC, normalization, integration, and annotation.
  3. πŸ” Analyze Differential Abundance: Use MiloR/MiloR_CellAbundance.R to compare cell populations.
  4. πŸ“‘ Infer Communication: Run CellChat/CellChat.R to analyze signaling pathways.
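
The four scripts above run in a fixed order, so they can be chained from one driver. This is a sketch only: it assumes `Rscript` is on the PATH and that each script runs non-interactively and locates its own input files, neither of which this README confirms. `build_commands` and `run_pipeline` are hypothetical helper names.

```python
import subprocess
from pathlib import Path

# Script paths taken from the project structure, in the order given above.
R_SCRIPTS = [
    "SingleCell/Merge_Data.R",
    "SingleCell/SingleCell_Main.R",
    "MiloR/MiloR_CellAbundance.R",
    "CellChat/CellChat.R",
]

def build_commands(repo_root="."):
    """Return one ['Rscript', path] command per pipeline step."""
    root = Path(repo_root)
    return [["Rscript", str(root / script)] for script in R_SCRIPTS]

def run_pipeline(repo_root=".", dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_commands(repo_root):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the chain at the first failing script.
            subprocess.run(cmd, check=True)

run_pipeline(dry_run=True)
```

Keeping `dry_run=True` as the default lets you inspect the commands first; the full analysis is long-running, so stopping on the first failure avoids feeding a broken intermediate object into later steps.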

➑️ Step 2: Run the QSAR Cheminformatics Pipeline

  1. πŸ§ͺ Collect Target Data: Open and run the QSAR/Target_Collection.ipynb notebook to query ChEMBL and generate the analysis dataset.
  2. 🧠 Train ML Model: Open and run QSAR/Ligand_Final.ipynb to preprocess features, train the Random Forest model, and save the final .pkl file.
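
The dataset files are named `*_IC50_classified.csv`, which suggests that IC50 values are binned into activity classes before training. The notebooks' actual thresholds are not stated here, so the sketch below uses a commonly seen convention as an assumption: IC50 ≤ 1,000 nM is "active" and IC50 ≥ 10,000 nM is "inactive". The function names and class labels are hypothetical.

```python
import math

def pic50_from_nm(ic50_nm):
    """Convert an IC50 in nanomolar to pIC50 = -log10(IC50 in molar)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return 9.0 - math.log10(ic50_nm)  # 1 nM = 1e-9 M

def classify_ic50(ic50_nm, active_max_nm=1000.0, inactive_min_nm=10000.0):
    """Bin an IC50 value; both thresholds are assumptions, not the notebook's."""
    if ic50_nm <= active_max_nm:
        return "active"
    if ic50_nm >= inactive_min_nm:
        return "inactive"
    return "intermediate"

print(classify_ic50(250.0))            # active
print(round(pic50_from_nm(250.0), 2))  # 6.6
```

Compounds in the intermediate band are often dropped before training a binary classifier, which is one way a dataset can end up with uneven class sizes.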

➑️ Step 3: Launch the Predictive API & User Interface

  1. 🌐 Start the Server: From the command line, run the Flask application:
    python QSAR/app.py
    
  2. 🎨 Access the UI: Open your web browser and navigate to http://localhost:5000/ or visit the live application at https://QSARify.com. You can now:
    • Enter a SMILES string for a single compound prediction.
    • Upload a file (.txt, .xls, .xlsx) for batch predictions.
    • View and manage results in the History tab.
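
The same predictions can in principle be requested from a script instead of the browser. The routes and payload format of `app.py` are not documented here, so the sketch below is illustrative only: the `/predict` route and the `smiles` JSON field are assumptions and should be checked against `app.py` before use. It also shows how a batch `.txt` file with one SMILES per line might be parsed.

```python
import json
import urllib.request

def parse_smiles_lines(text):
    """Return the first whitespace-separated token of each non-blank line."""
    return [line.split()[0] for line in text.splitlines() if line.strip()]

def build_predict_request(smiles, base_url="http://localhost:5000"):
    """Build a JSON POST request; route and field name are hypothetical."""
    payload = json.dumps({"smiles": smiles}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",  # assumed route, verify in app.py
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def predict(smiles, base_url="http://localhost:5000"):
    """Send the request to a running server and decode the JSON reply."""
    request = build_predict_request(smiles, base_url)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))

# Parsing works offline; predict() needs the Flask server from step 1 running.
batch = parse_smiles_lines("CCO\n\nc1ccccc1 benzene\n")
print(batch)  # ['CCO', 'c1ccccc1']
```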

🀝 Contributing

We welcome contributions to improve this project! Please fork the repository, create a new branch for your feature, and submit a pull request with a detailed description of your changes. Ensure you follow existing coding standards and include tests where applicable.

📜 License

This project is licensed under the MIT License. See the LICENSE file for more details.

🙏 Acknowledgments & Contributors

Project Team

Tools & Data

📧 Contact
