The standalone version can be used for querying large numbers of cells and when data privacy is a concern. Cross-batch and cross-species embedding is available in the standalone version.
Dependency Installation Readme File
To run the tool from the command line, use the scripts described below.
To find the projection of query cells, use the commandline_search.py script.
Run the script with:
python3 commandline_search.py
The following arguments can be edited within the script:
############# INPUT THE ARGUMENTS HERE#################
########## Query count file location ####
query_file = './sample_queries/human/GM/query_gm.txt'
########## Query peak file location ####
chr_file = './sample_queries/human/GM/chr_gm.bed'
########## No of top results for each query cell
top_study = 5
########## No of Top clusters to search from
cluster = 5 # No of Clusters
########## Active genes(Top 500) - 1 , Poised genes(Top 2000) - 2
active_poised = 1
########## Query Type : Human - 1 , Mouse - 2 , Cross-Species - 3
query_type = 1 # 1 - Human , 2 - Mouse , 3 - Cross-Species
########## Annotated/Unannotated : Annotated - 1 , Unannotated - 2
anno_unanno = 2
########### Accurate - 1 , Faster - 2
accurate_faster = 2
################################################################
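Because these settings are edited by hand inside the script, a quick sanity check before running can catch typos. The sketch below is not part of commandline_search.py: the helper name check_search_settings is hypothetical, the allowed codes are copied from the comments above, and the requirement that top_study and cluster be positive integers is an assumption.

```python
from pathlib import Path

# Allowed option codes, copied from the comments in the settings block.
ALLOWED_CODES = {
    "active_poised": {1, 2},    # 1 = active genes, 2 = poised genes
    "query_type": {1, 2, 3},    # 1 = human, 2 = mouse, 3 = cross-species
    "anno_unanno": {1, 2},      # 1 = annotated, 2 = unannotated
    "accurate_faster": {1, 2},  # 1 = accurate, 2 = faster
}


def check_search_settings(settings):
    """Return a list of problems found in a settings dict (hypothetical helper)."""
    problems = []
    # Both input files must exist on disk.
    for key in ("query_file", "chr_file"):
        path = settings.get(key)
        if not path or not Path(path).is_file():
            problems.append("{}: file not found: {!r}".format(key, path))
    # Assumption: result and cluster counts should be positive integers.
    for key in ("top_study", "cluster"):
        value = settings.get(key)
        if not isinstance(value, int) or value < 1:
            problems.append("{}: expected a positive integer, got {!r}".format(key, value))
    # Option codes must be one of the values listed in the comments.
    for key, allowed in ALLOWED_CODES.items():
        if settings.get(key) not in allowed:
            problems.append("{}: {!r} is not one of {}".format(
                key, settings.get(key), sorted(allowed)))
    return problems


settings = {
    "query_file": "./sample_queries/human/GM/query_gm.txt",
    "chr_file": "./sample_queries/human/GM/chr_gm.bed",
    "top_study": 5,
    "cluster": 5,
    "active_poised": 1,
    "query_type": 1,
    "anno_unanno": 2,
    "accurate_faster": 2,
}
for problem in check_search_settings(settings):
    print(problem)
```

An empty result means the values are at least well-formed; it says nothing about whether the count and peak files match each other.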
To embed query cells from different batches or species, use the commandline_embedding.py script.
Run the script with:
python3 commandline_embedding.py
The following arguments can be edited within the script:
#################################################################
########## Number of Dataset Queries
data = 2 #No of datasets
########## Number of Top results for each query
rps = 5
########## Active(Top 500 Genes) - 1 , Poised(Top 1000 Genes) - 2
active_poised = 2
########## Top clusters to search from
cls = 5
########## Accurate - 1 , Faster -2
var1 = 2 # accurate_faster
########## Annotated Reference only - 1 , Unannotated Reference - 2
anno_unanno_emb = 2 #annotated_unannotated
########## Query Count files location
query_file = ['./queries_embedding/query_gm_GSE68103_human.csv','./queries_embedding/query_bcells_mouse.txt']
########## Queries Peak files location (In same order as above)
chr_file = ['./queries_embedding/gm_human_GSE68103.bed','./queries_embedding/chr_bcells_mouse.bed']
########## Species for each dataset (Human-1,Mouse-2)
vars = [1,2] #species info
###################################################################
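The count files, peak files, and species codes are matched by position, so a mismatch in length or order is easy to introduce. A minimal sketch of a consistency check follows; check_embedding_settings is a hypothetical helper rather than a function from commandline_embedding.py, and it assumes that data should equal the length of each list.

```python
def check_embedding_settings(data, query_file, chr_file, species):
    """Return a list of problems with the per-dataset lists (hypothetical helper).

    `species` corresponds to the `vars` list in the script.
    """
    problems = []
    lists = {"query_file": query_file, "chr_file": chr_file, "vars": species}
    # Assumption: every list needs exactly one entry per dataset.
    for name, values in lists.items():
        if len(values) != data:
            problems.append("{} has {} entries, expected {}".format(
                name, len(values), data))
    # Species codes are 1 (human) or 2 (mouse), per the script comment.
    for index, code in enumerate(species):
        if code not in (1, 2):
            problems.append("vars[{}] = {!r} is not 1 (human) or 2 (mouse)".format(
                index, code))
    return problems


problems = check_embedding_settings(
    data=2,
    query_file=[
        "./queries_embedding/query_gm_GSE68103_human.csv",
        "./queries_embedding/query_bcells_mouse.txt",
    ],
    chr_file=[
        "./queries_embedding/gm_human_GSE68103.bed",
        "./queries_embedding/chr_bcells_mouse.bed",
    ],
    species=[1, 2],
)
print(problems or "settings look consistent")
```

The check cannot tell whether the files are listed in the same order; that still has to be verified by eye.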
Note : For security, downloading the standalone version requires an ID and password.
Links to data, code, and results for the figures :
Figure-1 :
Cell-type Query (Human)(Source GEO ID) : URL
HL60 (GSE109828): http://reggen.iiitd.edu.in:1207/episearch/?job=-ntlgsuqp4-utvlgmnqx
Myoblast (GSE109828) : http://reggen.iiitd.edu.in:1207/episearch/?job=-ggbqir0do-hlpsn5gn7 , http://reggen.iiitd.edu.in:1207/episearch/?job=-ag5z14ze5-v2xf77zlf
GM12878 (GSE109828) : http://reggen.iiitd.edu.in:1207/episearch/?job=-vgptvbeci-dlcdcztlk
H1ESC (GSE65360) : http://reggen.iiitd.edu.in:1207/episearch/?job=-206f3vcny-33j40evt3
Neuron (GSE97942) : http://reggen.iiitd.edu.in:1207/episearch/?job=-r9sdpeg5i-pgjxb2nw8
Cell-type Query (Mouse)(Source GEO ID) : URL
Neuron (GSE111586) : http://reggen.iiitd.edu.in:1207/episearch/?job=-puamhnftl-zkm98ya79
Endothelial (GSE111586) : http://reggen.iiitd.edu.in:1207/episearch/?job=-6n9foszqj-jgs0mmstl
Dendritic (GSE111586) : http://reggen.iiitd.edu.in:1207/episearch/?job=-pp08s9ztk-cp31ehp5c
Macrophage (GSE111586) : http://reggen.iiitd.edu.in:1207/episearch/?job=-leh9jpjui-f7uzo109j
NK cell (GSE111586) : http://reggen.iiitd.edu.in:1207/episearch/?job=-dppduwmaj-9d80zgmvf
T cell (GSE111586) : http://reggen.iiitd.edu.in:1207/episearch/?job=-l2wb6vhcw-pc84v1q1i
HSC - hematopoietic stem cells (GSE111586) : http://reggen.iiitd.edu.in:1207/episearch/?job=-ks5c8qnuc-0q64t7rf1
Cell-type Query (Cross-Species)(Source GEO ID) : URL
Myoblast (GSE109828) : http://reggen.iiitd.edu.in:1207/episearch/?job=-km4m8sm1n-by3oshify
GM12878 (GSE109828) : http://reggen.iiitd.edu.in:1207/episearch/?job=-uhibbslst-d5c8lzijd
Neuron (GSE97942) : http://reggen.iiitd.edu.in:1207/episearch/?job=-uuo33h9um-g5bgybhol
HL60 (GSE109828) : http://reggen.iiitd.edu.in:1207/episearch/?job=-ljwz1w0ml-whrhfh401
H1ESC (GSE65360) : http://reggen.iiitd.edu.in:1207/episearch/?job=-p880gctre-yhrfim47c
Dataset Location : http://reggen.iiitd.edu.in:1207/episearch/index.php?view=sample_queries
Figure-S5,S6,S7 : Code : https://github.com/reggenlab/scEpiSearch/tree/main/standalone_Code
Dataset : http://reggen.iiitd.edu.in:1207/scepisearch_supplementary_files/scepisearch_supplementary_files/S5,S6,S7-data/
Result Tables : http://reggen.iiitd.edu.in:1207/scepisearch_supplementary_files/scepisearch_supplementary_files/
Figure-2(C,D) Data & Code : https://github.com/reggenlab/scEpiSearch/tree/main/sRNA-seq-ATAC-seq-integration
Figure-3 Data : http://reggen.iiitd.edu.in:1207/scepisearch_supplementary_files/scepisearch_supplementary_files/Figure-2-data-code/
code : https://github.com/reggenlab/scEpiSearch/tree/main/standalone_Code
Figure-4(a) Data : https://github.com/reggenlab/scEpiSearch/tree/main/co-embedding-scATAC-seq-queries/EMBEDDING_1_CODE
code : https://github.com/reggenlab/scEpiSearch/tree/main/co-embedding-scATAC-seq-queries/EMBEDDING_1_CODE
Figure-4(b) Data : https://github.com/reggenlab/scEpiSearch/tree/main/co-embedding-scATAC-seq-queries/EMBEDDING_2_CODE
code : https://github.com/reggenlab/scEpiSearch/tree/main/co-embedding-scATAC-seq-queries/EMBEDDING_2_CODE
Figure-4(c) Data : https://github.com/reggenlab/scEpiSearch/tree/main/co-embedding-scATAC-seq-queries/EMBEDDING_3_CODE
code : https://github.com/reggenlab/scEpiSearch/tree/main/co-embedding-scATAC-seq-queries/EMBEDDING_3_CODE
Figure-S9 Data : https://github.com/reggenlab/scEpiSearch/tree/main/co-embedding-scATAC-seq-queries/EMBEDDING_4_CODE
code : https://github.com/reggenlab/scEpiSearch/tree/main/co-embedding-scATAC-seq-queries/EMBEDDING_4_CODE
Figure-5 Data : http://reggen.iiitd.edu.in:1207/scepisearch_supplementary_files/scepisearch_supplementary_files/Figure-4-data-code/
Code : http://reggen.iiitd.edu.in:1207/scepisearch_supplementary_files/scepisearch_supplementary_files/Figure-4-data-code/TSNE_cancer_immune_embedding_final.ipynb
Figure-6 Data : http://reggen.iiitd.edu.in:1207/scepisearch_supplementary_files/scepisearch_supplementary_files/Figure-5-data-code/
Code : https://github.com/reggenlab/scEpiSearch/tree/main/standalone_Code , https://github.com/reggenlab/scEpiSearch/tree/main/mESC-scATAC-seq-analysis
Python 3.5 or later must be installed on the system. All dependencies for the application can be installed using the "requirements.txt" file, which is present in the folder and can also be downloaded separately from the link above. Use the following command :
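The command itself is not reproduced in this copy of the readme. A common way to install from a requirements file is pip's -r option; the sketch below builds and runs that command from Python, assuming requirements.txt is in the current folder. The helper names are hypothetical and this may differ from the command the authors intended.

```python
import subprocess
import sys
from pathlib import Path

MIN_PYTHON = (3, 5)  # minimum version stated in this readme


def pip_install_command(requirements="requirements.txt"):
    """Build the pip command for the interpreter running this code."""
    return [sys.executable, "-m", "pip", "install", "-r", str(requirements)]


def install_requirements(requirements="requirements.txt"):
    """Check the Python version and the file, then run pip (hypothetical helper)."""
    if sys.version_info < MIN_PYTHON:
        raise RuntimeError("Python 3.5 or later is required")
    if not Path(requirements).is_file():
        raise FileNotFoundError("requirements file not found: {}".format(requirements))
    # check=True raises CalledProcessError if pip exits with an error.
    subprocess.run(pip_install_command(requirements), check=True)


# Usage, from the folder that contains requirements.txt:
# install_requirements()
```

Calling pip through sys.executable installs into the same interpreter that will later run the application, which avoids mixing several Python installations.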
R must also be installed on the system beforehand. The following R packages are required :
The rpy2 library used in this project provides a bridge between R and Python.
To install tkinter, which provides the GUI for the application, use the following command :
Finally, run the application by changing into the searchProject folder and running the following command :
If all dependencies are installed correctly, the GUI will appear.
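Before launching the GUI, it can help to confirm that the two Python-side dependencies named here, rpy2 and tkinter, are visible to the interpreter. This is a minimal sketch using the standard importlib module; missing_modules is a hypothetical helper. It only checks that a module can be located, so a module that is found may still fail at import time, for example if R itself is not installed.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(["rpy2", "tkinter"])
if missing:
    print("Missing Python modules: " + ", ".join(missing))
else:
    print("rpy2 and tkinter were both found")
```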
Note : If the user runs into the error "OSError: cannot load library '/home/cell/R/lib/R/lib/libR.so': libRblas.so: cannot open shared object file: No such file or directory"
When the application is run on a local machine, the GUI will work fine. When it is run on a remote server, X11 forwarding needs to be enabled. Details are available in the downloadable readme file above.
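On Linux and other X11 systems, GUI toolkits such as tkinter normally find the display through the DISPLAY environment variable, which SSH sets when X11 forwarding is active (for example with ssh -X). A minimal sketch for checking this before starting the GUI follows; has_display is a hypothetical helper, and the check is not meaningful for native Windows or macOS windows.

```python
import os


def has_display(environ=None):
    """Return True if a non-empty DISPLAY variable is set (X11 systems only)."""
    env = os.environ if environ is None else environ
    return bool(env.get("DISPLAY", "").strip())


if not has_display():
    print("DISPLAY is not set; on a remote server, enable X11 forwarding first")
```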