Thesis:GWASLab preprint: He, Y., Koido, M., Shimmori, Y., Kamatani, Y. (2023). GWASLab: a Python package for processing and visualizing GWAS summary statistics. Preprint at Jxiv, 2023-5. https://doi.org/10.51094/jxiv.370
1. GWAS可视化学习 原文链接:gwaslab/examples/7_visualization/visualization_regional.ipynb at main · Cloufield/gwaslab
# 创建新环境 # conda env create -n gwaslab -c conda-forge python=3.9 # pip install gwaslab # pip install wget
import wgetimport inspectimport sysimport gwaslab as glwget.download("http://jenger.riken.jp/14/" , out = "/data2/projects/bioinfo/tyliu/1_proteomics/MR/ckb_mr/coloc/plot/exam/t2d_bbj.txt.gz" ) mysumstats = gl.Sumstats("/data2/projects/bioinfo/tyliu/1_proteomics/MR/ckb_mr/coloc/plot/exam/t2d_bbj.txt.gz" , snpid = "SNP" , chrom = "CHR" , pos = "POS" , ea = "ALT" , nea = "REF" , neaf = "Frq" , beta = "BETA" , se = "SE" , p = "P" , direction = "Dir" , build = "19" , chrom_pat = "7" , n = "N" ) mysumstats.basic_check(verbose = False )
1.0 可用文档
1.1 曼哈顿图 mqq_fig, mqq_ax = mysumstats.plot_mqq( mode="m" , anno="GENENAME" , anno_source="ensembl" ) mqq_fig.savefig("/data2/projects/bioinfo/tyliu/1_proteomics/MR/ckb_mr/coloc/plot/exam/manhattan_plot.png" )
1.2 区域图 mysumstats.get_lead() region_fig, region_ax = mysumstats.plot_mqq(region=(7 ,156538803 ,157538803 )) region_fig.savefig("/data2/projects/bioinfo/tyliu/1_proteomics/MR/ckb_mr/coloc/plot/exam/region_base_plot.png" )
左图是简单的区域图,右图是Q-Q图,用于评估p值分布的合理性。图中$\lambda_{GC}$(基因组膨胀因子 )用于衡量统计量的潜在膨胀或收缩。此处Q-Q图绘制的目标是观察到的p值分布与理论期望分布(假设无真实关联时的随机分布)。
X轴:理论期望的p值,即均匀分布下的p值;
Y轴:实际分布得到的p值;
理想情况:所有点沿着对角线分布;
偏离情况:1、整体高于对角线:此时存在真实关联信号或系统性误差(群体分层、技术偏差);2、整体低于对角线:可能存在过校正或统计效力不足的问题。
$\lambda_{GC}=\frac{\text{观察到的卡方统计量中位数}}{\text{理论卡方分布中位数(}0.456\mathrm{)}}$,当$\lambda_{GC}>1$时存在系统性膨胀;$\lambda_{GC}<1$时表示统计效力不足;$\lambda_{GC} \approx 1$时表示符合期望。
整体观察左右两图:
若区域图中有显著峰,并且Q-Q图中存在点明显偏离对角线,说明信息真实;
若Q-Q图整体膨胀,区域图中的峰可能包含假阳性。
region_withTRACE_fig, region_withTRACE_ax = mysumstats.plot_mqq(mode="r" ,region=(7 ,156538803 ,157538803 ),region_grid=True , rr_path="/data2/projects/bioinfo/tyliu/1_proteomics/MR/ckb_mr/coloc/plot/exam/recombination/hg19/genetic_map_GRCh37_chr7.txt" , rr_header_dict=None , rr_chr_dict = None , rr_lim=(0 ,100 ), rr_ylabel=True ) region_withTRACE_fig.savefig("/data2/projects/bioinfo/tyliu/1_proteomics/MR/ckb_mr/coloc/plot/exam/region_withTRACE_plot.png" )
local_vcf_path = "/data2/projects/bioinfo/tyliu/1_proteomics/MR/ckb_mr/coloc/plot/exam/1k_eas_hg19/EAS.ALL.split_norm_af.1kgp3v5.hg19.vcf.gz" region_withTRACE_LD_fig, region_withTRACE_LD_ax = mysumstats.plot_mqq( mode= "r" , region= ( 7 , 156538803 , 157538803 ) , region_grid= True, rr_path= "/data2/projects/bioinfo/tyliu/1_proteomics/MR/ckb_mr/coloc/plot/exam/recombination/hg19/genetic_map_GRCh37_chr7.txt" , vcf_path = local_vcf_path) region_withTRACE_LD_fig.savefig( "/data2/projects/bioinfo/tyliu/1_proteomics/MR/ckb_mr/coloc/plot/exam/region_withTRACE_LD_plot.png" )
local_vcf_path_eas = "/data2/projects/bioinfo/tyliu/1_proteomics/MR/ckb_mr/coloc/plot/exam/1k_eas_hg19/EAS.ALL.split_norm_af.1kgp3v5.hg19.vcf.gz" local_vcf_path_eur = "/data2/projects/bioinfo/tyliu/1_proteomics/MR/ckb_mr/coloc/plot/exam/1k_eur_hg19/EUR.ALL.split_norm_af.1kgp3v5.hg19.vcf.gz" stacked_region_fig, stacked_region_ax = gl.plot_stacked_mqq( objects= [ mysumstats, mysumstats] , vcfs= [ local_vcf_path_eas, local_vcf_path_eur] , region= ( 7 , 156538803 , 157538803 ) , mode= "r" , build= "19" , rr_path= "/data2/projects/bioinfo/tyliu/1_proteomics/MR/ckb_mr/coloc/plot/exam/recombination/hg19/genetic_map_GRCh37_chr7.txt" , anno= True, anno_style= "right" , anno_xshift= 0.02 , anno_set= [ "7:156793450_G_GA" , "7:157038803_A_G" ] , titles= [ "EAS" , "EUR" ] , title_args= { "size" : 20 } , anno_args= { "rotation" : 0 } ) stacked_region_fig.savefig( "/data2/projects/bioinfo/tyliu/1_proteomics/MR/ckb_mr/coloc/plot/exam/stacked_plot.png" )
注:GWAS数据本身是EAS的数据,所以这里只是函数功能的展示,并不可以使用EUR的Reference进行LD信息的参考。
1.3 一些细节化参数 原网址:Manhattan and QQ plot - GWASLab
region_ref="your_snpid" fontsize = 16 title_fontsize = 18 fig_args={"figsize" :(10 ,10 )
若遇到P极小的情况:
GWASLab包提供P值自动填补的功能:
mysumstats = gl.Sumstats("./500_cis_pqtl.csv" , sep = "," , snpid = "id" , chrom = "CHROM" , pos = "GENPOS" , ea = "ALLELE1" , nea = "ALLELE0" , neaf = "A1FREQ" , beta = "BETA" , se = "SE" , p = "nom_p" , build = "38" , chrom_pat = "1" ) mysumstats.basic_check(verbose = False ) mysumstats.fill_data(to_fill=["MLOG10P" ], extreme=True ) scaled=True