admixtools FAQs

Posted on 2017-04-18 | Category: Cheatsheets

Common format-related questions:

The sixth column of ped and fam files

Question: convertf decides to “ignore” all my samples. Why?

Answer: A likely reason is that you are using a “fam” or “ped” file with a funny value (0, 9 or -9) in column 6. Try setting column 6 to 1.
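As a rough illustration of the fix, the sketch below rewrites column 6 to 1 wherever it holds 0, 9 or -9. It assumes whitespace-delimited fam (or ped) lines; the function name fix_fam_column6 and the sample rows are made up for this example and are not part of EIGENSOFT.

```python
# Values in column 6 that, per the answer above, make convertf ignore a sample.
MISSING_CODES = {"0", "9", "-9"}


def fix_fam_column6(lines, replacement="1"):
    """Return the lines with column 6 set to `replacement` when it is 0, 9 or -9.

    Hypothetical helper: assumes whitespace-delimited fam or ped lines.
    Any columns after the sixth (ped genotypes) are kept unchanged.
    """
    fixed = []
    for line in lines:
        fields = line.split()
        if len(fields) >= 6 and fields[5] in MISSING_CODES:
            fields[5] = replacement
        fixed.append(" ".join(fields))
    return fixed


# Made-up example rows: family ID, sample ID, father, mother, sex, phenotype.
example = [
    "FAM1 IND1 0 0 1 -9",
    "FAM1 IND2 0 0 2 0",
    "FAM2 IND3 0 0 1 2",
]
print("\n".join(fix_fam_column6(example)))
```

Only column 6 is touched; the zeros in the parent columns are left alone. Work on a copy of the file rather than overwriting the original.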

Removing long-range LD regions first is recommended

Question: Should regions of long-range LD in the genome be removed prior to PCA?

Answer: Yes, to avoid principal components that are artifacts of long-range LD it is ideal to remove such regions. See Table 1 of Price et al. 2008 AJHG. However, EIGENSTRAT can subsequently be run to compute disease association statistics using the full set of SNPs.
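One way to do the removal is to drop every SNP whose position falls inside a listed region before running PCA. The sketch below shows only that filtering step. The region coordinates are placeholders, not the values from Table 1 of Price et al. 2008, and the function names are invented for this example.

```python
def in_any_region(chrom, pos, regions):
    """True if (chrom, pos) lies inside any (chrom, start, end) region, inclusive."""
    return any(c == chrom and start <= pos <= end for c, start, end in regions)


def drop_ld_regions(snps, regions):
    """Keep only SNPs outside the given regions.

    snps: list of (snp_id, chrom, pos) tuples.
    regions: list of (chrom, start, end) tuples supplied by the user.
    """
    return [snp for snp in snps if not in_any_region(snp[1], snp[2], regions)]


# PLACEHOLDER coordinates for illustration only; substitute the real regions.
placeholder_regions = [("6", 1000, 2000)]

# Made-up SNPs: one inside the placeholder region, two outside it.
snps = [("rs_a", "6", 1500), ("rs_b", "6", 2500), ("rs_c", "1", 1500)]
kept = drop_ld_regions(snps, placeholder_regions)
print([snp_id for snp_id, _, _ in kept])
```

Since the answer notes that EIGENSTRAT can still compute association statistics on the full SNP set, it probably makes sense to keep the unfiltered files as well.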

Only a limited data set size (number of genotypes) is supported

Question: Can I run EIGENSOFT on very large data sets?

Answer: Yes. We currently support GWAS data sets up to 8 billion genotypes. For data sets between 2 billion and 8 billion genotypes, some care is required. See documentation for details.
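To see where a data set falls relative to these limits, a quick size check helps. The sketch below assumes the number of genotypes is simply samples times SNPs; the answer does not say how the boundary values themselves are treated, so the comparisons here are a guess, and the function name is invented.

```python
CARE_THRESHOLD = 2_000_000_000   # 2 billion genotypes, from the answer above
HARD_LIMIT = 8_000_000_000       # 8 billion genotypes, from the answer above


def classify_dataset(n_samples, n_snps):
    """Classify a data set by genotype count (assumed to be samples x SNPs)."""
    n_genotypes = n_samples * n_snps
    if n_genotypes > HARD_LIMIT:
        return n_genotypes, "exceeds supported size"
    if n_genotypes >= CARE_THRESHOLD:
        return n_genotypes, "supported, but needs care (see documentation)"
    return n_genotypes, "supported"


# Made-up example: 10,000 samples genotyped at 500,000 SNPs.
print(classify_dataset(10_000, 500_000))
```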

Sample ID names that are too long

Question: When I run the software, I get an error message about “idnames too long”. What should I do?

Answer: The software supports sample ID names up to a max of 39 characters. Longer sample ID names must be shortened. In addition, if your data is in PED format, the default is to concatenate the family ID and sample ID names so that their total length must meet this limit; however, you can set “familynames: NO” so that only the sample ID name will be used and must meet the 39 character limit.
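The limit can be checked before conversion. The sketch below flags names longer than 39 characters, with and without the family ID prepended. The ":" separator between family ID and sample ID is an assumption; counting it makes the check slightly conservative. The function name is hypothetical.

```python
MAX_ID_LENGTH = 39  # limit stated in the answer above


def too_long_ids(rows, familynames=True, separator=":"):
    """Return the names that exceed MAX_ID_LENGTH.

    rows: list of (family_id, sample_id) pairs, e.g. columns 1-2 of a ped file.
    familynames: mimics the "familynames" setting; when True the family ID and
        sample ID are joined. The separator is an assumption for this sketch.
    """
    offenders = []
    for family_id, sample_id in rows:
        name = f"{family_id}{separator}{sample_id}" if familynames else sample_id
        if len(name) > MAX_ID_LENGTH:
            offenders.append(name)
    return offenders


# Made-up IDs: the first pair is 20 + 20 characters, the second is short.
rows = [("F" * 20, "S" * 20), ("FAM1", "IND1")]
print(too_long_ids(rows, familynames=True))   # first pair is flagged
print(too_long_ids(rows, familynames=False))  # nothing is flagged
```

In this example, setting “familynames: NO” is enough to get under the limit; otherwise the IDs themselves have to be shortened.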

Using only a subset of populations

Question: How do I compute principal components using only a subset of populations and project other populations onto those principal components?

Answer: Use the -w flag to smartpca.perl (see EIGENSTRAT/README), or use the poplistname parameter to smartpca (see POPGEN/README).
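For the poplistname route, the sketch below generates a population list (one population name per line) and a smartpca parameter file that points to it. The parameter names are as I recall them from POPGEN/README and should be checked against your version; all file names and population labels are placeholders, and the helper functions are invented for this example.

```python
def poplist_text(populations):
    """Population list file content: one population name per line."""
    return "\n".join(populations) + "\n"


def smartpca_parfile(prefix, poplist_path):
    """Parameter file text for smartpca.

    prefix and poplist_path are placeholder file names. The parameter names
    should be verified against POPGEN/README before use.
    """
    lines = [
        f"genotypename: {prefix}.geno",
        f"snpname: {prefix}.snp",
        f"indivname: {prefix}.ind",
        f"evecoutname: {prefix}.evec",
        f"evaloutname: {prefix}.eval",
        f"poplistname: {poplist_path}",
    ]
    return "\n".join(lines) + "\n"


# Made-up example: compute the components from two reference populations only.
print(poplist_text(["PopA", "PopB"]), end="")
print(smartpca_parfile("mydata", "ref_pops.txt"), end="")
```

Individuals from populations not named in the list are then projected onto the components computed from the listed populations.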