Map with BWA and do Indel realignment with GATK

01.Mapping_BWAMEM-GATKRealgn.pl

get script

1
git clone https://github.com/wlz0726/SequenceMapping.git

step1: Mapping with BWA MEM and sort with samtools, connected with tunnel

NB: samtools should be v1.* or later

1
2
bwa mem -R ref.fa SampleID.1.fq.gz SampleID.2.fq.gz | samtools sort -O bam -o SampleID.sort.bam
samtools index SampleID.sort.bam

step2: remove PCR duplicates with picard

1
2
3
4
5
6
7
java -jar picard.jar MarkDuplicates \
INPUT=SampleID.sort.bam \
OUTPUT=SampleID.rmdup.bam \
METRICS_FILE =SampleID.dup.txt \
REMOVE_DUPLICATES=true \
VALIDATION_STRINGENCY=SILENT \
CREATE_INDEX=true

step3: do local realignment of reads to enhance the alignments in the vicinity of indel polymorphisms

NB: GATK v3.6
NB: Note that indel realignment is no longer necessary for variant discovery if you plan to use a variant caller that performs a haplotype assembly step, such as HaplotypeCaller or MuTect2. However it is still required when using legacy callers such as UnifiedGenotyper or the original MuTect.

1
2
3
4
5
6
7
8
9
10
11
12
# use the RealignerTargetCreator to identify regions where realignment was needed
java -jar GenomeAnalysisTK.jar -T RealignerTargetCreator \
-R ref.fa \
-o SampleID.intervals \
-I SampleID.rmdup.bam
# use IndelRealigner to realign the regions found in the RealignerTargetCreator step
java -jar GenomeAnalysisTK.jar -T IndelRealigner \
-R ref.fa \
-targetIntervals SampleID.intervals \
-I SampleID.rmdup.bam \
-o SampleID.realn.bam

Before mapping: Filter and Trim your Paired-end FastQ file

After mapping: do SNP calling with GATK