Code should be readable, broken down into small, self-contained components (modular), and reusable, so you’re not rewriting code to do the same task over and over again.
Testing Code Strategy: when deciding how thoroughly to test a piece of code, ask:
- How many times is this code called by other code?
- If this code were wrong, how detrimental to the final results would it be?
- How noticeable would an error be if one occurred?
Never assume a dataset is high quality. Instead, demonstrate its quality through exploratory data analysis (EDA). EDA is neither complex nor time consuming, and it makes your research much more robust to lurking surprises in large datasets.
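Quick EDA in the shell can start with very simple checks. A minimal sketch, assuming a tab-delimited file; the helper names `field_count_summary` and `count_missing` and the `NA` missing-value marker are illustrative assumptions, not part of any standard tool:

```shell
#!/usr/bin/env bash
# Sketch: two quick EDA checks on a tab-delimited file.
# Helper names and the "NA" missing-value marker are assumptions.
set -euo pipefail

# How many lines have each number of tab-separated fields?
# A well-formed table reports a single field count.
field_count_summary() {
  awk -F'\t' '{ print NF }' "$1" | sort -n | uniq -c
}

# How many fields are empty or hold the assumed missing-value marker "NA"?
count_missing() {
  awk -F'\t' '{ for (i = 1; i <= NF; i++) if ($i == "" || $i == "NA") n++ }
              END { print n + 0 }' "$1"
}

# Made-up demo table: the third row is missing a column.
demo=$(mktemp)
printf 'chr1\t100\t0.5\nchr1\t200\tNA\nchr2\t300\n' > "$demo"
field_count_summary "$demo"   # one line with 2 fields, two lines with 3 fields
count_missing "$demo"         # 1
```

Both checks read the file once with `awk`, so they stay cheap even on large files.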
Make Figures and Statistics the Results of Scripts
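A minimal sketch of a statistic produced by a script rather than copied by hand: the hypothetical helper `write_read_counts` writes reads per FASTQ file into a small table, assuming unwrapped FASTQ where every record is exactly four lines. Rerunning it regenerates the numbers whenever the data change.

```shell
#!/usr/bin/env bash
# Sketch: regenerate a statistic (reads per FASTQ file) with a script.
# write_read_counts is hypothetical; it assumes unwrapped FASTQ, where
# every record is exactly four lines.
set -euo pipefail

write_read_counts() {
  local out=$1
  shift
  printf 'file\treads\n' > "$out"
  local f lines
  for f in "$@"; do
    lines=$(wc -l < "$f")
    printf '%s\t%d\n' "$(basename "$f")" $(( lines / 4 )) >> "$out"
  done
}

# Demo on a throwaway two-record FASTQ file.
demo=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n' > "$demo/demo.fastq"
write_read_counts "$demo/stats.tsv" "$demo/demo.fastq"
cat "$demo/stats.tsv"   # a header line, then demo.fastq with 2 reads
```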
Always use relative paths (e.g., `../data/stats/qual.txt`) rather than absolute paths (e.g., `/home/vinceb/projects/zmays-snps/data/stats/qual.txt`), so the project keeps working when it is moved or shared with collaborators.
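A small demonstration of why relative paths matter, using a throwaway copy of the `zmays-snps` layout from the text. The helper `read_qual` is hypothetical and the contents of `qual.txt` are a placeholder; the point is that `../data/stats/qual.txt` still resolves after the project directory is renamed, while a hard-coded absolute path would break.

```shell
#!/usr/bin/env bash
# Sketch: a relative path survives moving the project directory.
# read_qual is a hypothetical helper; qual.txt holds placeholder text.
set -euo pipefail

# Read the stats file the way a script in scripts/ would,
# using only the relative path ../data/stats/qual.txt.
read_qual() {
  (cd "$1/scripts" && cat ../data/stats/qual.txt)
}

base=$(mktemp -d)
mkdir -p "$base/zmays-snps/data/stats" "$base/zmays-snps/scripts"
echo "placeholder" > "$base/zmays-snps/data/stats/qual.txt"

read_qual "$base/zmays-snps"           # prints: placeholder
mv "$base/zmays-snps" "$base/zmays-snps-moved"
read_qual "$base/zmays-snps-moved"     # still prints: placeholder
```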
Keep a README in the project’s main directories documenting:
- methods and workflows (including the full command lines used)
- the origin of all data
- when and how you downloaded the data, and which data version you have
- software versions
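Provenance notes are easiest to keep when a script writes them. A minimal sketch with a hypothetical helper `record_download` that appends the file name, source URL, download date, and a POSIX `cksum` checksum to a README; the URL below is a placeholder. Software versions can be logged the same way, though each tool has its own version flag, so check its documentation.

```shell
#!/usr/bin/env bash
# Sketch: append a provenance entry for a downloaded file to a README.
# record_download is hypothetical; cksum is the POSIX checksum utility.
set -euo pipefail

record_download() {
  # $1: downloaded file, $2: where it came from, $3: README to append to
  local file=$1 source_url=$2 readme=$3
  {
    echo "- $(basename "$file")"
    echo "  - source: $source_url"
    echo "  - downloaded: $(date +%Y-%m-%d)"
    echo "  - cksum: $(cksum < "$file")"
  } >> "$readme"
}

# Demo with a throwaway file and a placeholder URL.
tmp=$(mktemp -d)
printf 'ACGT\n' > "$tmp/example.fa"
record_download "$tmp/example.fa" "https://example.org/example.fa" "$tmp/README.md"
cat "$tmp/README.md"
```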
Leverage directories to keep the project organized.
Shell Expansion Tips
$ echo dog-{gone,bowl,bark}
$ mkdir -p zmays-snps/{data/seqs,scripts,analysis}
$ touch seqs/zmays{A,B,C}_R{1,2}.fastq
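Brace expansion happens before the command runs, so prefixing a command with `echo` previews exactly what it will receive. A minimal sketch; `make_fastq_stubs` is a hypothetical helper, and it runs `mkdir -p` first because `touch` does not create missing directories:

```shell
#!/usr/bin/env bash
# Sketch: preview brace expansions with echo, then create the files.
set -euo pipefail

echo dog-{gone,bowl,bark}
# dog-gone dog-bowl dog-bark

echo seqs/zmays{A,B,C}_R{1,2}.fastq
# prints six paths, from seqs/zmaysA_R1.fastq through seqs/zmaysC_R2.fastq

# Hypothetical helper: create the six empty FASTQ files under DIR/seqs.
# touch does not create missing directories, so mkdir -p runs first.
make_fastq_stubs() {
  mkdir -p "$1/seqs"
  touch "$1"/seqs/zmays{A,B,C}_R{1,2}.fastq
}

work=$(mktemp -d)
make_fastq_stubs "$work"
ls "$work/seqs"
```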
Shell Wildcards
Wildcard | What it matches |
---|---|
`*` | Zero or more characters (but ignores hidden files starting with a period) |
`?` | One character (also ignores hidden files) |
`[A-Z]` | Any character in the supplied alphanumeric range (here, any character between A and Z); this works for any alphanumeric range (e.g., `[0-9]` matches any character between 0 and 9) |
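A minimal sketch of the three wildcards in a scratch directory; the file names reuse the `zmays` examples, and `.hidden.fastq` is a made-up dotfile added to show that `*` skips it under bash's default settings:

```shell
#!/usr/bin/env bash
# Sketch: how *, ?, and [A-B] match in a scratch directory.
# .hidden.fastq is a made-up dotfile; bash's default globbing skips it.
set -euo pipefail

work=$(mktemp -d)
cd "$work"
touch zmaysA_R1.fastq zmaysA_R2.fastq zmaysB_R1.fastq zmaysC_R1.fastq .hidden.fastq

echo *.fastq               # the four zmays files; .hidden.fastq is skipped
echo zmaysA_R?.fastq       # zmaysA_R1.fastq zmaysA_R2.fastq
echo zmays[A-B]_R1.fastq   # zmaysA_R1.fastq zmaysB_R1.fastq
```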
It’s best to be as restrictive as possible with wildcards. Instead of `zmaysB*`, use `zmaysB*fastq` or `zmaysB_R?.fastq` (the `?` only matches a single character).
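Why restrictive patterns help: a stray file sharing the prefix is caught by the broad pattern but not by the narrower ones. A minimal sketch; `zmaysB_notes.txt` is a hypothetical stray file:

```shell
#!/usr/bin/env bash
# Sketch: a broad wildcard also matches unrelated files with the same prefix.
# zmaysB_notes.txt is a hypothetical stray file.
set -euo pipefail

work=$(mktemp -d)
cd "$work"
touch zmaysB_R1.fastq zmaysB_R2.fastq zmaysB_notes.txt

echo zmaysB*           # all three files, including the notes file
echo zmaysB*fastq      # zmaysB_R1.fastq zmaysB_R2.fastq
echo zmaysB_R?.fastq   # zmaysB_R1.fastq zmaysB_R2.fastq
```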
$ ls zmays[AB]_R1.fastq
zmaysA_R1.fastq zmaysB_R1.fastq
$ ls zmays[A-B]_R1.fastq
zmaysA_R1.fastq zmaysB_R1.fastq
Leading Zeros and Sorting
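Pad numbers in file names with leading zeros so that lexical order matches numeric order, e.g., file-0021.txt rather than file-21.txt (without padding, file-21.txt sorts before file-3.txt).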
$ ls -l
-rw-r--r-- 1 vinceb staff 0 Feb 21 21:23 genes-001.txt
-rw-r--r-- 1 vinceb staff 0 Feb 21 21:23 genes-002.txt
[…]
-rw-r--r-- 1 vinceb staff 0 Feb 21 21:23 genes-013.txt
-rw-r--r-- 1 vinceb staff 0 Feb 21 21:23 genes-014.txt
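One way to generate padded names like the listing above is `printf` with a `%03d` format. A minimal sketch; `padded_name` is a hypothetical helper, and `LC_ALL=C sort` forces a plain byte-wise sort so the comparison does not depend on locale:

```shell
#!/usr/bin/env bash
# Sketch: zero-pad numbers in file names so lexical sorting matches
# numeric order. padded_name is hypothetical; %03d pads to three digits.
set -euo pipefail

padded_name() {
  # $1: file name prefix, $2: integer to pad
  printf '%s-%03d.txt\n' "$1" "$2"
}

padded_name genes 14   # genes-014.txt

# Without padding, a byte-wise sort puts 10 before 2.
printf 'genes-%d.txt\n' 1 2 10 | LC_ALL=C sort
# genes-1.txt
# genes-10.txt
# genes-2.txt

# With padding, byte-wise order matches numeric order.
for i in 1 2 10; do padded_name genes "$i"; done | LC_ALL=C sort
# genes-001.txt
# genes-002.txt
# genes-010.txt
```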
Use Markdown to …
Using Pipelines
tee
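`tee` writes its input to a file while also passing it down the pipeline. In `program1 | tee intermediate-file.txt | program2`, program1’s standard output is both written to intermediate-file.txt and piped directly into program2’s standard input.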
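A runnable version of the `tee` pattern, with `seq` and `awk` standing in for the generic program1 and program2; `sum_and_save` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Sketch: keep an intermediate file from the middle of a pipeline with tee.
# seq and awk stand in for program1 and program2; sum_and_save is hypothetical.
set -euo pipefail

sum_and_save() {
  # $1: count of integers to generate, $2: path for the intermediate file
  seq 1 "$1" | tee "$2" | awk '{ total += $1 } END { print total }'
}

tmp=$(mktemp)
sum_and_save 5 "$tmp"   # prints 15
cat "$tmp"              # the intermediate file holds 1 through 5, one per line
```

Saving the intermediate stream this way makes it possible to inspect what program1 produced without rerunning the whole pipeline.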
Tmux
Give each new session a name so you can easily reattach to it later.
Key sequence or command | Action |
---|---|
Control-a d | Detach from the current session |
Control-a c | Create a new window |
tmux ls | List all sessions |
tmux new -s new | Create a session named “new” |
tmux att -t new | Attach to the session named “new” |
tmux att -d -t new | Attach to the session named “new”, detaching any other clients attached to it |
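Change the default prefix key (Control-b) in `~/.tmux.conf`. The Control-a key sequences above assume the prefix has been remapped to Control-a.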
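A minimal sketch of that remapping as a shell function that appends the standard tmux `unbind`, `set -g prefix`, and `bind ... send-prefix` lines to a config file; the function name `remap_tmux_prefix` is hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: append tmux config lines that remap the prefix key from the
# default Control-b to Control-a. remap_tmux_prefix is hypothetical.
set -euo pipefail

remap_tmux_prefix() {
  # $1: path of the config file to append to (usually ~/.tmux.conf)
  cat >> "$1" <<'EOF'
# Use Control-a instead of Control-b as the prefix key
unbind C-b
set -g prefix C-a
# Pressing Control-a twice sends a literal Control-a to the program
bind C-a send-prefix
EOF
}
```

To apply it, run `remap_tmux_prefix ~/.tmux.conf`; a tmux server that is already running picks up the change after `tmux source-file ~/.tmux.conf`.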