UNIX for Bioinformatics
instructor: Nicholas Navin, Ph.D.
BED Files
BED files are used to store annotations or data values in a simple text format.
BED files require at least three columns, with an optional column as a descriptor or gene name.
column1: chromosome name (for example, chr1)
column2: start position (counted from 0)
column3: stop position (the first base after the feature, so not part of it)
column4: gene name or identifier
Note: columns in a BED file are conventionally tab-delimited, which is also the default delimiter for cut and paste
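As a quick sanity check of the layout described above, the sketch below builds a tiny tab-delimited file and counts the fields on each line. The file names demo.bed and field_counts.txt are hypothetical, and the three records are example data only.

```shell
# Build a small tab-delimited BED file (demo.bed is a hypothetical name).
printf 'chr1\t2975604\t3345045\tPRDM16\n'  > demo.bed
printf 'chr1\t17217812\t17253252\tSDHB\n' >> demo.bed
printf 'chr1\t18830087\t18947946\tPAX7\n' >> demo.bed

# Count the tab-separated fields on each line and collapse duplicates.
# A well-formed file reports a single number (here, the 4 columns above).
awk -F'\t' '{ print NF }' demo.bed | sort -u > field_counts.txt
cat field_counts.txt
```

If more than one number is printed, some lines have a different number of columns, which often means spaces were used where tabs were expected.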
Let’s examine the cancer_genes.bed file with head (only the first few lines of the output are shown here)
head cancer_genes.bed
chr1 2975604 3345045 PRDM16
chr1 17217812 17253252 SDHB
chr1 18830087 18947946 PAX7
Often it will be necessary to extract a subset of columns from a BED file to produce another file
Use the cut command to extract the gene identifier column (4) and make a new file with the gene names
cut -f 4 cancer_genes.bed > genes.txt
Examine the first lines of genes.txt to confirm that the 4th column was extracted
more genes.txt
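The cut command is not limited to a single column. The sketch below, which uses a hypothetical two-line demo.bed so that it runs on its own, shows a comma-separated list and a range passed to -f. The output file names are also made up for illustration.

```shell
# Hypothetical sample data so the example is self-contained.
printf 'chr1\t2975604\t3345045\tPRDM16\n'  > demo.bed
printf 'chr1\t17217812\t17253252\tSDHB\n' >> demo.bed

# A comma-separated list selects several columns: chromosome and gene name.
cut -f 1,4 demo.bed > chrom_gene.txt

# A range selects adjacent columns: the three coordinate columns.
cut -f 1-3 demo.bed > coords.bed

cat chrom_gene.txt
cat coords.bed
```

Note that cut always writes the columns in their original order, so -f 4,1 produces the same output as -f 1,4.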
The genes are out of order, so let’s sort them alphabetically using the sort command (the sorted list is printed to the screen; genes.txt itself is not changed)
sort genes.txt
Let’s sort them in reverse alphabetical order
sort -r genes.txt
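The same sort command can order a whole BED file by position rather than by name. The sketch below is one possible approach, using a hypothetical three-line demo.bed: -k1,1 sorts on the chromosome column as text, and -k2,2n sorts on the start column as a number.

```shell
# Hypothetical, deliberately unsorted sample data.
printf 'chr2\t500\t600\tB\n'    > demo.bed
printf 'chr1\t2000\t2100\tC\n' >> demo.bed
printf 'chr1\t300\t400\tA\n'   >> demo.bed

# Sort by chromosome (text), then by start position (numeric),
# and redirect the result to a new file instead of the screen.
sort -k1,1 -k2,2n demo.bed > sorted.bed

cat sorted.bed
```

Without the n modifier the start positions would be compared as text, and 2000 would sort before 300.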
Now let’s find all the genes that contain the string ‘RAS’
grep RAS genes.txt
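A plain search matches RAS anywhere in the name. If only names that start or end with RAS are wanted, the pattern can be anchored with ^ or $. The gene list below is a small hypothetical example written to demo_genes.txt, not the contents of genes.txt.

```shell
# Hypothetical gene list, one name per line.
printf 'KRAS\nNRAS\nRASA1\nBRAF\nTP53\n' > demo_genes.txt

# ^ anchors the pattern to the start of the line.
grep '^RAS' demo_genes.txt > starts_with_ras.txt

# $ anchors the pattern to the end of the line.
grep 'RAS$' demo_genes.txt > ends_with_ras.txt

cat starts_with_ras.txt
cat ends_with_ras.txt
```

The quotes around the pattern keep the shell from interpreting the special characters before grep sees them.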
How many genes contain the string RAS?
grep RAS genes.txt | wc -l
Now let’s return to the cancer_genes.bed file and use the head and cut commands (no, it’s not a guillotine) to extract the first 10 lines of column 1 and output a file called column1.txt
head -10 cancer_genes.bed | cut -f 1 > column1.txt
Now let’s use head and cut to extract the first 10 lines from column 3 and output a file called column3.txt
head -10 cancer_genes.bed | cut -f 3 > column3.txt
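Two small points about the head and cut pipeline, sketched on a hypothetical 12-line demo.bed generated with awk: head -n 10 is the form specified by POSIX (head -10 is an older spelling that most systems still accept), and the two stages can be swapped without changing the result.

```shell
# Generate a hypothetical 12-line BED file so the example is self-contained.
awk 'BEGIN { for (i = 1; i <= 12; i++) printf "chr%d\t%d\t%d\tGENE%d\n", i, i*100, i*100+50, i }' > demo.bed

# Take the first 10 lines, then keep column 3.
head -n 10 demo.bed | cut -f 3 > a.txt

# Keep column 3 of every line, then take the first 10 values.
cut -f 3 demo.bed | head -n 10 > b.txt

# cmp prints nothing when the two files are identical.
cmp a.txt b.txt
wc -l < a.txt
```

Running head first means cut only has to process 10 lines, which can matter on very large files.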
The counterpart of the cut command is the paste command, which can be used to paste columns together side by side
We will use the paste command to stitch together columns 1 and 3 and make a new file called join_columns.txt
paste column1.txt column3.txt > join_columns.txt
Use the more command to examine the contents of the join_columns.txt file
more join_columns.txt
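By default paste separates the joined columns with a tab, and the -d option selects a different delimiter. The sketch below uses two hypothetical single-column files, chrom.txt and stop.txt, to show both forms.

```shell
# Two hypothetical single-column files with the same number of lines.
printf 'chr1\nchr1\nchr2\n'       > chrom.txt
printf '3345045\n17253252\n600\n' > stop.txt

# Default: lines are joined with a tab between the columns.
paste chrom.txt stop.txt > joined.txt

# -d sets the delimiter, here a comma.
paste -d ',' chrom.txt stop.txt > joined.csv

cat joined.txt
cat joined.csv
```

paste matches lines by position only (line 1 with line 1, and so on), so both files need to be in the same order for the result to make sense.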
Alternatively, we can stack the two files vertically using the cat command, which writes the contents of column3.txt after the contents of column1.txt
cat column1.txt column3.txt > vertical_columns.txt
more vertical_columns.txt
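One way to see the difference between the two commands is to count lines with wc -l. In the sketch below, which uses hypothetical three-line input files, paste keeps the line count and widens each line, while cat keeps the width and doubles the line count.

```shell
# Two hypothetical single-column files, three lines each.
printf 'chr1\nchr1\nchr2\n' > chrom.txt
printf '100\n200\n300\n'    > stop.txt

# Side by side: still 3 lines, now with 2 columns each.
paste chrom.txt stop.txt > side_by_side.txt

# End to end: 6 lines, still 1 column each.
cat chrom.txt stop.txt > stacked.txt

wc -l side_by_side.txt stacked.txt
```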
That concludes our section on working with BED files