Python snippets
Below, lab members may add quick copy-and-paste code blocks that are helpful in a variety of situations related to work in the lab.
Read from and write to a TSV file
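from csv import DictReader

# Read: each row becomes a dict keyed by the column headers.
# newline='' lets the csv module handle line endings itself.
with open(input_path, 'r', newline='') as fr:
    reader = DictReader(fr, delimiter='\t')
    headers = reader.fieldnames
    rows = list(reader)

from csv import DictWriter

# Write: reuse the headers read above so the column order is preserved.
with open(output_path, 'w', newline='') as fw:
    writer = DictWriter(fw, fieldnames=headers, delimiter='\t')
    writer.writeheader()
    for row in rows:
        writer.writerow(row)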
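The two snippets above can be wrapped into a pair of small helpers so that a read, edit, write round trip is easy to reuse. This is only a minimal sketch: the function names read_tsv, write_tsv and demo_round_trip, and the example column names, are placeholders made up for illustration, not something defined elsewhere on this page.

```python
import csv
import os
import tempfile


def read_tsv(path):
    """Return (headers, rows); rows is a list of dicts keyed by header."""
    with open(path, 'r', newline='') as fr:
        reader = csv.DictReader(fr, delimiter='\t')
        rows = list(reader)
        headers = reader.fieldnames
    return headers, rows


def write_tsv(path, headers, rows):
    """Write a list of dicts to a tab-separated file with a header line."""
    with open(path, 'w', newline='') as fw:
        writer = csv.DictWriter(fw, fieldnames=headers, delimiter='\t')
        writer.writeheader()
        writer.writerows(rows)


def demo_round_trip():
    """Write a tiny table to a temporary file, then read it back."""
    headers = ['gene_id', 'psi']  # hypothetical column names
    rows = [{'gene_id': 'geneA', 'psi': '0.25'},
            {'gene_id': 'geneB', 'psi': '0.80'}]
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, 'example.tsv')
        write_tsv(path, headers, rows)
        return read_tsv(path)
```

Note that everything read back from the file is a string, so numeric columns need an explicit float() or int() before you do arithmetic on them.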
Subset a gff3 using majiq_tools
import majiq_tools as mt
import pandas as pd
import sys
gff3 = 'Homo_sapiens.GRCh38.94.gff3'
save_as = 'Homo_sapiens.GRCh38.94.gff3.sub'
genes = ['ENSG00000215704', 'ENSG00000142615', 'ENSG00000259042', 'ENSG00000251002', 'ENSG00000275552']
def subset_gff3(gff3, genes, save_as):
    fullgff3 = gff3
    gff3 = mt.gff3.load_gff3(fullgff3)
    genes = ['gene:' + x for x in genes]
    gff3_subsetted = mt.gff3.subset_genes(df_gff3=gff3, gene_id=genes)
    mt.gff3.save_gff3(gff3_subsetted, save_as)

subset_gff3(gff3, genes, save_as)
Allow your program to take arguments
As your script grows past the most basic form, you will want to avoid needing to edit the script itself to modify options, such as input/output paths and filter thresholds. Here is a basic overview of one of the most intuitive libraries, argparse:
import argparse
parser = argparse.ArgumentParser(description='This is just a generally great software, you know~!')
parser.add_argument('-i', "--input-file", help="input path", required=True)
parser.add_argument("--output-file", help="output path", required=False)
parser.add_argument("--some-float-filter", help="some number thingy", type=float, default=0.05)
parser.add_argument("--number-of-things", type=int,
                    help="this can be a really long, super comprehensive description too, the program will automatically"
                         " make it look nice in the program help screen. Give it a try!"
                    )
parser.add_argument("--program-is-awesome", action='store_true', help="If the program is awesome, set this flag")
parser.add_argument("--analysis-type", help="psi, dpsi, or het?", choices=['psi', 'dpsi', 'het'])
parser.add_argument("--secret-thing", help=argparse.SUPPRESS)
args = parser.parse_args()
print(args)
print(args.input_file)
This is a basic demonstration of the library. argparse supports many more advanced usages, so if you need something, it probably exists.
The immediate points to note about the add_argument function:
-Specify one or two strings. If two are given, one is the 'short' version (in the example, that's '-i') and the other is the full name of the argument.
-The name that you give for the argument should use dashes to separate words; after parsing, the attribute name on args has every dash replaced with an underscore. This is the convention (dashes in the shell, underscores in Python).
-There can be a "type" argument, such as float or int. The value given on the command line is converted to that type, and a value that cannot be converted is rejected with an error. For a string argument, simply don't specify a type.
-For a "switch" (boolean) argument, give action='store_true' instead of type, as shown. The value is False unless the flag is given.
-You can limit the allowed values for an argument with the "choices" parameter.
-For an argument that you want to use while testing but don't want displayed to users, use help=argparse.SUPPRESS. The argument still works; it is just left out of the help screen.
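The points in this list can be checked without touching the command line, because parse_args() also accepts a list of strings. The following is a small sketch using a cut-down version of the parser above; the file name 'in.tsv' and the helper name build_parser are made up for illustration.

```python
import argparse


def build_parser():
    """Cut-down version of the example parser, for experimenting."""
    parser = argparse.ArgumentParser(description='argparse demo')
    parser.add_argument('-i', '--input-file', required=True)
    parser.add_argument('--some-float-filter', type=float, default=0.05)
    parser.add_argument('--program-is-awesome', action='store_true')
    parser.add_argument('--analysis-type', choices=['psi', 'dpsi', 'het'])
    return parser


parser = build_parser()

# Passing a list to parse_args() stands in for the real command line.
args = parser.parse_args(['-i', 'in.tsv', '--some-float-filter', '0.1',
                          '--program-is-awesome', '--analysis-type', 'dpsi'])

print(args.input_file)          # dashes in the name became underscores
print(args.some_float_filter)   # converted to a float by type=float
print(args.program_is_awesome)  # True because the flag was given
print(args.analysis_type)

# Options that were not given fall back to their defaults.
defaults = parser.parse_args(['-i', 'in.tsv'])

# A value outside "choices" makes argparse print an error and exit.
try:
    parser.parse_args(['-i', 'in.tsv', '--analysis-type', 'bogus'])
    rejected = False
except SystemExit:
    rejected = True
```

Testing a parser this way is handy when you add a new option and want to confirm its default and type before running the full script.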
A nice help screen will be automatically generated for your program. To see it, just run your script with "--help":
usage: samples.py [-h] -i INPUT_FILE [--output-file OUTPUT_FILE] [--some-float-filter SOME_FLOAT_FILTER] [--number-of-things NUMBER_OF_THINGS] [--program-is-awesome] [--analysis-type {psi,dpsi,het}]

This is just a generally great software, you know~!

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT_FILE, --input-file INPUT_FILE
                        input path
  --output-file OUTPUT_FILE
                        output path
  --some-float-filter SOME_FLOAT_FILTER
                        some number thingy
  --number-of-things NUMBER_OF_THINGS
                        this can be a really long, super comprehensive description too, the program will automatically make it look nice in the program help screen. Give it a try!
  --program-is-awesome  If the program is awesome, set this flag
  --analysis-type {psi,dpsi,het}
                        psi, dpsi, or het?
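Notice that --secret-thing does not appear anywhere in the help screen. A short sketch of how to confirm this from inside Python: format_help() returns the same text that --help prints, so you can inspect it directly. The file name 'in.tsv' and the value 'x' are made up for illustration. Depending on your Python version, the section header may read "options:" rather than "optional arguments:".

```python
import argparse

parser = argparse.ArgumentParser(prog='samples.py')
parser.add_argument('-i', '--input-file', help='input path', required=True)
parser.add_argument('--secret-thing', help=argparse.SUPPRESS)

# format_help() returns the text that --help would print.
help_text = parser.format_help()
print(help_text)

# The hidden option is absent from the help text but still parses normally.
args = parser.parse_args(['-i', 'in.tsv', '--secret-thing', 'x'])
print(args.secret_thing)
```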