bioinformatics
How to call module written with argparse in iPython notebook
An alternative for using argparse in IPython notebooks is to pass a list of strings to args = parser.parse_args() (line 303 of the git repo you referenced). It would be something like:

    parser = argparse.ArgumentParser(
        description='Searching longest common substring. '
                    'Uses Ukkonen\'s suffix tree algorithm and generalized suffix tree. '
                    'Written by Ilya Stepanov (c) 2013')
    parser.add_argument(
        'strings', metavar="STRING", …
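As a minimal sketch of that idea: inside a notebook, sys.argv belongs to the kernel, so the arguments can be handed to parse_args() as an explicit list. The build_parser helper, the --verbose flag, and the sample strings below are hypothetical and only illustrate the call; they are not taken from the referenced repo.

```python
import argparse


def build_parser():
    # Hypothetical parser, loosely modelled on the excerpt above.
    # The --verbose flag is an assumption added for illustration.
    parser = argparse.ArgumentParser(
        description="Searching longest common substring.")
    parser.add_argument("strings", metavar="STRING", nargs="*",
                        help="input strings")
    parser.add_argument("--verbose", action="store_true",
                        help="hypothetical flag")
    return parser


parser = build_parser()

# Pass the arguments as a list instead of letting argparse read sys.argv,
# which in a notebook holds the kernel's own command line.
args = parser.parse_args(["abcab", "bcabc", "--verbose"])
print(args.strings, args.verbose)
```

If the command line is already available as one string, shlex.split(command_line) from the standard library turns it into a suitable list.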
Remove line breaks in a FASTA file
This awk program:

    % awk '!/^>/ { printf "%s", $0; n = "\n" } /^>/ { print n $0; n = "" } END { printf "%s", n }' input.fasta

Will yield:

    >accession1
    ATGGCCCATGGGATCCTAGC
    >accession2
    GATATCCATGAAACGGCTTA

Explanation: On lines that don't start with a >, print the line without a line break and store …
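For readers who prefer Python, the same unwrapping can be sketched roughly as below. This is a loose equivalent rather than a translation of the awk program: the unwrap_fasta name and the sample records are assumptions, and unlike the awk version it skips blank lines.

```python
def unwrap_fasta(lines):
    """Yield FASTA lines with each record's sequence joined onto one line."""
    seq_parts = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if seq_parts:
                yield "".join(seq_parts)
                seq_parts = []
            yield line
        elif line:
            # Sequence line: buffer it without its line break.
            seq_parts.append(line)
    # Flush the last record.
    if seq_parts:
        yield "".join(seq_parts)


# Hypothetical wrapped input, as it would be read from a file.
example = [
    ">accession1\n", "ATGGCCCATG\n", "GGATCCTAGC\n",
    ">accession2\n", "GATATCCATG\n", "AAACGGCTTA\n",
]
unwrapped = list(unwrap_fasta(example))
print("\n".join(unwrapped))
```

Because the function accepts any iterable of lines, an open file handle can be passed in directly, so large files need not be loaded into memory at once.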
WinError 2 The system cannot find the file specified (Python)
Popen expects a list of strings for non-shell calls and a single string for shell calls. Call subprocess.Popen with shell=True:

    process = subprocess.Popen(command, stdout=tempFile, shell=True)

Hopefully this solves your issue. It is listed here: https://bugs.python.org/issue17023
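The two calling conventions can be compared side by side in a small sketch. The commands here are placeholders chosen so that the example runs on most platforms (sys.executable and echo); they are not the command from the original question.

```python
import subprocess
import sys

# Non-shell call: the command is a list, one element per argument.
# sys.executable is used only so the sketch does not depend on a
# particular program being installed.
list_cmd = [sys.executable, "-c", "print('hello from list form')"]
proc = subprocess.Popen(list_cmd, stdout=subprocess.PIPE, text=True)
list_out, _ = proc.communicate()

# Shell call: the command is one string, parsed by the shell itself.
proc = subprocess.Popen("echo hello from shell form",
                        stdout=subprocess.PIPE, shell=True, text=True)
shell_out, _ = proc.communicate()

print(list_out.strip())
print(shell_out.strip())
```

The list form avoids shell quoting problems and is generally the safer choice when any part of the command comes from user input.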
Finding overlap in ranges with R
Use the IRanges/GenomicRanges packages from Bioconductor, which are made for exactly these problems (and scale massively):

    source("http://bioconductor.org/biocLite.R")
    biocLite("IRanges")

There are a few appropriate containers for ranges on different chromosomes; one is RangesList:

    library(IRanges)
    rangesA <- split(IRanges(rangesA$start, rangesA$stop), rangesA$chrom)
    rangesB <- split(IRanges(rangesB$start, rangesB$stop), rangesB$chrom)
    # which rangesB wholly contain at least one rangesA?
    ov <- …
Why is collections.Counter so slow?
collections.Counter is not slow; it is actually quite fast. But it is a general-purpose tool, and counting characters is just one of its many applications. str.count, on the other hand, only counts occurrences of a substring (here, a single character) in a string, and it is heavily optimized for that one task. That means that str.count can work on the underlying C-char array …
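A small sketch shows that both approaches give the same counts and offers a way to time them on your own machine. The sequence, the "ACGT" alphabet, and the helper names are assumptions for illustration, and no timing result is claimed here since it depends on input size and Python version.

```python
import timeit
from collections import Counter

# Hypothetical input; real sequences would be much longer.
sequence = "ATGGCCCATGGGATCCTAGC"


def count_with_counter(seq):
    # General-purpose: hashes and counts every element in one pass.
    return Counter(seq)


def count_with_str_count(seq, alphabet="ACGT"):
    # Specialised: one optimized scan of the string per character.
    return {base: seq.count(base) for base in alphabet}


counter_counts = count_with_counter(sequence)
count_counts = count_with_str_count(sequence)
print(count_counts)

# Timings vary by machine, so they are only measured, not asserted.
t_counter = timeit.timeit(lambda: count_with_counter(sequence), number=1000)
t_count = timeit.timeit(lambda: count_with_str_count(sequence), number=1000)
print(f"Counter: {t_counter:.4f}s  str.count: {t_count:.4f}s")
```

str.count needs one pass per distinct character, so it suits small, known alphabets such as nucleotides; Counter is the more natural fit when the set of symbols is large or unknown.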
Find the intersection of overlapping ranges in two tables using data.table function foverlaps
@Seth provided the fastest way to solve the problem of intersection overlaps using the data.table foverlaps function. However, this solution did not take into account the fact that the input bed files may have overlapping ranges that needed to be reduced into single regions. @Martin Morgan solved that with his solution using the GenomicRanges package, …
Dictionary style replace multiple items
If you're open to using packages, plyr is a very popular one and has this handy mapvalues() function that will do just what you're looking for:

    foo <- mapvalues(foo, from=c("AA", "AC", "AG"), to=c("0101", "0102", "0103"))

Note that it works for data types of all kinds, not just strings.