bioinformatics - w3toppers.com

An alternative to using argparse in IPython notebooks is to pass the arguments explicitly to:

    args = parser.parse_args()

(line 303 from the git repo you referenced). parse_args() accepts a list of strings, which it uses in place of sys.argv[1:]. It would be something like:

    parser = argparse.ArgumentParser(
        description='Searching longest common substring. '
                    'Uses Ukkonen\'s suffix tree algorithm and generalized suffix tree. '
                    'Written by Ilya Stepanov (c) 2013')
    parser.add_argument(
        'strings', metavar="STRING", …
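A minimal sketch of the idea, assuming a simplified stand-in parser: only the positional STRING arguments are declared, and `nargs="+"` is an assumption because the excerpt is cut off before that point. It passes the "command line" as a list of strings, and uses the standard library's `shlex.split` when you would rather type it as one string.

```python
import argparse
import shlex

# Hypothetical, simplified stand-in for the parser in the excerpt: it only
# declares the positional STRING arguments (nargs="+" is an assumption).
parser = argparse.ArgumentParser(
    description="Searching longest common substring.")
parser.add_argument("strings", metavar="STRING", nargs="+",
                    help="strings to search for a common substring")

# In a notebook, sys.argv belongs to the kernel, so supply the "command line"
# yourself as a list of strings.
args = parser.parse_args(["banana", "bandana"])
print(args.strings)

# To write it as one string instead, let shlex split it the way a POSIX
# shell would (the quotes keep "ab cd" together as one argument).
args_from_text = parser.parse_args(shlex.split("'ab cd' efg"))
print(args_from_text.strings)

# Note: a bare string such as parser.parse_args("banana") is iterated
# character by character, so each letter would become a separate argument.
```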

Collapse intersecting regions

Remove line breaks in a FASTA file

This awk program:

    % awk '!/^>/ { printf "%s", $0; n = "\n" } /^>/ { print n $0; n = "" } END { printf "%s", n }' input.fasta

Will yield:

    >accession1
    ATGGCCCATGGGATCCTAGC
    >accession2
    GATATCCATGAAACGGCTTA

Explanation: On lines that don't start with a >, print the line without a line break and store …
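For comparison, here is a rough Python equivalent of the same unwrapping step. It is a sketch, not the awk answer itself: the function name `unwrap_fasta` and the wrapped input are hypothetical, and it assumes the file begins with a header line.

```python
from typing import Iterable, Iterator, List, Optional


def unwrap_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield each FASTA header followed by its sequence joined onto one line.

    Assumes the input starts with a header; blank lines are skipped.
    """
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            if header is not None:
                yield header
                yield "".join(chunks)
            header, chunks = line, []
        elif line:
            chunks.append(line)
    if header is not None:
        # Flush the last record.
        yield header
        yield "".join(chunks)


# Hypothetical wrapped input (10 bases per line).
wrapped = """>accession1
ATGGCCCATG
GGATCCTAGC
>accession2
GATATCCATG
AAACGGCTTA
"""
for out_line in unwrap_fasta(wrapped.splitlines()):
    print(out_line)
```

Unlike the awk program, which prints sequence fragments as it reads them, this version buffers one record's sequence at a time before emitting it.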

WinError 2 The system cannot find the file specified (Python)

Popen expects a list of strings for non-shell calls and a string for shell calls. Call subprocess.Popen with shell=True:

    process = subprocess.Popen(command, stdout=tempFile, shell=True)

Hopefully this solves your issue. This issue is listed here: https://bugs.python.org/issue17023
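A small sketch of the two calling conventions, using `subprocess.run` (a convenience wrapper around Popen) so the output is easy to capture. `sys.executable` stands in for a real program, and the nonexistent program name is hypothetical, chosen only to show that a missing executable surfaces as `FileNotFoundError` (reported on Windows as "[WinError 2] The system cannot find the file specified"). Keep in mind that shell=True hands the string to the shell, so it should not be built from untrusted input.

```python
import subprocess
import sys

# List form (shell=False): the first element must be a real executable.
# sys.executable is used so the example does not depend on what is on PATH.
list_result = subprocess.run(
    [sys.executable, "-c", "print('hello from a list')"],
    capture_output=True, text=True, check=True)

# String form with shell=True: the shell parses the command line, so shell
# built-ins such as `echo` (or `dir` in cmd.exe on Windows) work too.
shell_result = subprocess.run(
    "echo hello from a shell", shell=True,
    capture_output=True, text=True, check=True)

print(list_result.stdout.strip())
print(shell_result.stdout.strip())

# A program that cannot be found raises FileNotFoundError before it runs.
missing = None
try:
    subprocess.run(["surely-not-a-real-program-xyz"], check=True)  # hypothetical name
except FileNotFoundError as exc:
    missing = exc
    print("not found:", exc)
```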

Finding overlap in ranges with R

Use the IRanges/GenomicRanges packages from Bioconductor, which are made for exactly these problems (and scale massively):

    source("http://bioconductor.org/biocLite.R")
    biocLite("IRanges")

(biocLite has since been retired; with current Bioconductor releases the equivalent is BiocManager::install("IRanges").) There are a few appropriate containers for ranges on different chromosomes; one is RangesList:

    library(IRanges)
    rangesA <- split(IRanges(rangesA$start, rangesA$stop), rangesA$chrom)
    rangesB <- split(IRanges(rangesB$start, rangesB$stop), rangesB$chrom)
    # which rangesB wholly contain at least one rangesA?
    ov <- …
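To make the question in the comment concrete ("which rangesB wholly contain at least one rangesA?"), here is a plain-Python sketch of the same per-chromosome containment test. This is a swapped-in, deliberately naive pairwise check, not how IRanges computes overlaps, and it will not scale the way the Bioconductor packages do. The `Range` type, the helper names and the example data are all hypothetical; coordinates are treated as closed intervals.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Range(NamedTuple):
    chrom: str
    start: int
    stop: int  # closed interval: stop is included


def split_by_chrom(ranges: List[Range]) -> Dict[str, List[Range]]:
    """Group ranges by chromosome, loosely mirroring split(..., chrom) in R."""
    by_chrom: Dict[str, List[Range]] = defaultdict(list)
    for r in ranges:
        by_chrom[r.chrom].append(r)
    return by_chrom


def containing_ranges(ranges_a: List[Range], ranges_b: List[Range]) -> List[Range]:
    """Return the ranges in ranges_b that wholly contain at least one range in ranges_a.

    Naive pairwise check within each chromosome: O(len(a) * len(b)) per chromosome.
    """
    a_by_chrom = split_by_chrom(ranges_a)
    hits = []
    for b in ranges_b:
        candidates = a_by_chrom.get(b.chrom, [])
        if any(b.start <= a.start and a.stop <= b.stop for a in candidates):
            hits.append(b)
    return hits


# Hypothetical example data.
ranges_a = [Range("chr1", 100, 200), Range("chr2", 50, 60)]
ranges_b = [Range("chr1", 90, 250), Range("chr1", 150, 300), Range("chr2", 10, 40)]
print(containing_ranges(ranges_a, ranges_b))
```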

Why is Collections.counter so slow?

It's not that collections.Counter is slow; it's actually quite fast. But it's a general-purpose tool, and counting characters is just one of its many applications. str.count, on the other hand, only counts occurrences of a substring (here, a single character) within a string, and it's heavily optimized for that one task. That means str.count can work on the underlying C char array …
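A small benchmark sketch of the comparison, assuming a hypothetical 1 Mb DNA string and the four bases as the characters of interest. Both approaches give the same counts. On CPython, str.count is typically the faster of the two for this task, but timings depend on the machine and interpreter, so none are quoted here.

```python
import timeit
from collections import Counter
from typing import Dict

sequence = "ACGT" * 250_000  # hypothetical 1 Mb test sequence


def count_with_counter(seq: str) -> Dict[str, int]:
    # Counter hashes each character and increments a dict entry for it.
    counts = Counter(seq)
    return {base: counts[base] for base in "ACGT"}


def count_with_str_count(seq: str) -> Dict[str, int]:
    # str.count scans the string's internal buffer once per base (four scans).
    return {base: seq.count(base) for base in "ACGT"}


assert count_with_counter(sequence) == count_with_str_count(sequence)

for fn in (count_with_counter, count_with_str_count):
    seconds = timeit.timeit(lambda: fn(sequence), number=3)
    print(f"{fn.__name__}: {seconds:.3f} s for 3 runs")
```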

Find the intersection of overlapping ranges in two tables using data.table function foverlaps

@Seth provided the fastest way to solve the problem of intersection overlaps using the data.table foverlaps function. However, that solution did not take into account that the input BED files may contain overlapping ranges that need to be reduced into single regions. @Martin Morgan solved that with his solution using the GenomicRanges package, …
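The two steps described here are: first collapse each file's overlapping ranges, then intersect the two sets. Here is a plain-Python sketch of them. It is a swapped-in technique (sort-and-merge followed by a two-pointer sweep), not the foverlaps or GenomicRanges implementation. It assumes closed integer coordinates on a single chromosome, whereas real BED files are 0-based half-open and would be grouped by chromosome first. Merging abutting intervals as well as overlapping ones is a choice made for this sketch, and the example data is hypothetical.

```python
from typing import List, Tuple

# Closed intervals [start, end] on a single chromosome (an assumption made to
# keep the sketch short; real BED data would be grouped by chromosome first).
Interval = Tuple[int, int]


def reduce_intervals(intervals: List[Interval]) -> List[Interval]:
    """Collapse overlapping or abutting intervals into disjoint regions."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous region: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def intersect_intervals(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Intersect two sorted lists of disjoint intervals with a two-pointer sweep."""
    i = j = 0
    out: List[Interval] = []
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start <= end:
            out.append((start, end))
        # Advance whichever interval finishes first; it cannot overlap anything later.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


# Hypothetical regions from two BED-like files.
bed_a = [(1, 10), (5, 20), (30, 40)]
bed_b = [(15, 35), (38, 50), (60, 70)]
print(intersect_intervals(reduce_intervals(bed_a), reduce_intervals(bed_b)))
```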

Dictionary style replace multiple items

If you're open to using packages, plyr is a very popular one and has a handy mapvalues() function that does exactly what you're looking for:

    foo <- mapvalues(foo, from=c("AA", "AC", "AG"), to=c("0101", "0102", "0103"))

Note that it works for data types of all kinds, not just strings.
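Outside R, the same dictionary-style replacement is just a lookup table built from the from/to pairs. Here is a Python sketch. `map_values` is a hypothetical helper whose `from_`/`to` parameters only mirror mapvalues' arguments, and elements not listed in `from_` are left unchanged. Like the R version, it is not limited to strings, though the values must be hashable.

```python
from typing import Any, Dict, List, Sequence


def map_values(values: Sequence[Any], from_: Sequence[Any], to: Sequence[Any]) -> List[Any]:
    """Replace elements found in from_ with the matching element of to.

    Elements not listed in from_ are returned unchanged.
    """
    if len(from_) != len(to):
        raise ValueError("from_ and to must have the same length")
    lookup: Dict[Any, Any] = dict(zip(from_, to))
    return [lookup.get(v, v) for v in values]


foo = ["AA", "AC", "AG", "AA", "TT"]
print(map_values(foo, ["AA", "AC", "AG"], ["0101", "0102", "0103"]))
# ['0101', '0102', '0103', '0101', 'TT']
```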

How to remove rows with 0 values using R

Read FASTA into a dataframe and extract subsequences of FASTA file

How to call module written with argparse in iPython notebook