How to optimise my code efficiency to speed up large data file extraction programs?

This code can’t really be sped up, because either way you have to iterate over the range and look up each value in the dict. A list comprehension gives a small speed-up over an explicit loop, but it’s not significant.
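If you want to check this on your own machine, a rough timing sketch like the one below can help. The lookup table, the key format and the function names are all made up for illustration, and the exact timings will vary between runs and Python versions.

```python
import timeit

# Hypothetical lookup table standing in for the real data.
data = {"key{}".format(i): i for i in range(1000)}


def with_loop():
    # Explicit loop that appends each looked-up value.
    result = []
    for i in range(1000):
        result.append(data["key{}".format(i)])
    return result


def with_comprehension():
    # Same lookups, written as a list comprehension.
    return [data["key{}".format(i)] for i in range(1000)]


loop_time = timeit.timeit(with_loop, number=200)
comprehension_time = timeit.timeit(with_comprehension, number=200)
print("loop: {:.4f}s, comprehension: {:.4f}s".format(loop_time, comprehension_time))
```

Both functions do the same number of dict lookups and string formats, so any difference comes only from how the list is built.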

I recommend reading PEP 8 and following its coding style. Applied to your current code, that means renaming variables:

  • ChromosomeNumber -> chromosome_num.

    Variable names should be lowercase, with words separated by underscores as necessary to improve readability

  • FragmentSize -> fragment_size;

  • Number_of_fragments -> fragments_len (len is short for length);

  • Dict -> some_dict (replace some with a meaningful word; also avoid naming variables after Python’s built-in types, functions, modules, etc.);

  • chromosomefragmentlist -> chromosomes.

Other recommendations:

  • range(0, (Number_of_fragments), 1) is equivalent to range(Number_of_fragments) (docs);
  • It’s better to use string formatting than string concatenation;
  • It’s more common to use the some_dict[key] syntax than the dict.get() method when you don’t need a default value for a missing key.
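The three points above can be checked with a short sketch. The values and the single dict entry are hypothetical and only there to make the snippet runnable.

```python
# Hypothetical values, chosen only for illustration.
chromosome_num = 1
fragment_size = 100
i = 2

# range(0, n, 1) and range(n) yield the same sequence.
same_ranges = list(range(0, 5, 1)) == list(range(5))

# Concatenation needs an explicit str() call for every number...
start = i * fragment_size + 1
end = i * fragment_size + fragment_size
concatenated = ("Chromosome" + str(chromosome_num) + "Fragment" + str(i)
                + ",Basepairs " + str(start) + "-" + str(end))
# ...while str.format() converts the values itself.
formatted = "Chromosome{}Fragment{},Basepairs {}-{}".format(
    chromosome_num, i, start, end)

# dict.get() returns None for a missing key; indexing raises KeyError.
some_dict = {formatted: "ACGT"}
missing_via_get = some_dict.get("no such key")
try:
    some_dict["no such key"]
    raised = False
except KeyError:
    raised = True
```

The KeyError is usually what you want here: a missing fragment is reported immediately instead of a None quietly ending up in the result list.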

Applying all of these recommendations gives the following code:

fmt = "Chromosome{}Fragment{},Basepairs {}-{}"
return [some_dict[fmt.format(chromosome_num, i, i * fragment_size + 1, i * fragment_size + fragment_size)]
        for i in range(fragments_len)]
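To try the snippet on its own, it can be wrapped in a function and fed a small dict. The function name get_fragments and the sample data below are assumptions made for this sketch, not part of your original program.

```python
def get_fragments(some_dict, chromosome_num, fragment_size, fragments_len):
    """Return the values for fragments 0 .. fragments_len - 1 of one chromosome."""
    fmt = "Chromosome{}Fragment{},Basepairs {}-{}"
    return [some_dict[fmt.format(chromosome_num, i,
                                 i * fragment_size + 1,
                                 i * fragment_size + fragment_size)]
            for i in range(fragments_len)]


# Hypothetical data: two fragments of 100 base pairs on chromosome 1.
sample = {
    "Chromosome1Fragment0,Basepairs 1-100": "ACGT",
    "Chromosome1Fragment1,Basepairs 101-200": "TTGA",
}
fragments = get_fragments(sample, 1, 100, 2)
```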
