How can I read large text files line by line without loading them into memory?
I provided this answer because Keith’s, while succinct, doesn’t close the file explicitly:
    with open("log.txt") as infile:
        for line in infile:
            do_something_with(line)
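The pattern above can be tried end to end with a small stand-in file. This is only a sketch assuming Python 3: the helper name count_lines and the temporary file are hypothetical and not part of the original answer.

```python
import os
import tempfile


def count_lines(path):
    """Count lines lazily; only one line is held in memory at a time."""
    total = 0
    # The with-block closes the file even if the loop body raises.
    with open(path, encoding="utf-8") as infile:
        for _line in infile:  # the file object yields one line per iteration
            total += 1
    return total


# Build a tiny stand-in for a large log file (hypothetical sample data).
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("first\nsecond\nthird\n")
    sample_path = tmp.name

line_count = count_lines(sample_path)
os.remove(sample_path)  # clean up the temporary file
print(line_count)
```

Because the loop never calls read() or readlines(), memory use should stay roughly constant regardless of file size.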
In Python 2.x: range creates a list, so if you do range(1, 10000000) it creates a list in memory with 9999999 elements. xrange is a sequence object that evaluates lazily. In Python 3: range does the equivalent of Python 2’s xrange. To get the list, you have to explicitly use list(range(…)). xrange no longer exists.
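As a rough illustration of the Python 3 behaviour described here, the sketch below compares a huge range with a tiny one. The variable names are illustrative, and the identical sizes reported by sys.getsizeof are a CPython detail rather than a language guarantee.

```python
import sys

big = range(1, 10_000_000)   # no list is built; only start, stop and step are stored
small = range(1, 10)

# On CPython both objects report the same size, since neither stores its elements.
big_size = sys.getsizeof(big)
small_size = sys.getsizeof(small)

length = len(big)             # computed from start/stop/step, not by counting
first_five = list(big[:5])    # slicing yields another range; list() materialises it
print(big_size, small_size, length, first_five)
```

Only the call to list() allocates storage for the elements, which is why list(range(…)) is needed when an actual list is required.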
As per the documentation: This allows you to switch from the default ASCII to other encodings such as UTF-8, which the Python runtime will use whenever it has to decode a string buffer to unicode. This function is only available at Python start-up time, when Python scans the environment. It has to be called in …
Your code works when run in a script because Python encodes the output to whatever encoding your terminal application is using. If you are piping, you must encode it yourself. A rule of thumb is: always use Unicode internally. Decode what you receive, and encode what you send.
    # -*- coding: utf-8 -*-
    print u"åäö".encode('utf-8')
…
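The rule of thumb can be sketched in Python 3, where bytes and str are distinct types. The variable names below are illustrative, and the hard-coded bytes merely stand in for data that would arrive from a pipe or socket.

```python
# Bytes as they might arrive from a pipe, file, or socket (stand-in data).
raw_in = "åäö".encode("utf-8")

text = raw_in.decode("utf-8")      # decode what you receive
shouted = text.upper()             # work with str (Unicode) internally
raw_out = shouted.encode("utf-8")  # encode what you send

print(len(raw_in), len(text), raw_out)
```

Keeping the decode and encode steps at the program's edges means the encoding is chosen explicitly instead of depending on whatever the terminal or pipe happens to report.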
The problem here is that the + operator has (at least) two different meanings in Python. For numeric types, it means "add the numbers together":
    >>> 1 + 2
    3
    >>> 3.4 + 5.6
    9.0
… and for sequence types, it means "concatenate the sequences":
    >>> [1, 2, 3] + [4, 5, 6]
    [1, 2, …
You’re using Python 2.x, where integer division floors the result instead of producing a floating-point number.
    >>> 1 / 2
    0
You should make one of the operands a float:
    >>> float(10 - 20) / (100 - 10)
    -0.1111111111111111
or use from __future__ import division, which forces / to adopt Python 3.x’s behavior that always returns …
Unidecode is the correct answer for this. It transliterates any Unicode string into the closest possible representation in ASCII text. Example:
    import unidecode
    accented_string = u'Málaga'
    # accented_string is of type 'unicode'
    unaccented_string = unidecode.unidecode(accented_string)
    # unaccented_string contains 'Malaga' and is of type 'str'
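Where installing a third-party package is not an option, a rougher standard-library approximation is possible. The sketch below is not Unidecode: it only strips combining marks after NFKD normalisation, and the helper name strip_accents is hypothetical.

```python
import unicodedata


def strip_accents(text):
    """Drop combining marks after decomposing characters (NFKD)."""
    decomposed = unicodedata.normalize("NFKD", text)
    # 'á' decomposes into 'a' plus a combining acute accent; keep only the base.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


unaccented = strip_accents("Málaga")
print(unaccented)
```

Unlike Unidecode, this leaves characters that have no decomposition (for example "ß" or non-Latin scripts) untouched, so the result is not guaranteed to be ASCII.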
From the Python 2 manual: CPython implementation detail: Objects of different types except numbers are ordered by their type names; objects of the same types that don’t support proper comparison are ordered by their address. When you order two strings or two numeric types the ordering is done in the expected way (lexicographic ordering for …
In Python 2, division of two ints produces an int. In Python 3, it produces a float. We can get the new behaviour by importing from __future__.
    >>> from __future__ import division
    >>> a = 4
    >>> b = 6
    >>> c = a / b
    >>> c
    0.66666666666666663
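For comparison, here is a small Python 3 sketch of the two division operators. The variable names are illustrative, and // is shown because it reproduces what Python 2's / did for ints.

```python
a, b = 4, 6

true_quotient = a / b    # always a float in Python 3
floor_quotient = a // b  # floor division, the old Python 2 int behaviour

# Floor division rounds toward negative infinity, not toward zero.
negative_floor = (10 - 20) // (100 - 10)

print(true_quotient, floor_quotient, negative_floor)
```

The negative case is the one most likely to surprise: the result is rounded down to the next lower integer, not cut off at zero.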
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 20: ordinal not in range(128)
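An error of this kind can be reproduced and handled in Python 3 roughly as follows. The sample string is made up for illustration; U+00A0 is a non-breaking space, which has no ASCII equivalent.

```python
text = "price:\xa0100"  # hypothetical input containing a non-breaking space

failed_at = None
reason = None
try:
    text.encode("ascii")  # ASCII cannot represent code points above 127
except UnicodeEncodeError as exc:
    failed_at = exc.start  # index of the first character that could not be encoded
    reason = exc.reason

utf8_bytes = text.encode("utf-8")                  # an encoding that can represent it
replaced = text.encode("ascii", errors="replace")  # or substitute '?' if ASCII is required

print(failed_at, reason, utf8_bytes, replaced)
```

Encoding explicitly to UTF-8 is usually the safer choice; errors="replace" loses information and fits only when the consumer genuinely requires ASCII.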