Read .csv file from URL into Python 3.x – _csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)

The problem is that urllib returns bytes. As a proof, you can download the csv file with your browser and open it as a regular file, and the error goes away.

A similar problem was addressed here.

It can be solved by decoding the bytes to strings with the appropriate encoding. For example:

import csv
import urllib.request

url = "ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/file_list.csv"
ftpstream = urllib.request.urlopen(url)
csvfile = csv.reader(ftpstream.read().decode('utf-8').splitlines())  # decode with the appropriate encoding and split into lines
data = [row for row in csvfile]

The last line could also be data = list(csvfile), which can be easier to read.

By the way, since the csv file is very big, this approach can be slow and memory-consuming. Maybe it would be preferable to use a generator, for instance along the lines of the sketch below.
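As an illustration only, here is a minimal sketch of that idea, wrapping the download in a generator function so rows are yielded one at a time instead of stored in a list (the helper name iter_rows is made up for this example; the whole response body is still downloaded and decoded up front):

import csv
import urllib.request

def iter_rows(url, encoding='utf-8'):
    # Yield rows one at a time instead of building a list in memory.
    # Note: the whole response body is still read and decoded first.
    stream = urllib.request.urlopen(url)
    for row in csv.reader(stream.read().decode(encoding).splitlines()):
        yield row

url = "ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/file_list.csv"
for row in iter_rows(url):
    print(row)  # process one row at a time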

EDIT:
Using codecs, as proposed by Steven Rumbalski, so it's not necessary to read the whole file to decode it. Memory consumption is reduced and speed is increased.

import csv
import urllib.request
import codecs

url = "ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/file_list.csv"
ftpstream = urllib.request.urlopen(url)
csvfile = csv.reader(codecs.iterdecode(ftpstream, 'utf-8'))
for line in csvfile:
    print(line)  # do something with line

Note that the list is not created either, for the same reason.
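For the record, an alternative sketch of the same streaming idea (not part of the original answer) is to wrap the byte stream in io.TextIOWrapper, which also decodes incrementally; I have only verified this pattern with HTTP responses, so treat the FTP case as an assumption:

import csv
import io
import urllib.request

url = "ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/file_list.csv"
ftpstream = urllib.request.urlopen(url)
# io.TextIOWrapper decodes the byte stream incrementally, similar to codecs.iterdecode
csvfile = csv.reader(io.TextIOWrapper(ftpstream, encoding='utf-8', newline=''))
for line in csvfile:
    print(line)  # do something with line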
