Read a small random sample from a big CSV file into a Python data frame

Assuming the CSV file has no header:

    import pandas
    import random

    n = 1000000  # number of records in file
    s = 10000  # desired sample size
    filename = "data.txt"
    # Row indices to skip: everything except a random sample of s rows
    skip = sorted(random.sample(range(n), n - s))
    # header=None so the first sampled row is not consumed as column names
    df = pandas.read_csv(filename, skiprows=skip, header=None)

It would be better if read_csv had a keeprows option, or if skiprows took a callback function instead of a list.

With …
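More recent pandas releases do accept a callable for skiprows: it is called with each row index and returns True for rows to skip. Assuming such a version is installed, the "keeprows" idea can be sketched as below. The helper names make_skip_predicate and read_sample, and the seed parameter, are hypothetical and only there for illustration; the row count still has to be known in advance.

```python
import random


def make_skip_predicate(n_rows, sample_size, seed=None):
    """Return a function telling whether a given row index should be skipped."""
    rng = random.Random(seed)
    # Choose the rows to keep; a set gives fast membership checks.
    keep = set(rng.sample(range(n_rows), sample_size))
    return lambda i: i not in keep


def read_sample(filename, n_rows, sample_size, seed=None):
    """Read a random sample of rows from a headerless CSV file."""
    import pandas  # imported here so the helper above works without pandas

    skip = make_skip_predicate(n_rows, sample_size, seed)
    # header=None because the file is assumed to have no header row.
    return pandas.read_csv(filename, skiprows=skip, header=None)
```

Something like df = read_sample("data.txt", 1000000, 10000) would then stand in for the list-based version. Only the s kept indices are stored, rather than a sorted list of the n - s skipped ones.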