Large, persistent DataFrame in pandas

Wes is of course right! I'm just chiming in to provide a slightly more complete code example. I had the same issue with a 129 MB file, which was solved by:

import pandas as pd

# chunksize=1000 makes read_csv return a TextFileReader that yields DataFrames of 1000 rows each
tp = pd.read_csv('large_dataset.csv', iterator=True, chunksize=1000)
# stitch the chunks back into a single DataFrame; on older pandas versions that
# cannot concat the reader directly, use pd.concat(list(tp), ignore_index=True)
df = pd.concat(tp, ignore_index=True)
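
Note that concatenating all the chunks still materializes the full DataFrame in memory; if even that is too large, you can process each chunk as it is read instead. A minimal sketch, assuming a hypothetical numeric column named 'value' in the CSV:

import pandas as pd

# keep only ~1000 rows in memory at a time by aggregating per chunk;
# 'value' is a placeholder column name, not part of the original example
total = 0.0
for chunk in pd.read_csv('large_dataset.csv', chunksize=1000):
    total += chunk['value'].sum()  # reduce each chunk before discarding it
print(total)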
