pandas - w3toppers.com

Fastest way to parse large CSV files in Pandas

As @chrisb said, pandas’ read_csv is probably faster than csv.reader, numpy.genfromtxt, or numpy.loadtxt. I don’t think you will find anything better for parsing the CSV (note that read_csv is not a ‘pure Python’ solution, as the CSV parser is implemented in C). But if you have to load or query the data often, a solution would be to parse …
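As a minimal sketch of the read_csv call discussed above: the in-memory CSV text and its column names are hypothetical stand-ins for a large file on disk, and the explicit dtype mapping is an assumption added here so that pandas does not have to infer column types.

```python
import io

import pandas as pd

# Hypothetical CSV content; in practice this would be a path to a large file.
csv_text = "id,value\n1,0.5\n2,1.5\n3,2.5\n"

# engine="c" requests the C-implemented parser explicitly.
# Declaring dtypes up front skips type inference on the listed columns.
df = pd.read_csv(
    io.StringIO(csv_text),
    engine="c",
    dtype={"id": "int64", "value": "float64"},
)

print(df.shape)
```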

pandas – convert string into list of strings [duplicate]

You can split the string manually:

    >>> df['Tags'] = df.Tags.apply(lambda x: x[1:-1].split(','))
    >>> df.Tags[0]
    ['Tag1', 'Tag2']
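A self-contained version of the same idea might look like the sketch below. The sample values in the Tags column are assumptions: the original data is not shown, so this assumes bracketed, comma-separated strings with no spaces.

```python
import pandas as pd

# Hypothetical data: each cell is a string that looks like a list.
df = pd.DataFrame({"Tags": ["[Tag1,Tag2]", "[Tag3]"]})

# Strip the surrounding brackets, then split on commas.
df["Tags"] = df["Tags"].apply(lambda s: s[1:-1].split(","))

print(df["Tags"][0])
```

If the real strings contain spaces after the commas, each piece would also need a call to str.strip.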

Confusion about pandas copy of slice of dataframe warning

    izmir = pd.read_excel(filepath)
    izmir_lim = izmir[['Gender', 'Age', 'MC_OLD_M>=60', 'MC_OLD_F>=60',
                       'MC_OLD_M>18', 'MC_OLD_F>18', 'MC_OLD_18>M>5',
                       'MC_OLD_18>F>5', 'MC_OLD_M_Child<5', 'MC_OLD_F_Child<5',
                       'MC_OLD_M>0<=1', 'MC_OLD_F>0<=1', 'Date to Delivery',
                       'Date to insert', 'Date of Entery']]

izmir_lim may be a view or a copy of izmir. You subsequently attempt to assign to it. This is what triggers the warning. Use this instead:

    izmir_lim = izmir[['Gender', 'Age', 'MC_OLD_M>=60', 'MC_OLD_F>=60',
                       'MC_OLD_M>18', 'MC_OLD_F>18', 'MC_OLD_18>M>5',
                       'MC_OLD_18>F>5', 'MC_OLD_M_Child<5', 'MC_OLD_F_Child<5',
                       'MC_OLD_M>0<=1', 'MC_OLD_F>0<=1', 'Date to Delivery',
                       'Date to insert', 'Date of Entery']].copy()

Whenever you ‘create’ …
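The effect of the explicit .copy() can be sketched on a small frame. The three-column DataFrame here is a hypothetical stand-in for the Excel data, which is not available.

```python
import pandas as pd

# Hypothetical stand-in for the spreadsheet loaded with pd.read_excel.
izmir = pd.DataFrame({"Gender": ["F", "M"], "Age": [30, 41], "Other": [1, 2]})

# Take the column subset and make it an independent object right away.
izmir_lim = izmir[["Gender", "Age"]].copy()

# Assigning to the copy is unambiguous and leaves the original untouched.
izmir_lim["Age"] = izmir_lim["Age"] + 1

print(izmir["Age"].tolist(), izmir_lim["Age"].tolist())
```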

pandas scatter plotting datetime

This is a workaround rather than a real answer: as suggested by Tom Augspurger, you can use the working line plot type and specify dots instead of lines:

    df.plot(x='x', y='y', style=".")

How to format IPython html display of Pandas dataframe?

HTML accepts a custom string of HTML data. Nothing stops you from passing in a style tag with custom CSS for the .dataframe class (which the to_html method adds to the table). So the simplest solution is to define a style and concatenate it with the output of df.to_html: style="<style>.dataframe …
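A minimal sketch of that concatenation follows. The CSS rule is a hypothetical example, not the rule from the truncated answer, and the IPython display call is left as a comment so the snippet also runs outside a notebook.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Hypothetical CSS rule targeting the class that to_html puts on the table.
style = "<style>.dataframe td { text-align: right; }</style>"

# Prepend the style tag to the generated table markup.
html = style + df.to_html()

# In a notebook, the combined string could then be rendered with:
#   from IPython.display import HTML
#   HTML(html)
print(html[:60])
```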

pandas combine two strings ignore nan values

Call fillna with an empty string as the fill value, then sum with axis=1:

    In [3]: df = pd.DataFrame({'a': ['asd', np.nan, 'asdsa'], 'b': ['asdas', 'asdas', np.nan]})
            df
    Out[3]:
           a      b
    0    asd  asdas
    1    NaN  asdas
    2  asdsa    NaN

    In [7]: df['a+b'] = df.fillna('').sum(axis=1)
            df
    Out[7]:
           a      b       a+b
    0    asd  asdas  asdasdas
    1    NaN  asdas     asdas
    2 …
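A closely related variant, sketched here as an alternative rather than the answer's own method, fills each column separately and concatenates with +. It depends only on plain string addition, so it avoids relying on how sum treats string columns.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": ["asd", np.nan, "asdsa"],
    "b": ["asdas", "asdas", np.nan],
})

# Replace missing values with empty strings per column, then concatenate.
df["a+b"] = df["a"].fillna("") + df["b"].fillna("")

print(df["a+b"].tolist())
```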

“Too many indexers” with DataFrame.loc

The reason this doesn’t work is tied to the need to specify the axis of indexing (mentioned in http://pandas.pydata.org/pandas-docs/stable/advanced.html). An alternative solution to your problem is to simply do this:

    df.loc(axis=0)[:, :, 'C1', :]

Pandas gets confused sometimes when indexes are similar or contain similar values. If you were to have a column named ‘C1’ …
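To make the axis-specific indexing concrete, here is a small sketch on a four-level MultiIndex. The level names, labels, and the value column are all hypothetical, since the original DataFrame is not shown.

```python
import pandas as pd

# Hypothetical four-level row index; from_product yields a sorted index.
index = pd.MultiIndex.from_product(
    [["A0", "A1"], ["B0", "B1"], ["C0", "C1"], ["D0", "D1"]],
    names=["a", "b", "c", "d"],
)
df = pd.DataFrame({"value": range(16)}, index=index)

# loc(axis=0) tells pandas that every slicer applies to the row index,
# so the third position selects label 'C1' on level "c".
subset = df.loc(axis=0)[:, :, "C1", :]

print(subset["value"].tolist())
```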

What is the Spark DataFrame method `toPandas` actually doing?

Using Spark to read a CSV file into pandas is quite a roundabout way of achieving the end goal of reading a CSV file into memory. It seems like you might be misunderstanding the use cases of the technologies in play here. Spark is for distributed computing (though it can be used locally). It’s …

Replace whole string if it contains substring in pandas

You can use str.contains to mask the rows that contain ‘ball’ and then overwrite with the new value:

    In [71]: df.loc[df['sport'].str.contains('ball'), 'sport'] = 'ball sport'
             df
    Out[71]:
        name       sport
    0    Bob      tennis
    1   Jane  ball sport
    2  Alice  ball sport

To make it case-insensitive, pass case=False:

    df.loc[df['sport'].str.contains('ball', case=False), 'sport'] = 'ball sport'
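Put together as a runnable sketch, the approach looks like this. The starting values in the sport column are assumptions, because the original input frame is not shown; one value is capitalised to exercise the case-insensitive match.

```python
import pandas as pd

# Hypothetical input; "Basketball" only matches when case is ignored.
df = pd.DataFrame({
    "name": ["Bob", "Jane", "Alice"],
    "sport": ["tennis", "football", "Basketball"],
})

# Boolean mask of rows whose sport contains "ball", ignoring case.
mask = df["sport"].str.contains("ball", case=False)

# Overwrite the whole string in the matching rows.
df.loc[mask, "sport"] = "ball sport"

print(df["sport"].tolist())
```

If the column can contain missing values, passing na=False to str.contains keeps the mask strictly boolean.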