dataframe
Pythonic/efficient way to strip whitespace from every Pandas Data frame cell that has a stringlike object in it
Stumbled onto this question while looking for a quick and minimalistic snippet I could use. Had to assemble one myself from the posts above. Maybe someone will find it useful:
    data_frame_trimmed = data_frame.apply(lambda x: x.str.strip() if x.dtype == "object" else x)
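A minimal sketch of the same idea on a small, made-up frame. The sample data and the helper name strip_strings are assumptions for illustration. The dtype check is also widened with pd.api.types.is_string_dtype, which is a swap from the snippet above, so that columns stored with a dedicated string dtype are covered as well. One caveat worth knowing: on an object column that mixes strings with other values, .str.strip() returns NaN for the non-string entries.

```python
import pandas as pd

# Hypothetical sample data: two string columns and one numeric column.
data_frame = pd.DataFrame({
    "name": ["  alice ", "bob  "],
    "city": [" Oslo", "Rome "],
    "age": [30, 41],
})

def strip_strings(column):
    # Strip only string-like columns; numeric columns pass through untouched.
    if column.dtype == "object" or pd.api.types.is_string_dtype(column):
        return column.str.strip()
    return column

data_frame_trimmed = data_frame.apply(strip_strings)
print(data_frame_trimmed)
```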
python pandas- apply function with two arguments to columns
Why not just do this?
    df['NewCol'] = df.apply(lambda x: segmentMatch(x['TimeCol'], x['ResponseCol']), axis=1)
Rather than trying to pass the column as an argument as in your example, we now simply pass the appropriate entries in each row as arguments, and store the result in 'NewCol'.
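A runnable sketch of the row-wise pattern. The original segmentMatch is not shown in the answer, so segment_match below is a hypothetical stand-in that just joins its two arguments, and the sample data is invented. The second assignment shows a list comprehension over zip, a common alternative to apply with axis=1 that gives the same result here.

```python
import pandas as pd

def segment_match(time_value, response_value):
    # Hypothetical stand-in for the question's segmentMatch function.
    return f"{time_value}:{response_value}"

# Made-up sample data using the column names from the answer.
df = pd.DataFrame({
    "TimeCol": ["09:00", "09:15", "09:30"],
    "ResponseCol": ["a", "b", "c"],
})

# axis=1 hands each row to the lambda, so both entries are available at once.
df["NewCol"] = df.apply(
    lambda row: segment_match(row["TimeCol"], row["ResponseCol"]), axis=1
)

# Same result without apply: iterate over the two columns in parallel.
df["NewColZip"] = [
    segment_match(t, r) for t, r in zip(df["TimeCol"], df["ResponseCol"])
]
print(df)
```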
Get weekday/day-of-week for Datetime column of DataFrame
Use the new dt.dayofweek property:
    In [2]: df['weekday'] = df['Timestamp'].dt.dayofweek
            df
    Out[2]:
                Timestamp  Value  weekday
    0 2012-06-01 00:00:00    100        4
    1 2012-06-01 00:15:00    150        4
    2 2012-06-01 00:30:00    120        4
    3 2012-06-01 01:00:00    220        4
    4 2012-06-01 01:15:00     80        4
In the situation where the Timestamp is your index you need to reset the index …
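A short sketch with made-up timestamps spanning several days, so the weekday numbers differ (Monday is 0, Sunday is 6). The indexed case is shown by resetting the index first, as the answer suggests; the frame names indexed and restored are assumptions for illustration.

```python
import pandas as pd

# Made-up data: a Friday, a Saturday and a Monday.
df = pd.DataFrame({
    "Timestamp": pd.to_datetime(
        ["2012-06-01 00:00:00", "2012-06-02 00:15:00", "2012-06-04 00:30:00"]
    ),
    "Value": [100, 150, 120],
})

# Column case: the .dt accessor exposes dayofweek for datetime columns.
df["weekday"] = df["Timestamp"].dt.dayofweek

# Index case: move the timestamps back into a column, then use .dt as before.
indexed = df[["Timestamp", "Value"]].set_index("Timestamp")
restored = indexed.reset_index()
restored["weekday"] = restored["Timestamp"].dt.dayofweek
print(restored)
```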
How to get number of groups in a groupby object in pandas?
Simple, Fast, and Pandaic: ngroups
Newer versions of the groupby API (pandas >= 0.23) provide this (undocumented) attribute, which stores the number of groups in a GroupBy object.
    # setup
    df = pd.DataFrame({'A': list('aabbcccd')})
    dfg = df.groupby('A')
    # call `.ngroups` on the GroupBy object
    dfg.ngroups
    # 4
Note that this is different from GroupBy.groups which …
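For comparison, a small sketch that counts the groups three ways on the same data as above: the ngroups attribute, the length of the groups mapping, and nunique on the grouping column. The variable names are assumptions; all three should agree when the grouping column has no missing values.

```python
import pandas as pd

df = pd.DataFrame({"A": list("aabbcccd")})
dfg = df.groupby("A")

# Number of groups, read directly from the GroupBy object.
count_from_attribute = dfg.ngroups

# .groups maps each group label to the row labels in that group,
# so its length is also the number of groups.
count_from_mapping = len(dfg.groups)

# Counting distinct values in the grouping column gives the same number here.
count_from_nunique = df["A"].nunique()

print(count_from_attribute, count_from_mapping, count_from_nunique)
```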
Multi Index Sorting in Pandas
When sorting by a MultiIndex you need to contain the tuple describing the column inside a list*:
    In [11]: df.sort_values([('Group1', 'C')], ascending=False)
    Out[11]:
      Group1       Group2
           A  B  C      A  B  C
    2      5  6  9      1  0  0
    1      1  0  3      2  5  7
    3      7  0  2      0  3  5
* so as …
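A self-contained sketch that rebuilds a frame matching the output above and applies the same sort. The starting row order (index 1, 2, 3) is an assumption, since the excerpt only shows the sorted result.

```python
import pandas as pd

# Two-level column index: Group1/Group2 on top, A/B/C underneath.
columns = pd.MultiIndex.from_product([["Group1", "Group2"], ["A", "B", "C"]])
df = pd.DataFrame(
    [
        [1, 0, 3, 2, 5, 7],
        [5, 6, 9, 1, 0, 0],
        [7, 0, 2, 0, 3, 5],
    ],
    index=[1, 2, 3],
    columns=columns,
)

# The column key is a tuple, and it is wrapped in a list for sort_values.
sorted_df = df.sort_values([("Group1", "C")], ascending=False)
print(sorted_df)
```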
start index at 1 for Pandas DataFrame
Index is an object, and the default index starts from 0:
    >>> result.index
    Int64Index([0, 1, 2], dtype=int64)
You can shift this index by 1 with
    >>> result.index += 1
    >>> result.index
    Int64Index([1, 2, 3], dtype=int64)
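A quick sketch of the shift on a made-up three-row frame. The Int64Index output quoted above comes from an older pandas release; recent versions print the default index differently, so the example checks the index values rather than its printed form. The second frame shows an explicit range as an alternative, which is an addition here and not part of the answer.

```python
import pandas as pd

# Made-up frame with the default 0-based index.
result = pd.DataFrame({"value": ["a", "b", "c"]})

# Shift every index label up by one, in place.
result.index += 1
print(list(result.index))

# Alternative: assign an explicit 1-based range of the right length.
other = pd.DataFrame({"value": ["x", "y", "z"]})
other.index = range(1, len(other) + 1)
print(list(other.index))
```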
Good alternative to Pandas .append() method, now that it is being deprecated?
Create a list of your dictionaries, if they are needed, and then create a new dataframe with df = pd.DataFrame.from_records(your_list). The list "append" method is very efficient and won't ever be deprecated. Dataframes, on the other hand, frequently have to be recreated with all data copied over on appends, due to their design – that is …
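A minimal sketch of that pattern: collect rows in a plain list, then build the frame once at the end. The row contents are made up. The pd.concat step is an addition for the case where the extra rows already live in another DataFrame; it is not part of the answer above.

```python
import pandas as pd

# Accumulate rows as dictionaries in an ordinary Python list.
records = []
for i in range(3):
    records.append({"id": i, "square": i * i})

# Build the DataFrame once, after all rows have been collected.
df = pd.DataFrame.from_records(records)

# If the new rows are already a DataFrame, concatenate the frames instead.
extra = pd.DataFrame.from_records([{"id": 3, "square": 9}])
combined = pd.concat([df, extra], ignore_index=True)
print(combined)
```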