Splitting multiple columns into rows in pandas dataframe

You can first split the columns, create a Series from each with stack, and remove whitespace with strip:

    s1 = df.value.str.split(',', expand=True).stack().str.strip().reset_index(level=1, drop=True)
    s2 = df.date.str.split(',', expand=True).stack().str.strip().reset_index(level=1, drop=True)

Then concat both Series to df1:

    df1 = pd.concat([s1, s2], axis=1, keys=['value', 'date'])

Remove the old value and date columns and join:

    print(df.drop(['value', 'date'], axis=1).join(df1).reset_index(drop=True))

      ticker account value      date
    0     aa  assets   100  20121231
    …
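As a minimal end-to-end sketch of the split, stack, concat and join steps, the snippet below runs them on a small made-up frame. The sample rows and the helper name split_to_series are assumptions for illustration, not part of the original answer. It also assumes that value and date hold the same number of comma-separated items in every row, so the two stacked Series share an identical index.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the excerpt.
df = pd.DataFrame({
    "ticker": ["aa", "bb"],
    "account": ["assets", "liabilities"],
    "value": ["100, 200", "50,150"],
    "date": ["20121231, 20131231", "20141231,20151231"],
})


def split_to_series(column):
    """Split a comma-separated string column into one item per row.

    The original row label is kept as the index, so it repeats once per item.
    """
    return (
        column.str.split(",", expand=True)  # one column per item
        .stack()                            # long format with a 2-level index
        .str.strip()                        # drop whitespace around items
        .reset_index(level=1, drop=True)    # keep only the original row label
    )


s1 = split_to_series(df["value"])
s2 = split_to_series(df["date"])

# Both Series share the same (duplicated) index, so they align side by side.
df1 = pd.concat([s1, s2], axis=1, keys=["value", "date"])

# Replace the packed columns with the exploded ones.
result = df.drop(["value", "date"], axis=1).join(df1).reset_index(drop=True)
print(result)
```

Each original row appears once per item, with ticker and account repeated alongside the matching value and date.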

Pandas: convert date in month to the 1st day of next month

You can use pd.offsets.MonthBegin():

    In [261]: d = pd.to_datetime(['2011-09-30', '2012-02-28'])

    In [262]: d
    Out[262]: DatetimeIndex(['2011-09-30', '2012-02-28'], dtype='datetime64[ns]', freq=None)

    In [263]: d + pd.offsets.MonthBegin(1)
    Out[263]: DatetimeIndex(['2011-10-01', '2012-03-01'], dtype='datetime64[ns]', freq=None)

You'll find a lot of examples in the official Pandas docs.
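A short standalone sketch of the same offset, with one extra date added as an assumption to show the edge case: a date that already falls on the first of a month is still moved forward to the first of the following month.

```python
import pandas as pd

# The first two dates come from the session above; the third is an added
# illustration of a date that is already a month start.
d = pd.to_datetime(["2011-09-30", "2012-02-28", "2012-03-01"])

# MonthBegin(1) rolls every date forward to the next month start.
shifted = d + pd.offsets.MonthBegin(1)
print(shifted)
```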

Why does it take ages to install Pandas on Alpine Linux

On Debian-based images, pip installs packages from prebuilt .whl files:

    Downloading pandas-0.22.0-cp36-cp36m-manylinux1_x86_64.whl (26.2MB)
    Downloading numpy-1.14.1-cp36-cp36m-manylinux1_x86_64.whl (12.2MB)

The WHL format was developed as a quicker and more reliable method of installing Python software than rebuilding from source code every time. WHL files only have to be moved to the correct location on the target …

Standard implementation of vectorize_sequences

Solution with MultiLabelBinarizer

Assuming sequences is an array of integers whose maximum possible value is dimension-1, we can use MultiLabelBinarizer from sklearn.preprocessing to replicate the behaviour of the function vectorize_sequences:

    from sklearn.preprocessing import MultiLabelBinarizer

    mlb = MultiLabelBinarizer(classes=range(dimension))
    mlb.fit_transform(sequences)

Solution with NumPy broadcasting

Assuming sequences is an array of integers whose maximum possible value is dimension-1 …
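To make the MultiLabelBinarizer approach concrete, here is a small sketch that compares it with a loop-based reference. The excerpt does not show vectorize_sequences itself, so the loop version below is an assumption: the usual multi-hot encoder that sets position j of row i to 1 when index j occurs in sequence i. The sample sequences and dimension are made up.

```python
import numpy as np
from sklearn.preprocessing import MultiLabelBinarizer


def vectorize_sequences(sequences, dimension):
    """Assumed reference: multi-hot encode each sequence of integer indices."""
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.0  # repeated indices simply set the same cell
    return results


# Hypothetical input: every index is in the range 0..dimension-1.
sequences = [[0, 2, 2], [1, 3], [4]]
dimension = 5

# Fixing classes to 0..dimension-1 keeps the column order and width stable,
# even for indices that never occur in the data.
mlb = MultiLabelBinarizer(classes=list(range(dimension)))
encoded = mlb.fit_transform(sequences)

print(encoded)
print(np.array_equal(encoded, vectorize_sequences(sequences, dimension)))
```

MultiLabelBinarizer returns an integer array while the loop version returns floats, so cast with astype if the dtype matters downstream.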

pandas dataframe group and sort by weekday

You can use an ordered categorical first. In pandas before 0.21.0:

    cats = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
    df['Day of Week'] = df['Day of Week'].astype('category', categories=cats, ordered=True)

In pandas 0.21.0+ use the following instead (the keyword form above was deprecated in 0.21.0 and no longer works in current releases):

    from pandas.api.types import CategoricalDtype

    cat_type = CategoricalDtype(categories=cats, ordered=True)
    df['Day of Week'] = df['Day of Week'].astype(cat_type)

Or reindex:

    df_weekday = df.groupby(['Day of Week']).sum().reindex(cats)
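The two options can be tried side by side on a small made-up frame. The sales column and its values are assumptions for illustration. Passing observed=True is also a choice made here, so that only weekdays present in the data appear in the categorical result.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

cats = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

# Hypothetical data, deliberately out of weekday order.
raw = pd.DataFrame({
    "Day of Week": ["Wednesday", "Monday", "Sunday", "Monday", "Friday"],
    "sales": [3, 1, 7, 2, 5],
})

# Option 1: ordered categorical, so groupby sorts by weekday, not alphabetically.
df = raw.copy()
cat_type = CategoricalDtype(categories=cats, ordered=True)
df["Day of Week"] = df["Day of Week"].astype(cat_type)
by_day = df.groupby("Day of Week", observed=True)["sales"].sum()
print(by_day)

# Option 2: group the plain strings, then reindex into weekday order.
# Days missing from the data show up as NaN.
reindexed = raw.groupby("Day of Week")["sales"].sum().reindex(cats)
print(reindexed)
```

The categorical route keeps the ordering attached to the column for later sorts and plots, while reindex is a one-off reordering of a single result.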