Wide to long data transform in pandas

You can use melt for reshaping, then split column variable and drop and sort_values. I think you can cast column year to int by astype and last change order of columns by subset:

df1 = (pd.melt(df, id_vars=['county','area'], value_name="pop"))
df1[['tmp','year']] = df1.variable.str.split('_', expand=True)
df1 = df1.drop(['variable', 'tmp'],axis=1).sort_values(['county','year'])
df1['year'] = df1.year.astype(int)
df1 = df1[['county','year','pop','area']]
print (df1)
   county  year   pop  area
0    1001  2006  1037   275
3    1001  2007  1052   275
6    1001  2008  1102   275
1    1003  2006  2399   394
4    1003  2007  2424   394
7    1003  2008  2438   394
2    1005  2006  1638   312
5    1005  2007  1647   312
8    1005  2008  1660   312

print (df1.dtypes)
county    int64
year      int32
pop       int64
area      int64
dtype: object

Another solution with set_index, stack and reset_index:

df2 = df.set_index(['county','area']).stack().reset_index(name="pop")
df2[['tmp','year']] = df2.level_2.str.split('_', expand=True)
df2 = df2.drop(['level_2', 'tmp'],axis=1)
df2['year'] = df2.year.astype(int)
df2 = df2[['county','year','pop','area']]

print (df2)
   county  year   pop  area
0    1001  2006  1037   275
1    1001  2007  1052   275
2    1001  2008  1102   275
3    1003  2006  2399   394
4    1003  2007  2424   394
5    1003  2008  2438   394
6    1005  2006  1638   312
7    1005  2007  1647   312
8    1005  2008  1660   312

print (df2.dtypes)
county    int64
year      int32
pop       int64
area      int64
dtype: object

Leave a Comment