I knew there had to be a quick way to repack a pandas DataFrame which repeated a series of MultiIndex
values, but I couldn’t seem to find the right words to describe what I was looking for when searching in the documentation.
The DataFrame looks something like:
>>> print(df)
ace
season year
jja 1950 1.544351
1951 -0.213187
1952 -0.849289
1953 -0.597970
1954 -0.178577
... ...
jjason 2016 0.629216
2017 2.175511
2018 0.503849
2019 0.522489
2020 1.354488
[213 rows x 1 columns]
Here there are three seasons of ace
, each from 1950-2020.
The method I was looking for is unstack
. Observe:
>>> df.unstack(level=0)
ace
season jja son jjason
year
1950 1.544351 1.755846 1.946374
1951 -0.213187 0.034605 -0.058150
1952 -0.849289 -0.380924 -0.625975
1953 -0.597970 0.130868 -0.137412
1954 -0.178577 0.066437 -0.020245
... ... ... ...
2016 0.085523 0.783252 0.629216
2017 0.021958 2.850757 2.175511
2018 -0.628395 0.990398 0.503849
2019 -0.414401 0.903389 0.522489
2020 0.535681 1.502838 1.354488
[71 rows x 3 columns]
Almost there, but I’d like to combine the nested column headers to a more specific name. df.columns.to_flat_index
does this the way I want.
>>> df.columns = df.columns.to_flat_index().str.join('_')
ace_jja ace_son ace_jjason
year
1950 1.544351 1.755846 1.946374
1951 -0.213187 0.034605 -0.058150
1952 -0.849289 -0.380924 -0.625975
1953 -0.597970 0.130868 -0.137412
1954 -0.178577 0.066437 -0.020245
... ... ... ...
2016 0.085523 0.783252 0.629216
2017 0.021958 2.850757 2.175511
2018 -0.628395 0.990398 0.503849
2019 -0.414401 0.903389 0.522489
2020 0.535681 1.502838 1.354488
[71 rows x 3 columns]