Introduction
Whether we like it or not, in pandas we will come across Series and DataFrames with a multi-index. A multi-index is often generated by .groupby() or .set_index() when we group or index by more than one column. We tend to use .reset_index() to turn it back into a normal Series/DataFrame, but in some situations knowing how to work with a multi-index directly is a benefit. The methods we use on a multi-index will also give us a deeper understanding of DataFrames and Series.
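A minimal sketch of the two common ways a multi-index appears, assuming a small hypothetical price table (the tickers AAA and BBB and the Symbol, Date and Close columns are made up for illustration):

```python
import pandas as pd

# Hypothetical toy data: closing prices of two tickers on two days.
df = pd.DataFrame({
    "Symbol": ["AAA", "AAA", "BBB", "BBB"],
    "Date": ["2016-10-03", "2016-10-04", "2016-10-03", "2016-10-04"],
    "Close": [10.0, 11.0, 20.0, 21.0],
})

# Setting two columns as the index creates a two-level MultiIndex.
indexed = df.set_index(["Symbol", "Date"])

# Grouping by two keys also returns a result indexed by a MultiIndex.
mean_close = df.groupby(["Symbol", "Date"])["Close"].mean()

print(type(indexed.index).__name__)     # MultiIndex
print(type(mean_close.index).__name__)  # MultiIndex
```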
In this article we will talk about:
1. What is a multi-index Series/DataFrame?
2. How to select a multi-index?
3. How to concat two multi-index DataFrames?
What is a multi-index Series/DataFrame?
Visually, we can usually tell when a Series/DataFrame has a multi-index. We can also check its .index attribute: if it is a multi-index, pandas shows a MultiIndex object in the returned value.
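A small sketch of that check on a hypothetical (Symbol, Date) Series built directly with pd.MultiIndex.from_tuples. Besides looking at .index, isinstance(..., pd.MultiIndex) and .nlevels give a programmatic answer:

```python
import pandas as pd

# Hypothetical two-level index built from (Symbol, Date) pairs.
idx = pd.MultiIndex.from_tuples(
    [("AAA", "2016-10-03"), ("AAA", "2016-10-04"),
     ("BBB", "2016-10-03"), ("BBB", "2016-10-04")],
    names=["Symbol", "Date"],
)
s = pd.Series([10.0, 11.0, 20.0, 21.0], index=idx, name="Close")

print(s.index)                              # a MultiIndex([...]) repr with the level names
print(isinstance(s.index, pd.MultiIndex))   # True
print(s.index.nlevels)                      # 2
print(list(s.index.names))                  # ['Symbol', 'Date']
```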
On a deeper level, a multi-index Series/DataFrame is nothing more than an ordinary Series/DataFrame with an added dimension. This is why a multi-index Series behaves much like a normal DataFrame. We will come back to this in the next section.
A multi-index can be unstacked with .unstack(). If we call .unstack() on a two-level multi-index Series, the inner level moves into the columns and we get a DataFrame with a normal index.
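A sketch of .unstack() on a hypothetical two-level Series. By default it moves the innermost level (here Date) into the columns, leaving Symbol as an ordinary row index:

```python
import pandas as pd

# Hypothetical closing prices indexed by (Symbol, Date).
idx = pd.MultiIndex.from_product(
    [["AAA", "BBB"], ["2016-10-03", "2016-10-04"]],
    names=["Symbol", "Date"],
)
s = pd.Series([10.0, 11.0, 20.0, 21.0], index=idx, name="Close")

# The inner level (Date) becomes the columns; Symbol stays as the row index.
wide = s.unstack()
print(wide)
```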
How to select a multi-index?
1. Multi-index Series
Use .loc[] and pass the labels from the outer level to the inner level. Use .loc[:, label] to skip the outer level and select on the inner level only.
If we look carefully, this .loc[] operation looks exactly like selecting from a DataFrame: the first element selects rows, then comes a comma, and the second element selects columns. In fact, it matches selecting from the DataFrame we get by calling .unstack() on the original multi-index Series.
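The selections described above can be sketched on a hypothetical (Symbol, Date) Series. The last two lines check that the same .loc[] calls on the unstacked DataFrame return the same values:

```python
import pandas as pd

# Hypothetical two-level Series: closing prices indexed by (Symbol, Date).
idx = pd.MultiIndex.from_product(
    [["AAA", "BBB"], ["2016-10-03", "2016-10-04"]],
    names=["Symbol", "Date"],
)
s = pd.Series([10.0, 11.0, 20.0, 21.0], index=idx, name="Close")

# Outer level only: all dates for AAA, returned as a Series indexed by Date.
aaa = s.loc["AAA"]

# Outer level, then inner level: a single closing price.
one = s.loc["AAA", "2016-10-04"]

# Skip the outer level with ":" and select on the inner level only.
oct3 = s.loc[:, "2016-10-03"]

# The same row/column selections on the unstacked DataFrame agree.
wide = s.unstack()
assert one == wide.loc["AAA", "2016-10-04"]
assert oct3.tolist() == wide.loc[:, "2016-10-03"].tolist()
```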
2. Multi-index DataFrame
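Again we use .loc[], going from the outer level to the inner level. But unlike with a Series, a DataFrame already uses the comma to separate rows from columns, so we cannot simply separate the index levels with a comma. Instead we wrap the row labels in parentheses, (), so that pandas treats them together as one row key (a tuple). Then we use a comma to choose the columns.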
Both the outer level and the inner level can be given as a list to select several labels at once.
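A sketch of tuple-based row selection on a hypothetical DataFrame with Close and Volume columns (all labels and numbers are invented for illustration):

```python
import pandas as pd

# Hypothetical prices and volumes indexed by (Symbol, Date).
idx = pd.MultiIndex.from_product(
    [["AAA", "BBB", "CCC"], ["2016-10-03", "2016-10-04"]],
    names=["Symbol", "Date"],
)
df = pd.DataFrame(
    {"Close": [10.0, 11.0, 20.0, 21.0, 30.0, 31.0],
     "Volume": [100, 110, 200, 210, 300, 310]},
    index=idx,
)

# Row labels for both levels go in one tuple; the column comes after the comma.
close = df.loc[("AAA", "2016-10-04"), "Close"]

# Lists at either level select several labels at once.
subset = df.loc[(["AAA", "CCC"], ["2016-10-03"]), ["Close", "Volume"]]
print(subset)
```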
Skipping the outer level is a little bit tricky. We might think of:
df.loc[(:, '2016-10-03'), 'Close']
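But this does not work: the : shorthand is only allowed directly inside square brackets, so writing it inside parentheses is a SyntaxError in Python. The correct way is to write slice(None), which is what : stands for:
df.loc[(slice(None), '2016-10-03'), 'Close']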
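A runnable version of that fix, assuming a hypothetical DataFrame indexed by (Symbol, Date) with a Close column, similar to what the line above seems to use. pd.IndexSlice is shown as an alternative spelling; it is part of pandas but is not mentioned in the original text:

```python
import pandas as pd

# Hypothetical closing prices indexed by (Symbol, Date).
idx = pd.MultiIndex.from_product(
    [["AAA", "BBB"], ["2016-10-03", "2016-10-04"]],
    names=["Symbol", "Date"],
)
df = pd.DataFrame({"Close": [10.0, 11.0, 20.0, 21.0]}, index=idx)

# slice(None) plays the role of ":" for the outer level inside the tuple.
oct3 = df.loc[(slice(None), "2016-10-03"), "Close"]

# pd.IndexSlice lets us write ":" again; ix[:, "2016-10-03"] builds the same tuple.
ix = pd.IndexSlice
oct3_alt = df.loc[ix[:, "2016-10-03"], "Close"]
print(oct3)
```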
How to concat two multi-index DataFrames?
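If we have new columns to add from a second DataFrame with the same multi-index, we can use pd.merge(). But we have to pass the arguments left_index=True and right_index=True, so that rows are matched on the index instead of on columns.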
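A sketch with two hypothetical frames that share the same (Symbol, Date) index, one holding Close and the other a new Volume column. Because the article calls this "concat", pd.concat(..., axis=1), which also aligns rows on the index, is shown as an alternative for comparison:

```python
import pandas as pd

# Hypothetical frames sharing one (Symbol, Date) MultiIndex.
idx = pd.MultiIndex.from_product(
    [["AAA", "BBB"], ["2016-10-03", "2016-10-04"]],
    names=["Symbol", "Date"],
)
prices = pd.DataFrame({"Close": [10.0, 11.0, 20.0, 21.0]}, index=idx)
volumes = pd.DataFrame({"Volume": [100, 110, 200, 210]}, index=idx)

# Match rows on the MultiIndex of both frames instead of on columns.
combined = pd.merge(prices, volumes, left_index=True, right_index=True)

# Alternative: column-wise concatenation also aligns on the index.
same = pd.concat([prices, volumes], axis=1)
print(combined)
```

With the default how="inner", pd.merge() keeps only index entries present in both frames; here the indexes are identical, so nothing is dropped.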
Summary
To make a long story short: use the .loc indexer to select from multi-index Series and DataFrames, and use pd.merge() to concatenate two multi-index DataFrames column-wise.
Tidy data requires that each variable has its own column, each observation has its own row, and each value has its own cell. Because a multi-index keeps some variables in the index instead of in columns, multi-index data is not tidy data. We can use .reset_index() to turn it into tidy data.
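A minimal sketch of that conversion on a hypothetical (Symbol, Date) DataFrame:

```python
import pandas as pd

# Hypothetical multi-index frame, e.g. the result of a two-key groupby.
idx = pd.MultiIndex.from_product(
    [["AAA", "BBB"], ["2016-10-03", "2016-10-04"]],
    names=["Symbol", "Date"],
)
df = pd.DataFrame({"Close": [10.0, 11.0, 20.0, 21.0]}, index=idx)

# reset_index() moves every index level into its own column and
# replaces the index with a default RangeIndex: one row per observation.
tidy = df.reset_index()
print(tidy)
```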
Dealing with a multi-index pandas Series and DataFrame
Original article: https://www.cnblogs.com/drvongoosewing/p/12031235.html