The Pandas DataFrame
A DataFrame is a full table of rows and columns — the spreadsheet of Python and the heart of Pandas.
What you will learn
- Build a DataFrame from a dictionary
- Understand rows, columns and the index
- Grab a single column
A whole table at last
A DataFrame is a table — rows and columns — just like a spreadsheet or a database table. It is the object you will work with 90% of the time in data science. Each column is actually a Series.
The easiest way to build one by hand is from a dictionary, where each key becomes a column name and each list becomes that column’s values.
import pandas as pd
df = pd.DataFrame({
'name': ['Asha', 'Ravi', 'Meera'],
'age': [25, 30, 28],
'city': ['Pune', 'Delhi', 'Pune']
})
print(df)Note: Output: name age city 0 Asha 25 Pune 1 Ravi 30 Delhi 2 Meera 28 Pune Three rows, three columns. The numbers 0, 1, 2 on the left are the row index, added automatically.
The parts of a DataFrame
| Part | What it is | In our table |
|---|---|---|
| Column | One field for every row | name, age, city |
| Row | One record | Asha, 25, Pune |
| Index | The row labels | 0, 1, 2 |
| Cell | One value | “Pune” |
Grab one column
Pick a single column by its name in square brackets. A single column comes back as a Series, so all the maths functions still work.
print(df['age']) # one column (a Series)
print('---')
print('average age:', df['age'].mean())Note: Output:
0 25
1 30
2 28
Name: age, dtype: int64
---
average age: 27.666666666666668
The age column came back as a Series, and .mean() told us the average age is about 27.7.
Tip: Two square brackets give you back a smaller table instead of a single column: df[['name', 'city']] returns a DataFrame with just those two columns. One bracket = a Series; two brackets = a DataFrame.
Q. How would you select just the age column from a DataFrame called df?
✍️ Practice
- Build a DataFrame of four products with columns
name,priceandin_stock. - Select the
pricecolumn and print its average and its maximum.
🏠 Homework
- Create a DataFrame of five movies with columns for title, year and rating. Print just the rating column and its average.