PandasCore· 30 min read

The Pandas DataFrame

A DataFrame is a full table of rows and columns — the spreadsheet of Python and the heart of Pandas.

What you will learn

  • Build a DataFrame from a dictionary
  • Understand rows, columns and the index
  • Grab a single column

A whole table at last

A DataFrame is a table — rows and columns — just like a spreadsheet or a database table. It is the object you will work with 90% of the time in data science. Each column is actually a Series.

The easiest way to build one by hand is from a dictionary, where each key becomes a column name and each list becomes that column’s values.

Build a DataFrame from a dictionary of columns
import pandas as pd

df = pd.DataFrame({
    'name':  ['Asha', 'Ravi', 'Meera'],
    'age':   [25, 30, 28],
    'city':  ['Pune', 'Delhi', 'Pune']
})
print(df)

Note: Output: name age city 0 Asha 25 Pune 1 Ravi 30 Delhi 2 Meera 28 Pune Three rows, three columns. The numbers 0, 1, 2 on the left are the row index, added automatically.

The parts of a DataFrame

PartWhat it isIn our table
ColumnOne field for every rowname, age, city
RowOne recordAsha, 25, Pune
IndexThe row labels0, 1, 2
CellOne value“Pune”

Grab one column

Pick a single column by its name in square brackets. A single column comes back as a Series, so all the maths functions still work.

Select a column, then summarise it
print(df['age'])           # one column (a Series)
print('---')
print('average age:', df['age'].mean())

Note: Output: 0 25 1 30 2 28 Name: age, dtype: int64 --- average age: 27.666666666666668 The age column came back as a Series, and .mean() told us the average age is about 27.7.

Tip: Two square brackets give you back a smaller table instead of a single column: df[['name', 'city']] returns a DataFrame with just those two columns. One bracket = a Series; two brackets = a DataFrame.

Q. How would you select just the age column from a DataFrame called df?

Answer: df['age'] returns that single column as a Series. (df.age also works, but bracket notation is safest, especially for names with spaces.)

✍️ Practice

  1. Build a DataFrame of four products with columns name, price and in_stock.
  2. Select the price column and print its average and its maximum.

🏠 Homework

  1. Create a DataFrame of five movies with columns for title, year and rating. Print just the rating column and its average.
Want to learn this with a mentor?

CodingClave runs guided, project-based training (28-day, 45-day & 6-month batches).

Explore Training →