Foundations · Core · 25 min read

What is Data Science?

Data science is turning raw, messy data into clear answers people can act on — and it follows a repeatable workflow.

What you will learn

  • Define data science in plain words
  • Walk through the data-science workflow
  • Name the core Python tools you will learn

What a data scientist actually does

A data scientist takes raw, messy data and turns it into an answer someone can act on. The boss does not want a spreadsheet — they want to know “which product should we stock more of?” or “why did sales drop in March?”. Your job is to find that answer in the data.

It is detective work. The data is the pile of clues. You clean it up, look for patterns, draw a chart that makes the pattern obvious, and write down what you found.

The data-science workflow

Almost every project follows the same five steps, in order. You will repeat this loop again and again.

  1. Ask a clear question (“Do bigger houses really sell for more?”).
  2. Get the data (a CSV file, a database, a website).
  3. Clean it — fix missing values, wrong types and duplicates.
  4. Explore & visualise — summarise the numbers and draw charts.
  5. Conclude — say what you found, in plain language.

Step             What you do                  The tool you will use
Get data         Read a file into a table     Pandas
Clean            Fix missing values & types   Pandas
Calculate        Fast maths on numbers        NumPy
Visualise        Draw charts                  Matplotlib / Seaborn
Predict (later)  Train a simple model         scikit-learn
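
To make the five steps concrete, here is a minimal sketch that walks through them for the question "Do bigger houses really sell for more?". The CSV text, the column names `area_sqft` and `price`, and every number in it are made up for illustration; a real project would read an actual file. Treat it as one possible shape for the loop, not the only way to do it.

```python
import io

import numpy as np
import pandas as pd

# Step 1: ask a clear question.
# "Do bigger houses really sell for more?"

# Step 2: get the data. The CSV below is invented and kept in a string
# so the example runs on its own; normally you would pass a file path.
raw_csv = """area_sqft,price
800,40
1200,65
1200,65
1500,
2000,110
2500,unknown
3000,160
"""
houses = pd.read_csv(io.StringIO(raw_csv))

# Step 3: clean it.
houses = houses.drop_duplicates()                # remove the repeated row
houses["price"] = pd.to_numeric(houses["price"], # text -> numbers,
                                errors="coerce") # "unknown" becomes NaN
houses = houses.dropna(subset=["price"])         # drop rows with no price

# Step 4: explore. A correlation near 1 means area and price rise together.
corr = np.corrcoef(houses["area_sqft"].to_numpy(),
                   houses["price"].to_numpy())[0, 1]

# Step 5: conclude, in plain language.
print(f"{len(houses)} usable rows; correlation between area and price: {corr:.2f}")
```

Dropping the rows with a missing price is only one choice; filling them with a typical value is another, and which is better depends on the question.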

A tiny taste

Here is the whole workflow in spirit — load a little data, ask one question, and answer it.

The data-science loop in miniature: load, ask, answer
import pandas as pd

# Tiny dataset: city and house price (in thousands), one row per house
data = {'city': ['Pune', 'Pune', 'Delhi', 'Delhi'],
        'price': [60, 80, 120, 140]}
df = pd.DataFrame(data)

# Question: what is the average price per city?
print(df.groupby('city')['price'].mean())

Note: Output:

    city
    Delhi    130.0
    Pune      70.0
    Name: price, dtype: float64

We asked a question and the data answered: Delhi homes average 130k, Pune 70k. You will learn every piece of this code in the coming lessons.
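
The same answer can also be drawn as a chart, which is the "visualise" step of the workflow. The sketch below rebuilds the tiny dataset and plots the per-city averages as bars with Matplotlib. The output filename `avg_price_per_city.png` and the axis labels are assumptions chosen for this example, and the `Agg` backend is selected only so the script can save a file without opening a window.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render to a file, no display needed
import matplotlib.pyplot as plt
import pandas as pd

# Same tiny dataset as in the lesson
df = pd.DataFrame({'city': ['Pune', 'Pune', 'Delhi', 'Delhi'],
                   'price': [60, 80, 120, 140]})
avg_price = df.groupby('city')['price'].mean()

# One bar per city, bar height = average price
fig, ax = plt.subplots()
ax.bar(list(avg_price.index), list(avg_price.values))
ax.set_xlabel("City")
ax.set_ylabel("Average price (thousands)")
ax.set_title("Average house price per city")

# Hypothetical filename; saved in the current working directory
fig.savefig("avg_price_per_city.png")
```

In a notebook you would usually skip the backend line and the `savefig` call and let the chart display inline.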

Tip: You do not need to be a maths genius. Most data-science work is cleaning and looking at data. The Python libraries do the hard maths for you.

Watch out: Real data is messy — typos, blanks, wrong formats. Beginners expect tidy spreadsheets; pros expect a mess and budget most of their time for cleaning it.
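
A quick way to see that mess before cleaning it is to ask Pandas a few questions about the table. The sketch below uses a made-up DataFrame (the name `messy` and its values are invented for illustration) containing a typo, a blank, and numbers stored as text, then shows three common checks and one possible fix for the typo.

```python
import pandas as pd

# Invented messy data: "pune " is a typo with a trailing space,
# one price is blank, and prices are text rather than numbers.
messy = pd.DataFrame({
    "city": ["Pune", "pune ", "Delhi", "Delhi"],
    "price": ["60", "80", None, "140"],
})

# Look before you clean
print(messy.isna().sum())            # blanks per column
print(messy.dtypes)                  # column types (price is not numeric)
print(messy["city"].value_counts())  # distinct spellings of each city

# One possible fix for the typo: trim spaces and use consistent capitals
messy["city"] = messy["city"].str.strip().str.title()
print(messy["city"].value_counts())
```

Before the fix, "Pune" and "pune " are counted as two different cities, so any per-city average would be quietly wrong.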

Q. What is the best description of a data scientist’s goal?

Answer: Data science is about extracting useful, actionable insight from data — not just storing or formatting it.

✍️ Practice

  1. Write down the five workflow steps from memory, in order.
  2. Pick a question you are curious about (e.g. “Do longer YouTube videos get more views?”) and note what data you would need to answer it.

🏠 Homework

  1. Find one real dataset online (try data.gov.in or Kaggle) and write the one question you would try to answer with it.