What is Data Science?
Data science is turning raw, messy data into clear answers people can act on — and it follows a repeatable workflow.
What you will learn
- Define data science in plain words
- Walk through the data-science workflow
- Name the core Python tools you will learn
What a data scientist actually does
A data scientist takes raw, messy data and turns it into an answer someone can act on. The boss does not want a spreadsheet; they want to know “which product should we stock more of?” or “why did sales drop in March?” Your job is to find that answer in the data.
It is detective work. The data is the pile of clues. You clean it up, look for patterns, draw a chart that makes the pattern obvious, and write down what you found.
The data-science workflow
Almost every project follows the same five steps, in order. The answer to one question usually raises the next one, so you will go round this loop again and again.
- Ask a clear question (“Do bigger houses really sell for more?”).
- Get the data (a CSV file, a database, a website).
- Clean it — fix missing values, wrong types and duplicates.
- Explore & visualise — summarise the numbers and draw charts.
- Conclude — say what you found, in plain language.
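The five steps can be sketched as one short pandas script. The house sizes and prices below are invented for illustration. The "visualise" half of step 4 is left out so the sketch needs only pandas. Reading a positive correlation as "yes" is a deliberately simple rule of thumb, not a proper statistical test.

```python
import pandas as pd

# 1. Ask: "Do bigger houses really sell for more?"

# 2. Get the data. This table is invented for illustration; a real
#    project would read a CSV file, a database or a website instead.
raw = pd.DataFrame({
    'size_sqft': [800, 1200, 1200, 1500, None, 2000],
    'price':     [40, 55, 55, 70, 65, 90],  # in thousands
})

# 3. Clean: remove the duplicated row and the row with a missing size
clean = raw.drop_duplicates().dropna()

# 4. Explore: summarise the numbers and measure how size and price move together
print(clean.describe())
corr = clean['size_sqft'].corr(clean['price'])  # Pearson correlation, -1 to 1

# 5. Conclude in plain language (a simple rule of thumb, not a formal test)
if corr > 0:
    print(f"Bigger houses tend to cost more here (correlation {corr:.2f}).")
else:
    print(f"No sign that bigger houses cost more here (correlation {corr:.2f}).")
```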
| Step | What you do | The tool you will use |
|---|---|---|
| Get data | Read a file into a table | Pandas |
| Clean | Fix missing values & types | Pandas |
| Calculate | Fast maths on numbers | NumPy |
| Visualise | Draw charts | Matplotlib / Seaborn |
| Predict (later) | Train a simple model | scikit-learn |
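The "Calculate" row is where NumPy earns its place. It does maths on a whole array at once instead of looping over the values one by one. Here is a minimal sketch using made-up prices. Matplotlib, Seaborn and scikit-learn are left out here so the example needs only NumPy.

```python
import numpy as np

# Made-up house prices, in thousands
prices = np.array([60, 80, 120, 140])

# Vectorised maths: one expression converts every price at once, no loop
prices_full = prices * 1000

# The same conversion with a plain Python loop, for comparison
prices_loop = [p * 1000 for p in prices.tolist()]

# Summary maths on the whole array
print(prices.mean())                 # 100.0
print(prices.max() - prices.min())   # 80
print(prices_full)
```

For four numbers the loop is fine. For millions of values the vectorised form is much faster, because the looping happens inside NumPy's compiled code rather than in Python.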
A tiny taste
Here is the whole workflow in spirit — load a little data, ask one question, and answer it.
```python
import pandas as pd

# Tiny dataset: city and house price (in thousands)
data = {'city': ['Pune', 'Pune', 'Delhi', 'Delhi'],
        'price': [60, 80, 120, 140]}
df = pd.DataFrame(data)

# Question: what is the average price per city?
print(df.groupby('city')['price'].mean())
```

Output:

```text
city
Delhi    130.0
Pune      70.0
Name: price, dtype: float64
```

Note: We asked a question and the data answered: Delhi homes average 130k, Pune 70k. You will learn every piece of this code in the coming lessons.
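Once the data is in a table, a follow-up question is usually just one more line. The small sketch below uses the same four-row table. `agg` asks for several summaries per city in one call, and `idxmax` picks out the city with the highest average.

```python
import pandas as pd

# Same tiny dataset: city and house price (in thousands)
df = pd.DataFrame({'city': ['Pune', 'Pune', 'Delhi', 'Delhi'],
                   'price': [60, 80, 120, 140]})

# Several summaries per city in one call
summary = df.groupby('city')['price'].agg(['mean', 'min', 'max', 'count'])
print(summary)

# Follow-up question: which city is the most expensive on average?
means = df.groupby('city')['price'].mean()
print(means.idxmax())  # Delhi
```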
Tip: You do not need to be a maths genius. Most data-science work is cleaning and looking at data. The Python libraries do the hard maths for you.
Watch out: Real data is messy — typos, blanks, wrong formats. Beginners expect tidy spreadsheets; pros expect a mess and budget most of their time for cleaning it.
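Here is a minimal sketch of what that cleaning can look like in pandas. The messy table is invented and contains stray spaces and mixed capitals in the city names, a price typo (letter O instead of zero), a blank price, and a repeated row. Dropping rows with unusable prices, rather than filling them in, is an assumption made for this example.

```python
import pandas as pd

# Hypothetical messy data, the way it often arrives (prices stored as text)
messy = pd.DataFrame({
    'city':  ['Pune', ' pune', 'Pune', 'DELHI', 'Delhi', 'Delhi'],
    'price': ['60', '80', '7O', '', '140', '140'],  # '7O' is a typo for 70
})

# Fix formats: trim spaces and use one capitalisation for city names
messy['city'] = messy['city'].str.strip().str.title()

# Fix types: turn text into numbers; blanks and typos become NaN (missing)
messy['price'] = pd.to_numeric(messy['price'], errors='coerce')

# Remove rows with no usable price, then remove duplicate rows
clean = messy.dropna(subset=['price']).drop_duplicates()

print(clean)
print(clean.groupby('city')['price'].mean())
```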
Q. Which of these is the best description of a data scientist’s goal?
✍️ Practice
- Write down the five workflow steps from memory, in order.
- Pick a question you are curious about (e.g. “Do longer YouTube videos get more views?”) and note what data you would need to answer it.
🏠 Homework
- Find one real dataset online (try data.gov.in or Kaggle) and write the one question you would try to answer with it.
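After downloading a dataset, a few pandas calls give you a first look before you settle on your question. This sketch assumes a hypothetical file named `my_dataset.csv`. A small invented CSV, read from memory with `io.StringIO`, stands in for it so the code runs as-is.

```python
import io
import pandas as pd

# Stand-in for a downloaded file; with a real download you would write
# df = pd.read_csv('my_dataset.csv')   # hypothetical filename
csv_text = """video,minutes,views
A,5,1200
B,12,3400
C,,800
D,30,5100
"""
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)          # (rows, columns)
print(df.head())         # first rows (up to five)
print(df.dtypes)         # type of each column
print(df.isna().sum())   # missing values per column
```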