A First Neural Network (Keras)
Stack simple “neurons” into layers and the network learns patterns too tangled for a straight line.
What you will learn
- Explain a neuron and a layer in plain words
- Build and train a small neural network in Keras
- Know when a neural network is worth the extra effort
From one line to a network of lines
A neural network is, at heart, many tiny logistic-regression-like units — called neurons — wired together in layers. Each neuron takes the inputs, multiplies them by weights, adds them up, and passes the result through a simple curve (an activation function) that decides how strongly it “fires”. Stack enough of these and the network can learn patterns far too tangled for a single straight line.
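To make the idea concrete, here is a minimal sketch of a single neuron in plain Python. The input values, weights and the `bias` term are made up for illustration (the bias is a standard extra constant added to the weighted sum, not something described above), and the sigmoid curve is used because it is the one logistic regression uses.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)

# Hypothetical values: four flower measurements and made-up weights.
inputs = [5.1, 3.5, 1.4, 0.2]
weights = [0.2, -0.1, 0.4, 0.3]
bias = -1.0

activation = neuron(inputs, weights, bias)
print(round(activation, 3))  # a value between 0 and 1: how strongly the neuron "fires"
```

Training a network amounts to finding good values for `weights` and `bias` in every neuron.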
Picture three stages stacked left to right:
- Input layer — one slot per feature (e.g. the 4 flower measurements).
- Hidden layers — the neurons in the middle that do the real work of finding patterns. “Deep learning” just means more than one hidden layer.
- Output layer — produces the final answer (a class or a number).
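The three stages can be sketched in plain Python as a single forward pass. This is only an illustration with random, untrained weights: the helper names (`dense`, `random_layer`) are hypothetical, and the layer sizes (4 inputs, 16 hidden neurons, 3 outputs) and the relu and softmax activations are borrowed from the worked example later in this lesson.

```python
import math
import random

random.seed(0)  # fixed seed so the made-up weights are repeatable

def dense(inputs, weights, biases):
    """One fully connected layer: every neuron sees every input."""
    return [sum(x * w for x, w in zip(inputs, neuron_w)) + b
            for neuron_w, b in zip(weights, biases)]

def relu(values):
    """Keep positive values, replace negative ones with zero."""
    return [max(0.0, v) for v in values]

def softmax(values):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(values)
    exps = [math.exp(v - top) for v in values]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def random_layer(n_inputs, n_neurons):
    """Untrained layer: small random weights, zero biases."""
    weights = [[random.uniform(-0.5, 0.5) for _ in range(n_inputs)]
               for _ in range(n_neurons)]
    biases = [0.0] * n_neurons
    return weights, biases

hidden_w, hidden_b = random_layer(4, 16)   # input layer (4 features) -> hidden layer
output_w, output_b = random_layer(16, 3)   # hidden layer -> output layer (3 classes)

features = [5.1, 3.5, 1.4, 0.2]            # one flower, four measurements
hidden = relu(dense(features, hidden_w, hidden_b))
probs = softmax(dense(hidden, output_w, output_b))

print(len(features), len(hidden), len(probs))  # 4 16 3
```

Because the weights are random, the three probabilities are meaningless at this point. Training is what turns them into useful predictions.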
The network learns by the same engine you met earlier: it makes a prediction, measures the error with a loss function, and uses gradient descent to nudge every weight a little in the direction that lowers the error. Do that thousands of times and the weights settle on a good pattern. (The trick that spreads the error back through the layers is called backpropagation — Keras does it for you.)
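The predict, measure, nudge loop can be sketched for a single neuron with one weight and one bias. The toy data, the learning rate and the number of steps below are assumptions chosen for illustration. A real network repeats the same idea across every weight in every layer, with backpropagation supplying the gradients.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Made-up toy data: one feature, label 1 when the feature is large.
xs = [0.5, 1.5, 2.5, 3.5]
ys = [0, 0, 1, 1]

def mean_loss(w, b):
    """Average log loss (cross-entropy) of the neuron over the toy data."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(xs)

w, b = 0.0, 0.0          # start with no knowledge at all
learning_rate = 0.5      # how big each nudge is (an assumed value)
initial_loss = mean_loss(w, b)

for step in range(1000):
    # Gradient of the log loss: (prediction - label) * input, averaged over the data.
    grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / len(xs)
    # Nudge each parameter in the direction that lowers the loss.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

final_loss = mean_loss(w, b)
print(round(initial_loss, 3), '->', round(final_loss, 3))
```

The loss printed after training should be clearly smaller than the starting loss, which is the whole point of the loop.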
Why a library does the heavy lifting
You will not hand-code neurons. Keras (which runs on TensorFlow) lets you describe a network in a few readable lines. Think of it as scikit-learn for neural networks. Install it with pip install tensorflow.
A worked example: classify iris flowers
We will build a small network for the 3-species iris problem. Read the comments — each line is one decision about the network’s shape.
import tensorflow as tf
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
X, y = load_iris(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)
# neural nets train best on scaled inputs
scaler = StandardScaler().fit(Xtr)
Xtr, Xte = scaler.transform(Xtr), scaler.transform(Xte)
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation='relu', input_shape=(4,)),  # hidden layer
    tf.keras.layers.Dense(3, activation='softmax'),                  # 3 species out
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(Xtr, ytr, epochs=50, verbose=0) # train for 50 passes
loss, acc = model.evaluate(Xte, yte, verbose=0)
print('Test accuracy:', round(acc, 3))
Output:
Test accuracy: 0.978
The network learned to tell the three species apart with about 97.8% accuracy, on par with the forest from earlier on this easy dataset. Your exact figure may differ slightly from run to run, because the weights start from random values. Each Dense line is one layer; relu and softmax are activation functions; epochs=50 means it looked at the training data 50 times, improving a little each pass.
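It can help to see how small this network really is. The sketch below counts the numbers the network has to learn, assuming each Dense layer keeps one weight per input per neuron plus one bias per neuron (the Keras default). The helper name `dense_params` is hypothetical.

```python
def dense_params(n_inputs, n_neurons):
    """Weights (one per input per neuron) plus one bias per neuron."""
    return n_inputs * n_neurons + n_neurons

hidden = dense_params(4, 16)   # 4 features feeding 16 hidden neurons
output = dense_params(16, 3)   # 16 hidden values feeding 3 output neurons

print(hidden, output, hidden + output)  # 80 51 131
```

Calling `model.summary()` on the model above should report the same totals. Those 131 numbers are everything gradient descent adjusts during the 50 epochs.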
The new words, in plain English
| Term | Plain meaning |
|---|---|
| Dense layer | A layer where every neuron connects to every input from the layer before |
| Activation (relu/softmax) | The curve each neuron uses; relu in hidden layers, softmax to output class probabilities |
| Epoch | One full pass over the training data; more epochs = more learning (up to a point) |
| Optimizer (adam) | An adaptive variant of gradient descent that tunes the step size for each weight as it updates them |
| Loss | The error the network is trying to shrink |
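Two of these terms, softmax and loss, fit together neatly. Below is a rough sketch, assuming made-up raw scores for one flower. Softmax turns the scores into class probabilities, and the loss used in the example (sparse categorical cross-entropy) is the negative log of the probability given to the correct class. The function names here are illustrative and are not Keras calls.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def sparse_cross_entropy(probs, true_class):
    """Loss for one example: small when the true class gets a high probability."""
    return -math.log(probs[true_class])

scores = [2.0, 1.0, 0.1]      # hypothetical raw outputs for the 3 species
probs = softmax(scores)

print([round(p, 3) for p in probs])                # [0.659, 0.242, 0.099]
loss_if_class_0 = sparse_cross_entropy(probs, 0)   # the network favoured the right class
loss_if_class_2 = sparse_cross_entropy(probs, 2)   # the network favoured the wrong class
print(round(loss_if_class_0, 3), round(loss_if_class_2, 3))
```

The loss is much larger when the true class is the one the network considered least likely, and that gap is the signal the optimizer uses to correct the weights.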
When is a neural network worth it?
Neural networks are powerful but they are not the right default. For ordinary spreadsheet-style (tabular) data, a random forest or gradient boosting usually matches or beats a neural net with far less fuss. Neural networks pull ahead on big, complex, unstructured data — images, audio, language — where their layers can build up patterns no tree can.
Watch out: Neural networks are hungry: they need lots of data, careful scaling, and more compute, and they overfit small datasets easily. On a few hundred rows, prefer a simpler model — you will get a better, more explainable result faster.
Tip: This is just the doorway. The full world of deep learning — convolutional networks for images, recurrent/transformer networks for text — builds on exactly this idea of stacked layers trained by gradient descent. Our AI track explores where it leads.
Q. For a small, ordinary table of numeric features, which is usually the wiser first choice?
✍️ Practice
- Change the hidden Dense layer from 16 neurons to 8, retrain, and compare the test accuracy.
- Increase epochs to 100 and note whether the test accuracy improves or plateaus.
🏠 Homework
- In 4–5 sentences, explain a neuron, a layer and an epoch in your own words, and name one type of data where a neural network would clearly beat a decision tree.