
A First Neural Network (Keras)

Stack simple “neurons” into layers and the network learns patterns too tangled for a straight line.

What you will learn

Explain a neuron and a layer in plain words
Build and train a small neural network in Keras
Know when a neural network is worth the extra effort

From one line to a network of lines

A neural network is, at heart, many tiny logistic-regression-like units, called neurons, wired together in layers. Each neuron takes its inputs, multiplies each one by a weight, adds the results together with a bias term, and passes the sum through a simple curve (an activation function) that decides how strongly it “fires”. Stack enough of these and the network can learn patterns far too tangled for a single straight line.
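
The description above can be sketched in a few lines of plain Python. This is only an illustration, not how Keras implements a neuron: the input values, weights and bias are made-up numbers, and a sigmoid is assumed as the activation curve because it matches the logistic-regression comparison.

```python
import math

def sigmoid(z):
    """Squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)

# Hypothetical values: four flower measurements and four made-up weights.
inputs = [5.1, 3.5, 1.4, 0.2]
weights = [0.2, -0.4, 0.7, 0.1]
bias = -0.5

output = neuron(inputs, weights, bias)
print(round(output, 3))  # a value between 0 and 1: how strongly the neuron "fires"
```

Changing any weight changes the output, which is exactly the handle that training uses.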

Picture three stages stacked left to right:

Input layer — one slot per feature (e.g. the 4 flower measurements).
Hidden layers — the neurons in the middle that do the real work of finding patterns. “Deep learning” just means more than one hidden layer.
Output layer — produces the final answer (a class or a number).

The network learns by the same engine you met earlier: it makes a prediction, measures the error with a loss function, and uses gradient descent to nudge every weight a little in the direction that lowers the error. Do that thousands of times and the weights settle on a good pattern. (The trick that spreads the error back through the layers is called backpropagation — Keras does it for you.)
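
The predict, measure, nudge loop described above can be sketched for a single neuron in plain Python. The toy data, the learning rate of 0.3 and the 3000 steps are all assumed values chosen for illustration. With only one neuron there are no layers to pass the error back through, so no backpropagation is needed here.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical toy data: one feature, label 1 when the feature is large.
xs = [0.5, 1.0, 1.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 1, 1]

w, b = 0.0, 0.0        # start with weights that know nothing
learning_rate = 0.3    # size of each nudge (assumed value)

def mean_loss(w, b):
    """Average log loss (cross-entropy) over the toy data."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(xs)

loss_before = mean_loss(w, b)

for step in range(3000):
    # Gradient of the mean log loss with respect to w and b.
    errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
    grad_w = sum(e * x for e, x in zip(errors, xs)) / len(xs)
    grad_b = sum(errors) / len(xs)
    # Nudge each parameter a little in the direction that lowers the loss.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

loss_after = mean_loss(w, b)
print(round(loss_before, 3), round(loss_after, 3))
```

The loss after training should be well below the starting loss. A Keras optimizer repeats the same kind of update for every weight in every layer.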

Why a library does the heavy lifting

You will not hand-code neurons. Keras (which runs on TensorFlow) lets you describe a network in a few readable lines. Think of it as scikit-learn for neural networks. Install it with pip install tensorflow.

A worked example: classify iris flowers

We will build a small network for the 3-species iris problem. Read the comments — each line is one decision about the network’s shape.

A two-layer neural network classifying iris flowers in Keras

import tensorflow as tf
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)

# neural nets train best on scaled inputs
scaler = StandardScaler().fit(Xtr)
Xtr, Xte = scaler.transform(Xtr), scaler.transform(Xte)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),                      # 4 features in
    tf.keras.layers.Dense(16, activation='relu'),    # hidden layer
    tf.keras.layers.Dense(3, activation='softmax'),  # 3 species out
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(Xtr, ytr, epochs=50, verbose=0)     # train for 50 passes
loss, acc = model.evaluate(Xte, yte, verbose=0)
print('Test accuracy:', round(acc, 3))

Note: Output: Test accuracy: 0.978. The network learned to tell the three species apart with about 97.8% accuracy, on par with the forest from earlier on this easy dataset. Your exact figure may differ slightly from run to run, because the starting weights are random. Each Dense line is one layer; relu and softmax are activation functions; epochs=50 means it looked at the training data 50 times, improving a little each pass.
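
To see what the two Dense layers compute, here is a rough NumPy sketch of the same 4 → 16 → 3 shape. It is not Keras's implementation: the weights are random rather than trained, and the five input rows are made-up values standing in for scaled flower measurements.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random (untrained) weights with the same shapes as the two Dense layers.
W1, b1 = rng.normal(size=(4, 16)), np.zeros(16)   # 4 inputs -> 16 hidden neurons
W2, b2 = rng.normal(size=(16, 3)), np.zeros(3)    # 16 hidden -> 3 species

def relu(z):
    """Keep positive values, replace negatives with zero."""
    return np.maximum(z, 0.0)

def softmax(z):
    """Turn each row of scores into probabilities that sum to 1."""
    z = z - z.max(axis=1, keepdims=True)   # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(X):
    hidden = relu(X @ W1 + b1)          # hidden Dense layer
    return softmax(hidden @ W2 + b2)    # output Dense layer

X = rng.normal(size=(5, 4))   # five made-up, already-scaled flowers
probs = forward(X)

n_params = W1.size + b1.size + W2.size + b2.size
print(probs.shape)   # (5, 3): one row of 3 class probabilities per flower
print(n_params)      # 131 weights and biases for training to adjust
```

Training changes only the numbers inside W1, b1, W2 and b2; the shape of the computation stays the same.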

The new words, in plain English

Term	Plain meaning
Dense layer	A layer where every neuron connects to every input from the layer before
Activation (relu/softmax)	The curve each neuron uses; relu in hidden layers, softmax to output class probabilities
Epoch	One full pass over the training data; more epochs = more learning (up to a point)
Optimizer (adam)	A refined form of gradient descent that adapts the step size for each weight as it updates them
Loss	The error the network is trying to shrink
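
Two of the terms in the table can be made concrete with a little arithmetic. The sketch below uses made-up probability vectors to show how the loss behaves. It then counts weight updates for the iris example, assuming 105 training rows (70% of 150) and the Keras default batch size of 32 in model.fit.

```python
import math

def sparse_crossentropy(probs, true_class):
    """Loss for one example: minus the log of the probability given to the true class."""
    return -math.log(probs[true_class])

# Made-up softmax outputs for one flower whose true species is class 0.
confident_right = [0.90, 0.07, 0.03]
unsure = [0.40, 0.35, 0.25]
confident_wrong = [0.05, 0.15, 0.80]

losses = [sparse_crossentropy(p, 0) for p in (confident_right, unsure, confident_wrong)]
print([round(loss, 3) for loss in losses])   # loss grows as the prediction gets worse

# Epoch arithmetic: each epoch is split into batches, one weight update per batch.
rows, batch_size, epochs = 105, 32, 50
updates_per_epoch = math.ceil(rows / batch_size)
total_updates = updates_per_epoch * epochs
print(updates_per_epoch, total_updates)      # 4 updates per epoch, 200 in total
```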

When is a neural network worth it?

Neural networks are powerful but they are not the right default. For ordinary spreadsheet-style (tabular) data, a random forest or gradient boosting usually matches or beats a neural net with far less fuss. Neural networks pull ahead on big, complex, unstructured data — images, audio, language — where their layers can build up patterns no tree can.
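
One way to check this for yourself is to fit a tree ensemble on the same split as a baseline. The sketch below assumes scikit-learn's RandomForestClassifier with 100 trees, an illustrative setting that may differ from the forest used earlier in the course.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)

# Trees do not need scaled inputs, so there is no StandardScaler step here.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(Xtr, ytr)

forest_acc = forest.score(Xte, yte)
print('Forest test accuracy:', round(forest_acc, 3))
```

If the forest matches the network with fewer lines, no scaling and no epochs to tune, it is the more sensible default for this kind of data.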

Watch out: Neural networks are hungry: they need lots of data, careful scaling, and more compute, and they overfit small datasets easily. On a few hundred rows, prefer a simpler model — you will get a better, more explainable result faster.

Tip: This is just the doorway. The full world of deep learning — convolutional networks for images, recurrent/transformer networks for text — builds on exactly this idea of stacked layers trained by gradient descent. Our AI track explores where it leads.

Q. For a small, ordinary table of numeric features, which is usually the wiser first choice?

Answer: A tree ensemble such as a random forest or gradient boosting. Neural networks shine on large, complex, unstructured data (images, text, audio); for small or medium tabular datasets, tree ensembles are usually the better, easier first choice.

✍️ Practice

Change the hidden Dense layer from 16 neurons to 8, retrain, and compare the test accuracy.
Increase epochs to 100 and note whether the test accuracy improves or plateaus.

🏠 Homework

In 4–5 sentences, explain a neuron, a layer and an epoch in your own words, and name one type of data where a neural network would clearly beat a decision tree.