A First Neural Network (Keras)

Stack simple “neurons” into layers and the network learns patterns too tangled for a straight line.

What you will learn

  • Explain a neuron and a layer in plain words
  • Build and train a small neural network in Keras
  • Know when a neural network is worth the extra effort

From one line to a network of lines

A neural network is, at heart, many tiny logistic-regression-like units, called neurons, wired together in layers. Each neuron takes its inputs, multiplies each one by a weight, adds the results together with a bias term, and passes the total through a simple curve (an activation function) that decides how strongly it “fires”. Stack enough of these and the network can learn patterns far too tangled for a single straight line.
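To make the idea of a single neuron concrete, here is a minimal sketch in plain Python. The function name `neuron`, the sigmoid activation, and the hand-picked weights and bias are assumptions for illustration only; a real network learns its weights during training.

```python
import math

def neuron(inputs, weights, bias):
    """One neuron: weighted sum of the inputs plus a bias, passed through a sigmoid."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid squashes z into the range (0, 1)

# Hypothetical values: four flower measurements and made-up (not learned) weights
features = [5.1, 3.5, 1.4, 0.2]
weights = [0.2, -0.1, 0.4, 0.3]
bias = -1.0

activation = neuron(features, weights, bias)
print('How strongly the neuron fires:', round(activation, 3))
```

With a sigmoid activation this is the same calculation logistic regression performs, which is why a neuron can be thought of as a tiny logistic-regression-like unit.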

Picture three stages stacked left to right:

  • Input layer — one slot per feature (e.g. the 4 flower measurements).
  • Hidden layers — the neurons in the middle that do the real work of finding patterns. “Deep learning” just means more than one hidden layer.
  • Output layer — produces the final answer (a class or a number).
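The three stages above can be sketched as one forward pass in plain Python. This is only an illustration: the helper names (`dense`, `relu`, `softmax`), the tiny layer sizes, and every weight value are made up, and Keras handles all of this internally.

```python
import math

def dense(inputs, weights, biases):
    # One output per neuron: weighted sum of all inputs plus that neuron's bias
    return [sum(x * w for x, w in zip(inputs, neuron_w)) + b
            for neuron_w, b in zip(weights, biases)]

def relu(values):
    # Negative sums become 0, positive sums pass through unchanged
    return [max(0.0, v) for v in values]

def softmax(values):
    # Turn raw scores into probabilities that add up to 1
    biggest = max(values)
    exps = [math.exp(v - biggest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

# Input layer: one slot per feature (4 hypothetical flower measurements)
x = [5.1, 3.5, 1.4, 0.2]

# Hidden layer: 2 neurons, each with 4 made-up weights and a bias
hidden_w = [[0.1, -0.2, 0.3, 0.0], [-0.3, 0.1, 0.2, 0.5]]
hidden_b = [0.0, 0.1]

# Output layer: 3 neurons (one per class), each reading the 2 hidden values
out_w = [[0.5, -0.4], [-0.2, 0.3], [0.1, 0.1]]
out_b = [0.0, 0.0, 0.0]

hidden = relu(dense(x, hidden_w, hidden_b))   # second neuron's sum is negative, so relu outputs 0
probs = softmax(dense(hidden, out_w, out_b))  # three class probabilities
print(hidden, probs)
```

Because these weights are arbitrary, the probabilities mean nothing yet; training is what turns them into useful predictions.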

The network learns by the same engine you met earlier: it makes a prediction, measures the error with a loss function, and uses gradient descent to nudge every weight a little in the direction that lowers the error. Do that thousands of times and the weights settle on a good pattern. (The trick that spreads the error back through the layers is called backpropagation — Keras does it for you.)
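The predict, measure, nudge loop can be shown on the smallest possible model: a single weight `w` that should learn y = 2x. This is a hedged sketch with made-up data and an assumed learning rate of 0.05; a real network repeats the same idea for every weight at once, with backpropagation supplying the gradients.

```python
# Toy data following y = 2x, so the ideal weight is 2.0
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

w = 0.0               # start from a poor guess
learning_rate = 0.05  # assumed step size for this illustration

for step in range(200):
    # Gradient of the mean squared error with respect to w
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    # Nudge w a little in the direction that lowers the error
    w -= learning_rate * grad

# Loss after training: mean squared error of the predictions
final_loss = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
print('Learned weight:', round(w, 4), 'final loss:', round(final_loss, 6))
```

Each pass shrinks the gap between `w` and the ideal value, which is the same "improving a little each step" behaviour the network shows during training.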

Why a library does the heavy lifting

You will not hand-code neurons. Keras (which runs on TensorFlow) lets you describe a network in a few readable lines. Think of it as scikit-learn for neural networks. Install it with pip install tensorflow.

A worked example: classify iris flowers

We will build a small network for the 3-species iris problem. Read the comments — each line is one decision about the network’s shape.

A two-layer neural network classifying iris flowers in Keras
import tensorflow as tf
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)

# neural nets train best on scaled inputs
scaler = StandardScaler().fit(Xtr)
Xtr, Xte = scaler.transform(Xtr), scaler.transform(Xte)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),                       # 4 flower measurements in
    tf.keras.layers.Dense(16, activation='relu'),     # hidden layer
    tf.keras.layers.Dense(3, activation='softmax'),   # 3 species out
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(Xtr, ytr, epochs=50, verbose=0)     # train for 50 passes
loss, acc = model.evaluate(Xte, yte, verbose=0)
print('Test accuracy:', round(acc, 3))

Note: Output: Test accuracy: 0.978. The network learned to tell the three species apart with about 97.8% accuracy, on par with the forest from earlier, on this easy dataset. Your exact number may differ slightly from run to run because the weights start from random values. Each Dense line is one layer; relu and softmax are activation functions; epochs=50 means it looked at the training data 50 times, improving a little each pass.

The new words, in plain English

| Term | Plain meaning |
|---|---|
| Dense layer | A layer where every neuron connects to every input from the layer before |
| Activation (relu/softmax) | The curve each neuron uses; relu in hidden layers, softmax to output class probabilities |
| Epoch | One full pass over the training data; more epochs = more learning (up to a point) |
| Optimizer (adam) | The smart version of gradient descent that updates the weights |
| Loss | The error the network is trying to shrink |
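To see what the loss in the worked example measures, here is a rough sketch of cross-entropy for a single flower: the negative log of the probability the network gave to the correct class. The function name `cross_entropy_for_one_example` and the probability lists are hypothetical; Keras computes the same quantity, averaged over a batch, with extra numerical safeguards.

```python
import math

def cross_entropy_for_one_example(true_class, probs):
    """Loss for one example: small when the true class gets a high probability."""
    return -math.log(probs[true_class])

# The true species is class 0 in both cases (made-up softmax outputs)
confident_right = cross_entropy_for_one_example(0, [0.90, 0.05, 0.05])
confident_wrong = cross_entropy_for_one_example(0, [0.05, 0.90, 0.05])

print('Loss when right:', round(confident_right, 3))
print('Loss when wrong:', round(confident_wrong, 3))
```

A confident correct answer gives a loss near zero, while a confident wrong answer is punished heavily, and that difference is what the optimizer works to shrink.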

When is a neural network worth it?

Neural networks are powerful but they are not the right default. For ordinary spreadsheet-style (tabular) data, a random forest or gradient boosting usually matches or beats a neural net with far less fuss. Neural networks pull ahead on big, complex, unstructured data — images, audio, language — where their layers can build up patterns no tree can.

Watch out: Neural networks are hungry: they need lots of data, careful scaling, and more compute, and they overfit small datasets easily. On a few hundred rows, prefer a simpler model — you will get a better, more explainable result faster.
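As a point of comparison, a simpler baseline on the same iris split takes only a few lines in scikit-learn. This is a sketch, not a benchmark: the choice of `RandomForestClassifier` with 100 trees and `random_state=0` is an assumption made here for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Same data and split as the neural network example
X, y = load_iris(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)

# Trees do not need scaled inputs, so there is no StandardScaler step
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(Xtr, ytr)

forest_acc = forest.score(Xte, yte)  # fraction of test flowers classified correctly
print('Forest test accuracy:', round(forest_acc, 3))
```

There is no scaling, no layer design and no epoch count to choose, which is the "far less fuss" that makes tree ensembles a sensible first try on small tables.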

Tip: This is just the doorway. The full world of deep learning — convolutional networks for images, recurrent/transformer networks for text — builds on exactly this idea of stacked layers trained by gradient descent. Our AI track explores where it leads.

Q. For a small, ordinary table of numeric features, which is usually the wiser first choice?

Answer: A tree ensemble such as random forest or gradient boosting. Neural networks shine on large, complex, unstructured data (images, text, audio); for small or medium tabular datasets, tree ensembles are usually a better, easier first choice.

✍️ Practice

  1. Change the hidden Dense layer from 16 neurons to 8, retrain, and compare the test accuracy.
  2. Increase epochs to 100 and note whether the test accuracy improves or plateaus.

🏠 Homework

  1. In 4–5 sentences, explain a neuron, a layer and an epoch in your own words, and name one type of data where a neural network would clearly beat a decision tree.