Neural Networks: Back to Basics, Part I

Aug 9, 2023

Build your fundamentals first

Introduction

Diving into the basics, I realized a need for more beginner-friendly resources. This guide is a response to that need.

Kickoff with Basics

Imagine you’re helping someone decide whether a $500,000 price tag for a 2,500 sq ft apartment (about 230 square meters) is fair.

Without comparisons, it’s challenging. So, after some research, you gather data from recent apartment sales:

A logical first approach? Find the average price per sq ft. Here it comes to $200 per sq ft.

Congratulations! You’ve just constructed your first, albeit basic, neural network. Not quite AI chatbot level, but it’s the fundamental block:

This simplistic diagram represents how the network structures its prediction. The calculation commences from the left input node. The input value transitions rightward, multiplies with the weight, and the result emerges as our output.

For a 2,500 sq ft apartment, multiplying by $200 gives $500,000. At this level, prediction is simple multiplication. Before we could predict, though, we had to determine the weight to multiply by. Finding that weight is the “training” phase: “training” a neural network essentially means determining the weights it uses to predict.
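As a minimal sketch of this single-weight network, the snippet below separates “training” (finding the weight) from prediction (multiplying by it). The function names and the two sales records are hypothetical, invented only for illustration; they are not the data behind the $200 figure.

```python
def fit_weight(sales):
    """'Training': average the price per sq ft over (area, price) pairs."""
    ratios = [price / area for area, price in sales]
    return sum(ratios) / len(ratios)


def predict_price(area_sq_ft, weight=200.0):
    """Prediction: input times weight. The default is the $200 from the text."""
    return area_sq_ft * weight


# Hypothetical sales as (area in sq ft, price in dollars), made up for this sketch.
hypothetical_sales = [(2000, 380_000), (3000, 630_000)]

weight = fit_weight(hypothetical_sales)
print(weight)               # 200.0
print(predict_price(2500))  # 500000.0
```

Everything the network “knows” lives in that one number, which is why changing the training data changes the prediction.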

In essence, it’s a prediction model. Technically, since the output can range continuously, this model is a “regression model”.

To visualize this (let’s simplify price units from $1 to $1000, altering our weight to 0.2 instead of 200):

Enhancement and Precision

Is a mere average of data points the best we can do? Let’s refine. For enhancement, we need a clear definition of “better”. Evaluating our model against our data points, we get:

In the diagram, yellow represents errors. We aim to minimize this.

Here, we note the actual price, the predicted price, and their difference. Averaging these differences gives us a measure of the model’s error. Negative values, like the -78, pose a challenge: in a plain average they cancel out positive errors and make the model look better than it is. Squaring each error before averaging removes the sign. Our refinement goal is therefore to minimize this average of squared errors. This “Mean Square Error” becomes our loss function.
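To make the cancellation problem concrete, here is a small sketch that compares the plain average error with the mean square error. The areas and prices are hypothetical values chosen for illustration (prices in thousands of dollars, matching the 0.2 weight), not the figures from the table above.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical data, made up for this sketch.
areas = [2000, 2500, 3000]   # sq ft
actual = [420, 480, 610]     # thousands of dollars
weight = 0.2                 # thousands of dollars per sq ft

predicted = [weight * area for area in areas]
errors = [a - p for a, p in zip(actual, predicted)]

# Positive and negative errors partly cancel in a plain average...
mean_error = sum(errors) / len(errors)
# ...but every squared error is non-negative, so nothing cancels here.
mse = mean_squared_error(actual, predicted)

print(errors, mean_error, mse)
```

With these numbers the individual errors are roughly 20, -20, and 10, so the plain average is only about 3.3 while the mean square error is about 300.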

Experimenting with various weights, we realize a simple weight variation won’t suffice. Introducing a bias, however, can improve the model. With the bias in place:

With one input and one output (and no hidden layers), it appears as:

Here, W (weight) and b (bias) are determined during training. X is our input (square footage), and Y is our predicted price.

The prediction formula now evolves to:

Y = (W × X) + b
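In code, the weight-plus-bias prediction might look like the sketch below. The values chosen for W and b are hypothetical (prices in thousands of dollars) and exist only to show how the two parameters interact.

```python
def predict(x, w, b):
    """One input, one output, no hidden layers: Y = (W * X) + b."""
    return w * x + b


# Hypothetical parameters, picked for illustration only.
W = 0.125   # thousands of dollars per sq ft
b = 100.0   # thousands of dollars

print(predict(2500, W, b))  # 412.5
print(predict(0, W, b))     # 100.0
```

The second call shows what the bias buys us: the line no longer has to pass through zero, so the model can fit data that a weight alone cannot.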

Interactive Training Session

Why not have a go at training this basic neural network? Your objective: minimize the loss function by adjusting weight and bias. Can you achieve an error below 2,000?

Here is a proposed solution, done manually on the command line:

Automating the Process

Kudos on your manual neural network training! Next, let’s explore automation. Consider this autopilot functionality:

These buttons apply “Gradient Descent”, optimizing weight and bias to reduce the loss function. The new graphs help monitor error rates. The essence of training is error reduction.

Gradient descent takes its direction from calculus. Given the loss function and the current weight and bias, the derivatives of the loss with respect to each of them tell us which way to nudge them to reduce the error.
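Here is a minimal sketch of that idea for our one-weight, one-bias model, assuming the mean square error as the loss. Its derivatives are (2/n) × Σ(prediction − actual) × X for the weight and (2/n) × Σ(prediction − actual) for the bias. The data, learning rate, and step count are hypothetical choices; areas are expressed in thousands of sq ft so that a simple fixed learning rate converges.

```python
def gradient_step(xs, ys, w, b, learning_rate):
    """Move w and b one step against the gradient of the mean square error."""
    n = len(xs)
    residuals = [(w * x + b) - y for x, y in zip(xs, ys)]
    grad_w = 2.0 / n * sum(r * x for r, x in zip(residuals, xs))
    grad_b = 2.0 / n * sum(residuals)
    return w - learning_rate * grad_w, b - learning_rate * grad_b


def train(xs, ys, learning_rate=0.05, steps=5000):
    """Start from w = b = 0 and repeat the gradient step."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        w, b = gradient_step(xs, ys, w, b, learning_rate)
    return w, b


# Hypothetical data generated from price = 150 * area + 100
# (area in thousands of sq ft, price in thousands of dollars).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [250.0, 400.0, 550.0, 700.0]

w, b = train(xs, ys)
print(w, b)  # approaches 150 and 100
```

Because the made-up data lies exactly on a line, the loss can be driven to practically zero; real sales data would leave some error behind.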

For a deeper dive into gradient descent, consider Coursera’s Machine Learning course’s initial lectures.

Adding Complexity

Is apartment size the sole price determinant? Obviously not. Let’s incorporate another factor: number of bedrooms.

The updated neural network:

Two weights (for each input) and one bias form our new model. The prediction formula evolves to:

Y = (w1 × x1) + (w2 × x2) + b

Figuring out w1 and w2 is intricate. Gradient descent is, once again, our ally.
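The same gradient descent recipe extends to two (or more) inputs: each weight gets its own derivative. The sketch below assumes mean square error as the loss, and the rows, targets, learning rate, and step count are all hypothetical. x1 is the area in thousands of sq ft and x2 is the number of bedrooms.

```python
def predict(features, weights, bias):
    """Y = (w1 * x1) + (w2 * x2) + ... + b for any number of features."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def train(rows, targets, learning_rate=0.02, steps=20000):
    """Gradient descent on mean square error, one weight per feature."""
    n = len(rows)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(steps):
        residuals = [predict(r, weights, bias) - t for r, t in zip(rows, targets)]
        grad_w = [
            2.0 / n * sum(res * row[j] for res, row in zip(residuals, rows))
            for j in range(len(weights))
        ]
        grad_b = 2.0 / n * sum(residuals)
        weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b
    return weights, bias


# Hypothetical data generated from price = 100 * x1 + 20 * x2 + 50
# (price in thousands of dollars), made up for this sketch.
rows = [(1.0, 1), (1.5, 3), (2.0, 2), (2.5, 4), (3.0, 3)]
targets = [170.0, 260.0, 290.0, 380.0, 410.0]

weights, bias = train(rows, targets)
print(weights, bias)  # approaches [100, 20] and 50
```

Nothing in `train` is specific to two inputs, which hints at how the approach scales as more features are added.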

Feature Implementation

Having explored networks with one or two features, it’s evident how to scale. As features increase, weight optimization becomes complex. Feature selection is pivotal and is an art in itself. For a feature selection example, refer to “A Journey Through Titanic” by Omar EL Gabry, which tackles Kaggle’s Titanic challenge.

Categorization

Taking our example further, imagine a list of apartments labeled based on size and number of bedrooms:

The objective is to predict apartment desirability. The networks we have built so far are regression models, producing continuous values. Often, however, neural networks are used for classification, producing discrete outputs like “Good” or “Poor”.

For instance, TensorFlow’s app, discussed previously, is a classification model. A practical adaptation involves outputting probabilities for each class, like “Good” or “Poor”. The “softmax” operation aids in this.

For example, feeding the array [3, 5] into softmax yields approximately [0.12, 0.88]. If the second entry corresponds to the “Poor” label, that suggests an 88% probability of “Poor”.

Softmax outputs are positive and sum to 1, which makes them suitable as probabilities. Because softmax exponentiates its inputs, it exaggerates the differences between them, which aids training.
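A short sketch of softmax reproduces the [3, 5] example. Subtracting the maximum before exponentiating is a common numerical-stability trick added here as an implementation choice; it does not change the result.

```python
import math


def softmax(values):
    """Exponentiate each value, then divide by the total so outputs sum to 1."""
    shift = max(values)  # keeps math.exp from overflowing on large inputs
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([3, 5])
print([round(p, 2) for p in probs])  # [0.12, 0.88]
```

The inputs differ by a factor of less than two, yet the outputs differ by more than a factor of seven, which is the exaggeration described above.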

Closing Thoughts

This guide provides foundational insights into neural networks. As AI and machine learning continue to evolve, understanding these basics empowers us to grasp more intricate concepts and applications.