This example demonstrates how to train a small feed-forward neural network for a binary classification task using the \texttt{GradientBasedLearningForCAP} package. We use the binary cross-entropy loss and optimise the network parameters with gradient descent.
The dataset consists of points (x_1, x_2) \in \mathbb{R}^2 labelled by a non-linear decision rule: points lying in two designated regions form \emph{class 0}, and all remaining points belong to \emph{class 1}. Hence the classification boundary is not linearly separable and requires a non-linear model. We build a neural network with three hidden layers and a sigmoid output, fit it on the provided training examples for several epochs, and then evaluate the trained model on a grid of input points to visualise the learned decision regions.
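Counting one weight matrix and one bias vector per layer, the architecture 2 \to 6 \to 6 \to 6 \to 1 has

(2 \cdot 6 + 6) + (6 \cdot 6 + 6) + (6 \cdot 6 + 6) + (6 \cdot 1 + 1) = 18 + 42 + 42 + 7 = 109

trainable parameters; this is the value \texttt{nr\_weights} computed in the session below.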
gap> Smooth := SkeletalSmoothMaps;
SkeletalSmoothMaps
gap> Lenses := CategoryOfLenses( Smooth );
CategoryOfLenses( SkeletalSmoothMaps )
gap> Para := CategoryOfParametrisedMorphisms( Smooth );
CategoryOfParametrisedMorphisms( SkeletalSmoothMaps )
gap> hidden_layers := [ 6, 6, 6 ];;
gap> f := NeuralNetworkLossMorphism( Para, 2, hidden_layers, 1, "Sigmoid" );;
gap> optimizer := Lenses.GradientDescentOptimizer( : learning_rate := 0.01 );
function( n ) ... end
gap> training_examples_path := Filename(
>      DirectoriesPackageLibrary("GradientBasedLearningForCAP", "examples")[1],
>      "NeuralNetwork_BinaryCrossEntropy/data/training_examples.txt" );;
gap> batch_size := 2;
2
gap> one_epoch_update := OneEpochUpdateLens( f, optimizer,
>      training_examples_path, batch_size );
(ℝ^109, ℝ^109) -> (ℝ^1, ℝ^0) defined by:

Get Morphism:
------------
ℝ^109 -> ℝ^1

Put Morphism:
------------
ℝ^109 -> ℝ^109

gap> nr_weights := RankOfObject( Source( PutMorphism( one_epoch_update ) ) );
109
gap> rs := RandomSource( IsMersenneTwister, 1 );;
gap> w := List( [ 1 .. nr_weights ], i -> 0.001 * Random( rs, [ -1000 .. 1000 ] ) );;
gap> w{[ 1 .. 5 ]};
[ 0.789, -0.767, -0.613, -0.542, 0.301 ]
gap> nr_epochs := 25;
25
gap> w := Fit( one_epoch_update, nr_epochs, w : verbose := true );;
Epoch 0/25 - loss = 0.6274786697292678
Epoch 1/25 - loss = 0.50764552556010512
Epoch 2/25 - loss = 0.46701509497218296
Epoch 3/25 - loss = 0.43998434603387304
Epoch 4/25 - loss = 0.41390897205434185
Epoch 5/25 - loss = 0.38668229524419645
Epoch 6/25 - loss = 0.3615103023137366
Epoch 7/25 - loss = 0.33852687543477167
Epoch 8/25 - loss = 0.31713408584173464
Epoch 9/25 - loss = 0.29842876608165969
Epoch 10/25 - loss = 0.28310739567373933
Epoch 11/25 - loss = 0.26735508537538627
Epoch 12/25 - loss = 0.25227135017462571
Epoch 13/25 - loss = 0.23858070423434527
Epoch 14/25 - loss = 0.22557724727481232
Epoch 15/25 - loss = 0.2151923109202202
Epoch 16/25 - loss = 0.20589044111812799
Epoch 17/25 - loss = 0.19857151366814263
Epoch 18/25 - loss = 0.19229381748983518
Epoch 19/25 - loss = 0.18814544378812006
Epoch 20/25 - loss = 0.18465371077598913
Epoch 21/25 - loss = 0.18166012790192537
Epoch 22/25 - loss = 0.17685616213693178
Epoch 23/25 - loss = 0.17665872918251943
Epoch 24/25 - loss = 0.17073585936950184
Epoch 25/25 - loss = 0.16744783175344116
gap> w;
[ 1.47751, -0.285187, -1.87358, -1.87839, 0.687266, -0.88329, -0.607225, 0.57876,
  0.084489, 1.1218, 0.289778, -1.15844, 0.562299, -0.725222, 0.724775, 0.643942,
  0.202536, 0.131565, 0.768751, -0.345379, -0.147853, -1.52103, -1.26183, 1.39931,
  0.00143737, -0.819752, -0.90015, -0.534457, 0.74204, -0.768, -1.85381, 0.225274,
  -0.384199, 1.1034, 0.82565, 0.423966, 0.719847, 0.487972, 0.266537, -0.442324,
  0.520839, 0.306871, -0.205834, -0.314044, 0.0395323, -0.489954, -0.368816, 0.305383,
  -0.181872, 0.775344, -0.57507, -0.792, -0.937068, 1.39995, -0.0236236, 0.370827,
  -0.778542, -0.783943, 0.034, 0.343554, -1.00419, 0.857391, -1.07632, -0.677147,
  0.839605, 0.719, 1.40418, -0.221851, 1.29824, 0.510027, 0.217811, 0.344086,
  0.579, 0.576412, 0.070248, -0.145523, 0.468713, 0.680618, 0.199966, -0.497,
  -0.408801, 0.0519444, -0.597412, 0.137205, 1.25696, -0.0884903, -0.252, -0.721624,
  -1.25962, 0.894349, 0.447327, -1.00492, -1.54383, 0.464574, -0.723211, -0.108064,
  -0.486439, -0.385, -0.484, -0.862, -0.121845, 1.0856, 1.09068, 1.69466,
  0.938733, 0.529301, -0.465345, 1.23872, 1.07609 ]
gap> predict := NeuralNetworkPredictionMorphism( Para, 2, hidden_layers, 1, "Sigmoid" );
ℝ^2 -> ℝ^1 defined by:

Underlying Object:
-----------------
ℝ^109

Underlying Morphism:
-------------------
ℝ^111 -> ℝ^1
gap> predict_given_w := ReparametriseMorphism( predict, Smooth.Constant( w ) );
ℝ^2 -> ℝ^1 defined by:

Underlying Object:
-----------------
ℝ^0

Underlying Morphism:
-------------------
ℝ^2 -> ℝ^1
gap> predict_using_w := UnderlyingMorphism( predict_given_w );
ℝ^2 -> ℝ^1
gap> inputs := Cartesian( 0.1 * [ -10 .. 10 ], 0.1 * [ -10 .. 10 ] );;
gap> predictions := List( inputs, x ->
>      SelectBasedOnCondition( predict_using_w( x )[1] > 0.5, 1, 0 ) );;
gap> # ScatterPlotUsingPython( inputs, predictions );
Executing the command \texttt{ScatterPlotUsingPython( inputs, predictions );} produces a scatter plot of the grid points coloured by their predicted class, visualising the learned decision regions.
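Since the loss is still decreasing at epoch 25, training could be resumed from the learned weights simply by calling \texttt{Fit} again on the same update lens, for instance (output not shown):

gap> # continue training for 10 further epochs, starting from the weights learned above
gap> w := Fit( one_epoch_update, 10, w : verbose := true );;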
This example demonstrates how to train a small feed-forward neural network for a multi-class classification task using the \texttt{GradientBasedLearningForCAP} package. We employ the cross-entropy loss function and optimise the network parameters with gradient descent.
The dataset consists of points (x_1, x_2) \in \mathbb{R}^2 labelled by a non-linear decision rule describing three regions that form the three classes. We build a neural network with three hidden layers and a Softmax output, fit it on the provided training examples for several epochs, and then evaluate the trained model on a grid of input points to visualise the learned decision regions.
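As in the binary example, the parameter count of the architecture 2 \to 6 \to 6 \to 6 \to 3 is

(2 \cdot 6 + 6) + (6 \cdot 6 + 6) + (6 \cdot 6 + 6) + (6 \cdot 3 + 3) = 18 + 42 + 42 + 21 = 123,

which is the value \texttt{nr\_weights} computed in the session below.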
gap> Smooth := SkeletalSmoothMaps;
SkeletalSmoothMaps
gap> Lenses := CategoryOfLenses( Smooth );
CategoryOfLenses( SkeletalSmoothMaps )
gap> Para := CategoryOfParametrisedMorphisms( Smooth );
CategoryOfParametrisedMorphisms( SkeletalSmoothMaps )
gap> hidden_layers := [ 6, 6, 6 ];;
gap> f := NeuralNetworkLossMorphism( Para, 2, hidden_layers, 3, "Softmax" );;
gap> optimizer := Lenses.GradientDescentOptimizer( : learning_rate := 0.1 );
function( n ) ... end
gap> training_examples_path := Filename(
>      DirectoriesPackageLibrary("GradientBasedLearningForCAP", "examples")[1],
>      "NeuralNetwork_CrossEntropy/data/training_examples.txt" );;
gap> batch_size := 4;
4
gap> one_epoch_update := OneEpochUpdateLens( f, optimizer,
>      training_examples_path, batch_size );
(ℝ^123, ℝ^123) -> (ℝ^1, ℝ^0) defined by:

Get Morphism:
------------
ℝ^123 -> ℝ^1

Put Morphism:
------------
ℝ^123 -> ℝ^123

gap> nr_weights := RankOfObject( Source( PutMorphism( one_epoch_update ) ) );
123
gap> rs := RandomSource( IsMersenneTwister, 1 );;
gap> w := List( [ 1 .. nr_weights ], i -> 0.001 * Random( rs, [ -1000 .. 1000 ] ) );;
gap> Display( w{[ 1 .. 5 ]} );
[ 0.789, -0.767, -0.613, -0.542, 0.301 ]
gap> nr_epochs := 16;
16
gap> w := Fit( one_epoch_update, nr_epochs, w : verbose := true );;
Epoch 0/16 - loss = 0.80405334335407785
Epoch 1/16 - loss = 0.18338542093217905
Epoch 2/16 - loss = 0.1491650040794873
Epoch 3/16 - loss = 0.13186409729963983
Epoch 4/16 - loss = 0.12293129048146505
Epoch 5/16 - loss = 0.11742704538825839
Epoch 6/16 - loss = 0.11191588532335346
Epoch 7/16 - loss = 0.10441947487056685
Epoch 8/16 - loss = 0.095102838431592687
Epoch 9/16 - loss = 0.092441708967385072
Epoch 10/16 - loss = 0.097057579505470393
Epoch 11/16 - loss = 0.093295953606638768
Epoch 12/16 - loss = 0.082114375099200984
Epoch 13/16 - loss = 0.082910416530212819
Epoch 14/16 - loss = 0.082815082271383303
Epoch 15/16 - loss = 0.085405485529683856
Epoch 16/16 - loss = 0.087825108242740729
gap> w;
[ 0.789, -1.09294, -1.43008, -0.66714, 1.27126, -1.12774, -0.240397, 0.213,
  -0.382376, 1.42204, 0.300837, -1.79451, 0.392967, -0.868913, 0.858, 1.16231,
  0.769031, 0.309303, 0.555253, -0.142223, 0.0703106, -0.997, -0.746, 0.9,
  -0.248, -0.801, -0.317, -0.826, 0.0491083, -1.51073, -1.01246, 0.371752,
  -0.852, 0.342548, 1.01666, 1.39005, 0.958034, 0.357176, 0.3225, -0.29,
  -1.0095, 0.154876, -0.460859, -0.582425, 0.223943, -0.402, -0.368, 0.275911,
  -0.0791975, 0.0986371, -0.487903, -0.699542, -0.553485, 0.766, 1.88163, 0.903741,
  -0.895688, -0.949546, 0.034, 0.13, -0.91, 0.67043, -0.784672, -0.195688,
  1.49813, 0.881451, 0.679593, -0.380004, 0.743062, 0.529804, 0.221497, 0.487694,
  1.12092, 1.38134, -0.313891, 0.780071, 0.00526383, 0.422997, 0.287254, -0.42555,
  -0.0525988, -0.159442, -0.256285, -0.296361, 0.822117, -0.23663, -0.252, -0.986452,
  -0.955211, 0.52727, 0.261295, -0.867, -0.787, -0.395, -0.871, -0.205,
  -0.315, -0.385, -0.292919, -1.46115, -0.634953, 0.818446, 0.903525, 0.833456,
  1.59504, -0.500531, -0.191608, 0.390861, 0.808496, -1.94883, 0.445591, -1.62511,
  -0.601054, -0.154008, -1.20266, -0.255521, 0.989522, 0.29963, 0.372084, 1.07529,
  -0.909025, 0.454265, 0.539106 ]
gap> predict := NeuralNetworkPredictionMorphism( Para, 2, hidden_layers, 3, "Softmax" );
ℝ^2 -> ℝ^3 defined by:

Underlying Object:
-----------------
ℝ^123

Underlying Morphism:
-------------------
ℝ^125 -> ℝ^3
gap> predict_given_w := ReparametriseMorphism( predict, Smooth.Constant( w ) );
ℝ^2 -> ℝ^3 defined by:

Underlying Object:
-----------------
ℝ^0

Underlying Morphism:
-------------------
ℝ^2 -> ℝ^3
gap> predict_using_w := UnderlyingMorphism( predict_given_w );
ℝ^2 -> ℝ^3
gap> inputs := Cartesian( 0.1 * [ -10 .. 10 ], 0.1 * [ -10 .. 10 ] );;
gap> predictions := List( inputs, x ->
>      -1 + Position( predict_using_w( x ), Maximum( predict_using_w( x ) ) ) );;
gap> # ScatterPlotUsingPython( inputs, predictions );
Executing the command \texttt{ScatterPlotUsingPython( inputs, predictions );} produces a scatter plot of the grid points coloured by the predicted class index, visualising the three learned decision regions.
This example demonstrates how to train a small feed-forward neural network for a regression task using the \texttt{GradientBasedLearningForCAP} package. We employ the quadratic loss function and optimise the network parameters with gradient descent. The dataset consists of points (x_1, x_2) \in \mathbb{R}^2 with corresponding outputs y \in \mathbb{R} generated by a linear function with some added noise. Concretely, the outputs are generated according to the formula y = 2 x_1 - 3 x_2 + 1 + \varepsilon, where \varepsilon is a small noise term. We build a neural network with input dimension 2, no hidden layers, and output dimension 1. Hence, the network consists of a single affine map with weight matrix W_1 \in \mathbb{R}^{2 \times 1} and bias vector b_1 \in \mathbb{R}^1, which are the parameters to be learned. Equivalently, the network computes for an input a_0 \in \mathbb{R}^2 the output a_0 W_1 + b_1. Hence, the number of parameters to learn is 3 (two weights and one bias). We fit the neural network on the provided training examples for 30 epochs, and then compare the learned parameters to the perfect weights used to generate the dataset. We use the Adam optimiser for gradient descent. Hence, the initial weights vector (t, m_1, m_2, m_3, v_1, v_2, v_3, w_1, w_2, b_1) \in \mathbb{R}^{1+3+3+3} contains additional parameters for the optimiser (the m's and v's). We initialise t to 1 and the m's and v's to 0.
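For instance, the concrete initial vector used in the session below decomposes as

(t, m_1, m_2, m_3, v_1, v_2, v_3, w_1, w_2, b_1) = (1, 0, 0, 0, 0, 0, 0, 0.21, -0.31, 0.7),

i.e. time step t = 1, vanishing first and second moment estimates, and initial network parameters w_1 = 0.21, w_2 = -0.31 and b_1 = 0.7.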
gap> Smooth := SkeletalSmoothMaps;
SkeletalSmoothMaps
gap> Lenses := CategoryOfLenses( Smooth );
CategoryOfLenses( SkeletalSmoothMaps )
gap> Para := CategoryOfParametrisedMorphisms( Smooth );
CategoryOfParametrisedMorphisms( SkeletalSmoothMaps )
gap> f := NeuralNetworkLossMorphism( Para, 2, [ ], 1, "IdFunc" );
ℝ^3 -> ℝ^1 defined by:

Underlying Object:
-----------------
ℝ^3

Underlying Morphism:
-------------------
ℝ^6 -> ℝ^1
gap> optimizer := Lenses.AdamOptimizer();
function( n ) ... end
gap> training_examples_path := Filename(
>      DirectoriesPackageLibrary("GradientBasedLearningForCAP", "examples")[1],
>      "NeuralNetwork_QuadraticLoss/data/training_examples.txt" );;
gap> batch_size := 5;
5
gap> one_epoch_update := OneEpochUpdateLens( f, optimizer,
>      training_examples_path, batch_size );
(ℝ^10, ℝ^10) -> (ℝ^1, ℝ^0) defined by:

Get Morphism:
------------
ℝ^10 -> ℝ^1

Put Morphism:
------------
ℝ^10 -> ℝ^10

gap> w := [ 1, 0, 0, 0, 0, 0, 0, 0.21, -0.31, 0.7 ];
[ 1, 0, 0, 0, 0, 0, 0, 0.21, -0.31, 0.7 ]
gap> nr_epochs := 30;
30
gap> w := Fit( one_epoch_update, nr_epochs, w );;
Epoch 0/30 - loss = 4.4574869198
Epoch 1/30 - loss = 1.0904439656285798
Epoch 2/30 - loss = 0.44893422753741707
Epoch 3/30 - loss = 0.24718222552679428
Epoch 4/30 - loss = 0.15816538314892969
Epoch 5/30 - loss = 0.11009214898573197
Epoch 6/30 - loss = 0.080765189573546586
Epoch 7/30 - loss = 0.061445427900729599
Epoch 8/30 - loss = 0.04803609207319106
Epoch 9/30 - loss = 0.038370239087861441
Epoch 10/30 - loss = 0.031199992288917108
Epoch 11/30 - loss = 0.025760084031019172
Epoch 12/30 - loss = 0.021557800050973547
Epoch 13/30 - loss = 0.018263315597330656
Epoch 14/30 - loss = 0.01564869258749324
Epoch 15/30 - loss = 0.013552162640841157
Epoch 16/30 - loss = 0.011856309185255345
Epoch 17/30 - loss = 0.010474254262187581
Epoch 18/30 - loss = 0.0093406409193010267
Epoch 19/30 - loss = 0.008405587711401704
Epoch 20/30 - loss = 0.0076305403249797375
Epoch 21/30 - loss = 0.0069853659369945552
Epoch 22/30 - loss = 0.0064462805409909937
Epoch 23/30 - loss = 0.0059943461353685126
Epoch 24/30 - loss = 0.0056143650058947617
Epoch 25/30 - loss = 0.0052940553411779294
Epoch 26/30 - loss = 0.0050234291867088457
Epoch 27/30 - loss = 0.0047943179297568897
Epoch 28/30 - loss = 0.0046000067074985669
Epoch 29/30 - loss = 0.004434950161766555
Epoch 30/30 - loss = 0.0042945495896027528
gap> w;
[ 601, -0.00814765, -0.0328203, 0.00154532, 0.0208156, 0.0756998, 0.047054,
  2.01399, -2.9546, 0.989903 ]
We notice that the learned weights w_1 \approx 2.01399, w_2 \approx -2.9546, and b_1 \approx 0.989903 are close to the perfect weights 2, -3, and 1 used to generate the dataset.
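To evaluate the learned model on new inputs one can proceed as in the classification examples, keeping in mind that only the last three entries of w are network parameters, while the first seven entries belong to the Adam optimiser. The following sketch (outputs omitted) assumes the prediction morphism uses the same parameter ordering as the weight vector above:

gap> predict := NeuralNetworkPredictionMorphism( Para, 2, [ ], 1, "IdFunc" );;
gap> # only the network parameters (w_1, w_2, b_1) are passed to the prediction morphism
gap> predict_given_w := ReparametriseMorphism( predict, Smooth.Constant( w{[ 8 .. 10 ]} ) );;
gap> predict_using_w := UnderlyingMorphism( predict_given_w );;
gap> predict_using_w( [ 1.0, 1.0 ] ); # expected to be close to 2 * 1 - 3 * 1 + 1 = 0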
In this example we demonstrate how to use the fitting machinery of \texttt{GradientBasedLearningForCAP} to find a nearby local minimum of a smooth function by gradient-based optimisation.
We consider the function f(\theta_1, \theta_2) = \sin^2(\theta_1) + \log^2(\theta_2) (with \log the natural logarithm), which has local minima at the points (\pi k, 1) for k \in \mathbb{Z}. We use the Adam optimiser to find a local minimum starting from an initial point. Hence, the parameter vector is of the form (t, m_1, m_2, v_1, v_2, \theta_1, \theta_2) \in \mathbb{R}^7, where t is the time step, m_1 and m_2 are the first moment estimates for \theta_1 and \theta_2 respectively, and v_1 and v_2 are the second moment estimates for \theta_1 and \theta_2 respectively. We start from the initial point (\theta_1, \theta_2) = (1.58, 0.1), which is close to the local minimum at (\pi, 1). After running the optimisation for 500 epochs, we reach the point (501, -9.35215 \cdot 10^{-12}, 0.041779, 0.00821802, 1.5526, 3.14159, 0.980292), where the last two components correspond to the parameters \theta_1 and \theta_2. Evaluating the function f at (\theta_1, \theta_2) = (3.14159, 0.980292) gives the value 0.000396202, which is very close to 0, the value of the function at the local minima. Thus, we have successfully found a local minimum using gradient-based optimisation. Note that during the optimisation process, the \theta_1 parameter moved from approximately 1.58 to approximately \pi, while the \theta_2 parameter moved from 0.1 to approximately 1.
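That these are indeed the local minima can be checked from the partial derivatives:

\partial f / \partial \theta_1 = 2 \sin(\theta_1) \cos(\theta_1) = \sin(2 \theta_1), \qquad \partial f / \partial \theta_2 = 2 \log(\theta_2) / \theta_2,

both of which vanish at \theta_1 = \pi k and \theta_2 = 1, where f attains its minimal value 0.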
gap> Smooth := SkeletalCategoryOfSmoothMaps( );
SkeletalSmoothMaps
gap> Lenses := CategoryOfLenses( Smooth );
CategoryOfLenses( SkeletalSmoothMaps )
gap> Para := CategoryOfParametrisedMorphisms( Smooth );
CategoryOfParametrisedMorphisms( SkeletalSmoothMaps )
gap> f_smooth := PreCompose( Smooth,
>      DirectProductFunctorial( Smooth, [ Smooth.Sin ^ 2, Smooth.Log ^ 2 ] ),
>      Smooth.Sum( 2 ) );
ℝ^2 -> ℝ^1
gap> dummy_input := CreateContextualVariables( [ "theta_1", "theta_2" ] );
[ theta_1, theta_2 ]
gap> Display( f_smooth : dummy_input := dummy_input );
ℝ^2 -> ℝ^1

‣ Sin( theta_1 ) * Sin( theta_1 ) + Log( theta_2 ) * Log( theta_2 )
gap> f := MorphismConstructor( Para,
>      ObjectConstructor( Para, Smooth.( 0 ) ),
>      Pair( Smooth.( 2 ), f_smooth ),
>      ObjectConstructor( Para, Smooth.( 1 ) ) );
ℝ^0 -> ℝ^1 defined by:

Underlying Object:
-----------------
ℝ^2

Underlying Morphism:
-------------------
ℝ^2 -> ℝ^1
gap> Display( f : dummy_input := dummy_input );
ℝ^0 -> ℝ^1 defined by:

Underlying Object:
-----------------
ℝ^2

Underlying Morphism:
-------------------
ℝ^2 -> ℝ^1

‣ Sin( theta_1 ) * Sin( theta_1 ) + Log( theta_2 ) * Log( theta_2 )
gap> optimizer := Lenses.AdamOptimizer( );
function( n ) ... end
gap> training_examples := [ [ ] ];
[ [ ] ]
gap> batch_size := 1;
1
gap> one_epoch_update := OneEpochUpdateLens( f, optimizer, training_examples, batch_size );
(ℝ^7, ℝ^7) -> (ℝ^1, ℝ^0) defined by:

Get Morphism:
------------
ℝ^7 -> ℝ^1

Put Morphism:
------------
ℝ^7 -> ℝ^7

gap> dummy_input := CreateContextualVariables(
>      [ "t", "m_1", "m_2", "v_1", "v_2", "theta_1", "theta_2" ] );
[ t, m_1, m_2, v_1, v_2, theta_1, theta_2 ]
gap> Display( one_epoch_update : dummy_input := dummy_input );
(ℝ^7, ℝ^7) -> (ℝ^1, ℝ^0) defined by:

Get Morphism:
------------
ℝ^7 -> ℝ^1

‣ (Sin( theta_1 ) * Sin( theta_1 ) + Log( theta_2 ) * Log( theta_2 )) / 1 / 1

Put Morphism:
------------
ℝ^7 -> ℝ^7

‣ t + 1
‣ 0.9 * m_1 + 0.1 * (-1 * ((1 * ((1 * (Sin( theta_1 ) * Cos( theta_1 ) + Sin( theta_1 ) * Cos( theta_1 )) + 0) * 1 + 0) * 1 + 0) * 1 + 0))
‣ 0.9 * m_2 + 0.1 * (-1 * (0 + (0 + 1 * (0 + (0 + 1 * (Log( theta_2 ) * (1 / theta_2) + Log( theta_2 ) * (1 / theta_2))) * 1) * 1) * 1))
‣ 0.999 * v_1 + 0.001 * (-1 * ((1 * ((1 * (Sin( theta_1 ) * Cos( theta_1 ) + Sin( theta_1 ) * Cos( theta_1 )) + 0) * 1 + 0) * 1 + 0) * 1 + 0)) ^ 2
‣ 0.999 * v_2 + 0.001 * (-1 * (0 + (0 + 1 * (0 + (0 + 1 * (Log( theta_2 ) * (1 / theta_2) + Log( theta_2 ) * (1 / theta_2))) * 1) * 1) * 1)) ^ 2
‣ theta_1 + 0.001 / (1 - 0.999 ^ t) * ((0.9 * m_1 + 0.1 * (-1 * ((1 * ((1 * (Sin( theta_1 ) * Cos( theta_1 ) + Sin( theta_1 ) * Cos( theta_1 )) + 0) * 1 + 0) * 1 + 0) * 1 + 0))) / (1.e-07 + Sqrt( (0.999 * v_1 + 0.001 * (-1 * ((1 * ((1 * (Sin( theta_1 ) * Cos( theta_1 ) + Sin( theta_1 ) * Cos( theta_1 )) + 0) * 1 + 0) * 1 + 0) * 1 + 0)) ^ 2) / (1 - 0.999 ^ t) )))
‣ theta_2 + 0.001 / (1 - 0.999 ^ t) * ((0.9 * m_2 + 0.1 * (-1 * (0 + (0 + 1 * (0 + (0 + 1 * (Log( theta_2 ) * (1 / theta_2) + Log( theta_2 ) * (1 / theta_2))) * 1) * 1) * 1))) / (1.e-07 + Sqrt( (0.999 * v_2 + 0.001 * (-1 * (0 + (0 + 1 * (0 + (0 + 1 * (Log( theta_2 ) * (1 / theta_2) + Log( theta_2 ) * (1 / theta_2))) * 1) * 1) * 1)) ^ 2) / (1 - 0.999 ^ t) )))
gap> w := [ 1, 0, 0, 0, 0, 1.58, 0.1 ];
[ 1, 0, 0, 0, 0, 1.58, 0.1 ]
gap> nr_epochs := 500;
500
gap> w := Fit( one_epoch_update, nr_epochs, w : verbose := false );
[ 501, -9.35215e-12, 0.041779, 0.00821802, 1.5526, 3.14159, 0.980292 ]
gap> theta := w{ [ 6, 7 ] };
[ 3.14159, 0.980292 ]
gap> Map( f_smooth )( theta );
[ 0.000396202 ]
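The same minimisation can also be attempted with plain gradient descent instead of Adam; in that case the optimiser keeps no internal state and the weight vector contains only \theta_1 and \theta_2, as in the first two examples. A minimal sketch (output not shown; the learning rate and the number of epochs are illustrative choices):

gap> optimizer := Lenses.GradientDescentOptimizer( : learning_rate := 0.01 );;
gap> one_epoch_update := OneEpochUpdateLens( f, optimizer, training_examples, batch_size );;
gap> w := Fit( one_epoch_update, 500, [ 1.58, 0.1 ] : verbose := false );;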