This example demonstrates how to train a small feed-forward neural network for a binary classification task using the \texttt{GradientBasedLearningForCAP} package. We use the binary cross-entropy loss and optimise the network parameters with gradient descent.
The dataset consists of points (x_1, x_2) \in \mathbb{R}^2 labelled by a non-linear decision rule: points lying in two designated regions form \emph{class 0}, and all remaining points belong to \emph{class 1}. Hence the classification boundary is not linearly separable and requires a non-linear model. We build a neural network with three hidden layers and a sigmoid output, fit it on the provided training examples for several epochs, and then evaluate the trained model on a grid of input points to visualise the learned decision regions.
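Counting one weight matrix and one bias vector per layer, the architecture 2 \to 6 \to 6 \to 6 \to 1 has

(2 \cdot 6 + 6) + (6 \cdot 6 + 6) + (6 \cdot 6 + 6) + (6 \cdot 1 + 1) = 18 + 42 + 42 + 7 = 109

trainable parameters; this is the value \texttt{nr\_weights} computed in the session below.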
gap> Smooth := SkeletalSmoothMaps;
SkeletalSmoothMaps
gap> Lenses := CategoryOfLenses( Smooth );
CategoryOfLenses( SkeletalSmoothMaps )
gap> Para := CategoryOfParametrisedMorphisms( Smooth );
CategoryOfParametrisedMorphisms( SkeletalSmoothMaps )
gap> hidden_layers := [ 6, 6, 6 ];;
gap> f := NeuralNetworkLossMorphism( Para, 2, hidden_layers, 1, "Sigmoid" );;
gap> optimizer := Lenses.GradientDescentOptimizer( : learning_rate := 0.01 );
function( n ) ... end
gap> training_examples_path := Filename(
>      DirectoriesPackageLibrary("GradientBasedLearningForCAP", "examples")[1],
>      "NeuralNetwork_BinaryCrossEntropy/data/training_examples.txt" );;
gap> batch_size := 2;
2
gap> one_epoch_update := OneEpochUpdateLens( f, optimizer,
>      training_examples_path, batch_size );
(ℝ^109, ℝ^109) -> (ℝ^1, ℝ^0) defined by:

Get Morphism:
------------
ℝ^109 -> ℝ^1

Put Morphism:
------------
ℝ^109 -> ℝ^109

gap> nr_weights := RankOfObject( Source( PutMorphism( one_epoch_update ) ) );
109
gap> rs := RandomSource( IsMersenneTwister, 1 );;
gap> w := List( [ 1 .. nr_weights ], i -> 0.001 * Random( rs, [ -1000 .. 1000 ] ) );;
gap> w{[ 1 .. 5 ]};
[ 0.789, -0.767, -0.613, -0.542, 0.301 ]
gap> nr_epochs := 25;
25
gap> w := Fit( one_epoch_update, nr_epochs, w : verbose := true );;
Epoch 0/25 - loss = 0.6274786697292678
Epoch 1/25 - loss = 0.50764552556010512
Epoch 2/25 - loss = 0.46701509497218296
Epoch 3/25 - loss = 0.43998434603387304
Epoch 4/25 - loss = 0.41390897205434185
Epoch 5/25 - loss = 0.38668229524419645
Epoch 6/25 - loss = 0.3615103023137366
Epoch 7/25 - loss = 0.33852687543477167
Epoch 8/25 - loss = 0.31713408584173464
Epoch 9/25 - loss = 0.29842876608165969
Epoch 10/25 - loss = 0.28310739567373933
Epoch 11/25 - loss = 0.26735508537538627
Epoch 12/25 - loss = 0.25227135017462571
Epoch 13/25 - loss = 0.23858070423434527
Epoch 14/25 - loss = 0.22557724727481232
Epoch 15/25 - loss = 0.2151923109202202
Epoch 16/25 - loss = 0.20589044111812799
Epoch 17/25 - loss = 0.19857151366814263
Epoch 18/25 - loss = 0.19229381748983518
Epoch 19/25 - loss = 0.18814544378812006
Epoch 20/25 - loss = 0.18465371077598913
Epoch 21/25 - loss = 0.18166012790192537
Epoch 22/25 - loss = 0.17685616213693178
Epoch 23/25 - loss = 0.17665872918251943
Epoch 24/25 - loss = 0.17073585936950184
Epoch 25/25 - loss = 0.16744783175344116
gap> w;
[ 1.47751, -0.285187, -1.87358, -1.87839, 0.687266, -0.88329, -0.607225, 0.57876,
  0.084489, 1.1218, 0.289778, -1.15844, 0.562299, -0.725222, 0.724775, 0.643942,
  0.202536, 0.131565, 0.768751, -0.345379, -0.147853, -1.52103, -1.26183, 1.39931,
  0.00143737, -0.819752, -0.90015, -0.534457, 0.74204, -0.768, -1.85381, 0.225274,
  -0.384199, 1.1034, 0.82565, 0.423966, 0.719847, 0.487972, 0.266537, -0.442324,
  0.520839, 0.306871, -0.205834, -0.314044, 0.0395323, -0.489954, -0.368816, 0.305383,
  -0.181872, 0.775344, -0.57507, -0.792, -0.937068, 1.39995, -0.0236236, 0.370827,
  -0.778542, -0.783943, 0.034, 0.343554, -1.00419, 0.857391, -1.07632, -0.677147,
  0.839605, 0.719, 1.40418, -0.221851, 1.29824, 0.510027, 0.217811, 0.344086,
  0.579, 0.576412, 0.070248, -0.145523, 0.468713, 0.680618, 0.199966, -0.497,
  -0.408801, 0.0519444, -0.597412, 0.137205, 1.25696, -0.0884903, -0.252, -0.721624,
  -1.25962, 0.894349, 0.447327, -1.00492, -1.54383, 0.464574, -0.723211, -0.108064,
  -0.486439, -0.385, -0.484, -0.862, -0.121845, 1.0856, 1.09068, 1.69466,
  0.938733, 0.529301, -0.465345, 1.23872, 1.07609 ]
gap> predict := NeuralNetworkPredictionMorphism( Para, 2, hidden_layers, 1, "Sigmoid" );
ℝ^2 -> ℝ^1 defined by:

Underlying Object:
-----------------
ℝ^109

Underlying Morphism:
-------------------
ℝ^111 -> ℝ^1
gap> predict_given_w := ReparametriseMorphism( predict, Smooth.Constant( w ) );
ℝ^2 -> ℝ^1 defined by:

Underlying Object:
-----------------
ℝ^0

Underlying Morphism:
-------------------
ℝ^2 -> ℝ^1
gap> predict_using_w := UnderlyingMorphism( predict_given_w );
ℝ^2 -> ℝ^1
gap> inputs := Cartesian( 0.1 * [ -10 .. 10 ], 0.1 * [ -10 .. 10 ] );;
gap> predictions := List( inputs, x ->
>      SelectBasedOnCondition( predict_using_w( x )[1] > 0.5, 1, 0 ) );;
gap> # ScatterPlotUsingPython( inputs, predictions );
Executing the command \texttt{ScatterPlotUsingPython( inputs, predictions );} produces a scatter plot of the grid points coloured by their predicted class, visualising the learned decision regions.
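Since the loss is still decreasing at epoch 25, training could be resumed from the learned weights simply by calling \texttt{Fit} again on the same update lens, for instance (output not shown):

gap> # continue training for 10 further epochs, starting from the weights learned above
gap> w := Fit( one_epoch_update, 10, w : verbose := true );;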
This example demonstrates how to train a small feed-forward neural network for a multi-class classification task using the \texttt{GradientBasedLearningForCAP} package. We employ the cross-entropy loss function and optimise the network parameters with gradient descent.
The dataset consists of points (x_1, x_2) \in \mathbb{R}^2 labelled by a non-linear decision rule describing three regions that form the three classes. We build a neural network with three hidden layers and a Softmax output, fit it on the provided training examples for several epochs, and then evaluate the trained model on a grid of input points to visualise the learned decision regions.
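As in the binary example, the parameter count of the architecture 2 \to 6 \to 6 \to 6 \to 3 is

(2 \cdot 6 + 6) + (6 \cdot 6 + 6) + (6 \cdot 6 + 6) + (6 \cdot 3 + 3) = 18 + 42 + 42 + 21 = 123,

which is the value \texttt{nr\_weights} computed in the session below.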
gap> Smooth := SkeletalSmoothMaps;
SkeletalSmoothMaps
gap> Lenses := CategoryOfLenses( Smooth );
CategoryOfLenses( SkeletalSmoothMaps )
gap> Para := CategoryOfParametrisedMorphisms( Smooth );
CategoryOfParametrisedMorphisms( SkeletalSmoothMaps )
gap> hidden_layers := [ 6, 6, 6 ];;
gap> f := NeuralNetworkLossMorphism( Para, 2, hidden_layers, 3, "Softmax" );;
gap> optimizer := Lenses.GradientDescentOptimizer( : learning_rate := 0.1 );
function( n ) ... end
gap> training_examples_path := Filename(
>      DirectoriesPackageLibrary("GradientBasedLearningForCAP", "examples")[1],
>      "NeuralNetwork_CrossEntropy/data/training_examples.txt" );;
gap> batch_size := 4;
4
gap> one_epoch_update := OneEpochUpdateLens( f, optimizer,
>      training_examples_path, batch_size );
(ℝ^123, ℝ^123) -> (ℝ^1, ℝ^0) defined by:

Get Morphism:
------------
ℝ^123 -> ℝ^1

Put Morphism:
------------
ℝ^123 -> ℝ^123

gap> nr_weights := RankOfObject( Source( PutMorphism( one_epoch_update ) ) );
123
gap> rs := RandomSource( IsMersenneTwister, 1 );;
gap> w := List( [ 1 .. nr_weights ], i -> 0.001 * Random( rs, [ -1000 .. 1000 ] ) );;
gap> Display( w{[ 1 .. 5 ]} );
[ 0.789, -0.767, -0.613, -0.542, 0.301 ]
gap> nr_epochs := 16;
16
gap> w := Fit( one_epoch_update, nr_epochs, w : verbose := true );;
Epoch 0/16 - loss = 0.80405334335407785
Epoch 1/16 - loss = 0.18338542093217905
Epoch 2/16 - loss = 0.1491650040794873
Epoch 3/16 - loss = 0.13186409729963983
Epoch 4/16 - loss = 0.12293129048146505
Epoch 5/16 - loss = 0.11742704538825839
Epoch 6/16 - loss = 0.11191588532335346
Epoch 7/16 - loss = 0.10441947487056685
Epoch 8/16 - loss = 0.095102838431592687
Epoch 9/16 - loss = 0.092441708967385072
Epoch 10/16 - loss = 0.097057579505470393
Epoch 11/16 - loss = 0.093295953606638768
Epoch 12/16 - loss = 0.082114375099200984
Epoch 13/16 - loss = 0.082910416530212819
Epoch 14/16 - loss = 0.082815082271383303
Epoch 15/16 - loss = 0.085405485529683856
Epoch 16/16 - loss = 0.087825108242740729
gap> w;
[ 0.789, -1.09294, -1.43008, -0.66714, 1.27126, -1.12774, -0.240397, 0.213,
  -0.382376, 1.42204, 0.300837, -1.79451, 0.392967, -0.868913, 0.858, 1.16231,
  0.769031, 0.309303, 0.555253, -0.142223, 0.0703106, -0.997, -0.746, 0.9,
  -0.248, -0.801, -0.317, -0.826, 0.0491083, -1.51073, -1.01246, 0.371752,
  -0.852, 0.342548, 1.01666, 1.39005, 0.958034, 0.357176, 0.3225, -0.29,
  -1.0095, 0.154876, -0.460859, -0.582425, 0.223943, -0.402, -0.368, 0.275911,
  -0.0791975, 0.0986371, -0.487903, -0.699542, -0.553485, 0.766, 1.88163, 0.903741,
  -0.895688, -0.949546, 0.034, 0.13, -0.91, 0.67043, -0.784672, -0.195688,
  1.49813, 0.881451, 0.679593, -0.380004, 0.743062, 0.529804, 0.221497, 0.487694,
  1.12092, 1.38134, -0.313891, 0.780071, 0.00526383, 0.422997, 0.287254, -0.42555,
  -0.0525988, -0.159442, -0.256285, -0.296361, 0.822117, -0.23663, -0.252, -0.986452,
  -0.955211, 0.52727, 0.261295, -0.867, -0.787, -0.395, -0.871, -0.205,
  -0.315, -0.385, -0.292919, -1.46115, -0.634953, 0.818446, 0.903525, 0.833456,
  1.59504, -0.500531, -0.191608, 0.390861, 0.808496, -1.94883, 0.445591, -1.62511,
  -0.601054, -0.154008, -1.20266, -0.255521, 0.989522, 0.29963, 0.372084, 1.07529,
  -0.909025, 0.454265, 0.539106 ]
gap> predict := NeuralNetworkPredictionMorphism( Para, 2, hidden_layers, 3, "Softmax" );
ℝ^2 -> ℝ^3 defined by:

Underlying Object:
-----------------
ℝ^123

Underlying Morphism:
-------------------
ℝ^125 -> ℝ^3
gap> predict_given_w := ReparametriseMorphism( predict, Smooth.Constant( w ) );
ℝ^2 -> ℝ^3 defined by:

Underlying Object:
-----------------
ℝ^0

Underlying Morphism:
-------------------
ℝ^2 -> ℝ^3
gap> predict_using_w := UnderlyingMorphism( predict_given_w );
ℝ^2 -> ℝ^3
gap> inputs := Cartesian( 0.1 * [ -10 .. 10 ], 0.1 * [ -10 .. 10 ] );;
gap> predictions := List( inputs, x ->
>      -1 + Position( predict_using_w( x ), Maximum( predict_using_w( x ) ) ) );;
gap> # ScatterPlotUsingPython( inputs, predictions );
Executing the command \texttt{ScatterPlotUsingPython( inputs, predictions );} produces a scatter plot of the grid points coloured by the predicted class index, visualising the three learned decision regions.
This example demonstrates how to train a small feed-forward neural network for a regression task using the \texttt{GradientBasedLearningForCAP} package. We employ the quadratic loss function and optimise the network parameters with gradient descent. The dataset consists of points (x_1, x_2) \in \mathbb{R}^2 with corresponding outputs y \in \mathbb{R} generated by a linear function with some added noise. Concretely, the outputs are generated according to the formula y = 2 x_1 - 3 x_2 + 1 + \varepsilon, where \varepsilon is a small noise term. We build a neural network with input dimension 2, no hidden layers, and output dimension 1. Hence, the network consists of a single affine map with weight matrix W_1 \in \mathbb{R}^{2 \times 1} and bias vector b_1 \in \mathbb{R}^1, which are the parameters to be learned. Equivalently, the network computes for an input a_0 \in \mathbb{R}^2 the output a_0 W_1 + b_1. Hence, the number of parameters to learn is 3 (two weights and one bias). We fit the neural network on the provided training examples for 30 epochs, and then compare the learned parameters to the perfect weights used to generate the dataset. We use the Adam optimiser for gradient descent. Hence, the initial weights vector (t, m_1, m_2, m_3, v_1, v_2, v_3, w_1, w_2, b_1) \in \mathbb{R}^{1+3+3+3} contains additional parameters for the optimiser (the m's and v's). We initialise t to 1 and the m's and v's to 0.
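For instance, the concrete initial vector used in the session below decomposes as

(t, m_1, m_2, m_3, v_1, v_2, v_3, w_1, w_2, b_1) = (1, 0, 0, 0, 0, 0, 0, 0.21, -0.31, 0.7),

i.e. time step t = 1, vanishing first and second moment estimates, and initial network parameters w_1 = 0.21, w_2 = -0.31 and b_1 = 0.7.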
gap> Smooth := SkeletalSmoothMaps;
SkeletalSmoothMaps
gap> Lenses := CategoryOfLenses( Smooth );
CategoryOfLenses( SkeletalSmoothMaps )
gap> Para := CategoryOfParametrisedMorphisms( Smooth );
CategoryOfParametrisedMorphisms( SkeletalSmoothMaps )
gap> f := NeuralNetworkLossMorphism( Para, 2, [ ], 1, "IdFunc" );
ℝ^3 -> ℝ^1 defined by:

Underlying Object:
-----------------
ℝ^3

Underlying Morphism:
-------------------
ℝ^6 -> ℝ^1
gap> optimizer := Lenses.AdamOptimizer();
function( n ) ... end
gap> training_examples_path := Filename(
>      DirectoriesPackageLibrary("GradientBasedLearningForCAP", "examples")[1],
>      "NeuralNetwork_QuadraticLoss/data/training_examples.txt" );;
gap> batch_size := 5;
5
gap> one_epoch_update := OneEpochUpdateLens( f, optimizer,
>      training_examples_path, batch_size );
(ℝ^10, ℝ^10) -> (ℝ^1, ℝ^0) defined by:

Get Morphism:
------------
ℝ^10 -> ℝ^1

Put Morphism:
------------
ℝ^10 -> ℝ^10

gap> w := [ 1, 0, 0, 0, 0, 0, 0, 0.21, -0.31, 0.7 ];
[ 1, 0, 0, 0, 0, 0, 0, 0.21, -0.31, 0.7 ]
gap> nr_epochs := 30;
30
gap> w := Fit( one_epoch_update, nr_epochs, w );;
Epoch 0/30 - loss = 4.4574869198
Epoch 1/30 - loss = 1.0904439656285798
Epoch 2/30 - loss = 0.44893422753741707
Epoch 3/30 - loss = 0.24718222552679428
Epoch 4/30 - loss = 0.15816538314892969
Epoch 5/30 - loss = 0.11009214898573197
Epoch 6/30 - loss = 0.080765189573546586
Epoch 7/30 - loss = 0.061445427900729599
Epoch 8/30 - loss = 0.04803609207319106
Epoch 9/30 - loss = 0.038370239087861441
Epoch 10/30 - loss = 0.031199992288917108
Epoch 11/30 - loss = 0.025760084031019172
Epoch 12/30 - loss = 0.021557800050973547
Epoch 13/30 - loss = 0.018263315597330656
Epoch 14/30 - loss = 0.01564869258749324
Epoch 15/30 - loss = 0.013552162640841157
Epoch 16/30 - loss = 0.011856309185255345
Epoch 17/30 - loss = 0.010474254262187581
Epoch 18/30 - loss = 0.0093406409193010267
Epoch 19/30 - loss = 0.008405587711401704
Epoch 20/30 - loss = 0.0076305403249797375
Epoch 21/30 - loss = 0.0069853659369945552
Epoch 22/30 - loss = 0.0064462805409909937
Epoch 23/30 - loss = 0.0059943461353685126
Epoch 24/30 - loss = 0.0056143650058947617
Epoch 25/30 - loss = 0.0052940553411779294
Epoch 26/30 - loss = 0.0050234291867088457
Epoch 27/30 - loss = 0.0047943179297568897
Epoch 28/30 - loss = 0.0046000067074985669
Epoch 29/30 - loss = 0.004434950161766555
Epoch 30/30 - loss = 0.0042945495896027528
gap> w;
[ 601, -0.00814765, -0.0328203, 0.00154532, 0.0208156, 0.0756998, 0.047054,
  2.01399, -2.9546, 0.989903 ]
We notice that the learned weights w_1 \approx 2.01399, w_2 \approx -2.9546, and b_1 \approx 0.989903 are close to the perfect weights 2, -3, and 1 used to generate the dataset.
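To evaluate the learned model on new inputs one can proceed as in the classification examples, keeping in mind that only the last three entries of w are network parameters, while the first seven entries belong to the Adam optimiser. The following sketch (outputs omitted) assumes the prediction morphism uses the same parameter ordering as the weight vector above:

gap> predict := NeuralNetworkPredictionMorphism( Para, 2, [ ], 1, "IdFunc" );;
gap> # only the network parameters (w_1, w_2, b_1) are passed to the prediction morphism
gap> predict_given_w := ReparametriseMorphism( predict, Smooth.Constant( w{[ 8 .. 10 ]} ) );;
gap> predict_using_w := UnderlyingMorphism( predict_given_w );;
gap> predict_using_w( [ 1.0, 1.0 ] ); # expected to be close to 2 * 1 - 3 * 1 + 1 = 0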
In this example we demonstrate how to use the fitting machinery of \texttt{GradientBasedLearningForCAP} to find a nearby local minimum of a smooth function by gradient-based optimisation.
We consider the function f(\theta_1, \theta_2) = \sin^2(\theta_1) + \log^2(\theta_2) (with \log the natural logarithm), which has local minima at the points (\pi k, 1) for k \in \mathbb{Z}. We use the Adam optimiser to find a local minimum starting from an initial point. Hence, the parameter vector is of the form (t, m_1, m_2, v_1, v_2, \theta_1, \theta_2) \in \mathbb{R}^7, where t is the time step, m_1 and m_2 are the first moment estimates for \theta_1 and \theta_2 respectively, and v_1 and v_2 are the second moment estimates for \theta_1 and \theta_2 respectively. We start from the initial point (\theta_1, \theta_2) = (1.58, 0.1), which is close to the local minimum at (\pi, 1). After running the optimisation for 500 epochs, we reach the point (501, -9.35215 \cdot 10^{-12}, 0.041779, 0.00821802, 1.5526, 3.14159, 0.980292), where the last two components correspond to the parameters \theta_1 and \theta_2. Evaluating the function f at (\theta_1, \theta_2) = (3.14159, 0.980292) gives the value 0.000396202, which is very close to 0, the value of the function at the local minima. Thus, we have successfully found a local minimum using gradient-based optimisation. Note that during the optimisation process, the \theta_1 parameter moved from approximately 1.58 to approximately \pi, while the \theta_2 parameter moved from 0.1 to approximately 1.
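That these are indeed the local minima can be checked from the partial derivatives:

\partial f / \partial \theta_1 = 2 \sin(\theta_1) \cos(\theta_1) = \sin(2 \theta_1), \qquad \partial f / \partial \theta_2 = 2 \log(\theta_2) / \theta_2,

both of which vanish at \theta_1 = \pi k and \theta_2 = 1, where f attains its minimal value 0.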
gap> Smooth := SkeletalCategoryOfSmoothMaps( );
SkeletalSmoothMaps
gap> Lenses := CategoryOfLenses( Smooth );
CategoryOfLenses( SkeletalSmoothMaps )
gap> Para := CategoryOfParametrisedMorphisms( Smooth );
CategoryOfParametrisedMorphisms( SkeletalSmoothMaps )
gap> f_smooth := PreCompose( Smooth,
>      DirectProductFunctorial( Smooth, [ Smooth.Sin ^ 2, Smooth.Log ^ 2 ] ),
>      Smooth.Sum( 2 ) );
ℝ^2 -> ℝ^1
gap> dummy_input := CreateContextualVariables( [ "theta_1", "theta_2" ] );
[ theta_1, theta_2 ]
gap> Display( f_smooth : dummy_input := dummy_input );
ℝ^2 -> ℝ^1

‣ Sin( theta_1 ) * Sin( theta_1 ) + Log( theta_2 ) * Log( theta_2 )
gap> f := MorphismConstructor( Para,
>      ObjectConstructor( Para, Smooth.( 0 ) ),
>      Pair( Smooth.( 2 ), f_smooth ),
>      ObjectConstructor( Para, Smooth.( 1 ) ) );
ℝ^0 -> ℝ^1 defined by:

Underlying Object:
-----------------
ℝ^2

Underlying Morphism:
-------------------
ℝ^2 -> ℝ^1
gap> Display( f : dummy_input := dummy_input );
ℝ^0 -> ℝ^1 defined by:

Underlying Object:
-----------------
ℝ^2

Underlying Morphism:
-------------------
ℝ^2 -> ℝ^1

‣ Sin( theta_1 ) * Sin( theta_1 ) + Log( theta_2 ) * Log( theta_2 )
gap> optimizer := Lenses.AdamOptimizer( );
function( n ) ... end
gap> training_examples := [ [ ] ];
[ [ ] ]
gap> batch_size := 1;
1
gap> one_epoch_update := OneEpochUpdateLens( f, optimizer, training_examples, batch_size );
(ℝ^7, ℝ^7) -> (ℝ^1, ℝ^0) defined by:

Get Morphism:
------------
ℝ^7 -> ℝ^1

Put Morphism:
------------
ℝ^7 -> ℝ^7

gap> dummy_input := CreateContextualVariables(
>      [ "t", "m_1", "m_2", "v_1", "v_2", "theta_1", "theta_2" ] );
[ t, m_1, m_2, v_1, v_2, theta_1, theta_2 ]
gap> Display( one_epoch_update : dummy_input := dummy_input );
(ℝ^7, ℝ^7) -> (ℝ^1, ℝ^0) defined by:

Get Morphism:
------------
ℝ^7 -> ℝ^1

‣ (Sin( theta_1 ) * Sin( theta_1 ) + Log( theta_2 ) * Log( theta_2 )) / 1 / 1

Put Morphism:
------------
ℝ^7 -> ℝ^7

‣ t + 1
‣ 0.9 * m_1 + 0.1 * (-1 * ((1 * ((1 * (Sin( theta_1 ) * Cos( theta_1 ) + Sin( theta_1 ) * Cos( theta_1 )) + 0) * 1 + 0) * 1 + 0) * 1 + 0))
‣ 0.9 * m_2 + 0.1 * (-1 * (0 + (0 + 1 * (0 + (0 + 1 * (Log( theta_2 ) * (1 / theta_2) + Log( theta_2 ) * (1 / theta_2))) * 1) * 1) * 1))
‣ 0.999 * v_1 + 0.001 * (-1 * ((1 * ((1 * (Sin( theta_1 ) * Cos( theta_1 ) + Sin( theta_1 ) * Cos( theta_1 )) + 0) * 1 + 0) * 1 + 0) * 1 + 0)) ^ 2
‣ 0.999 * v_2 + 0.001 * (-1 * (0 + (0 + 1 * (0 + (0 + 1 * (Log( theta_2 ) * (1 / theta_2) + Log( theta_2 ) * (1 / theta_2))) * 1) * 1) * 1)) ^ 2
‣ theta_1 + 0.001 / (1 - 0.999 ^ t) * ((0.9 * m_1 + 0.1 * (-1 * ((1 * ((1 * (Sin( theta_1 ) * Cos( theta_1 ) + Sin( theta_1 ) * Cos( theta_1 )) + 0) * 1 + 0) * 1 + 0) * 1 + 0))) / (1.e-07 + Sqrt( (0.999 * v_1 + 0.001 * (-1 * ((1 * ((1 * (Sin( theta_1 ) * Cos( theta_1 ) + Sin( theta_1 ) * Cos( theta_1 )) + 0) * 1 + 0) * 1 + 0) * 1 + 0)) ^ 2) / (1 - 0.999 ^ t) )))
‣ theta_2 + 0.001 / (1 - 0.999 ^ t) * ((0.9 * m_2 + 0.1 * (-1 * (0 + (0 + 1 * (0 + (0 + 1 * (Log( theta_2 ) * (1 / theta_2) + Log( theta_2 ) * (1 / theta_2))) * 1) * 1) * 1))) / (1.e-07 + Sqrt( (0.999 * v_2 + 0.001 * (-1 * (0 + (0 + 1 * (0 + (0 + 1 * (Log( theta_2 ) * (1 / theta_2) + Log( theta_2 ) * (1 / theta_2))) * 1) * 1) * 1)) ^ 2) / (1 - 0.999 ^ t) )))
gap> w := [ 1, 0, 0, 0, 0, 1.58, 0.1 ];
[ 1, 0, 0, 0, 0, 1.58, 0.1 ]
gap> nr_epochs := 500;
500
gap> w := Fit( one_epoch_update, nr_epochs, w : verbose := false );
[ 501, -9.35215e-12, 0.041779, 0.00821802, 1.5526, 3.14159, 0.980292 ]
gap> theta := w{ [ 6, 7 ] };
[ 3.14159, 0.980292 ]
gap> Map( f_smooth )( theta );
[ 0.000396202 ]
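The same minimisation can also be attempted with plain gradient descent instead of Adam; in that case the optimiser keeps no internal state and the weight vector contains only \theta_1 and \theta_2, as in the first two examples. A minimal sketch (output not shown; the learning rate and the number of epochs are illustrative choices):

gap> optimizer := Lenses.GradientDescentOptimizer( : learning_rate := 0.01 );;
gap> one_epoch_update := OneEpochUpdateLens( f, optimizer, training_examples, batch_size );;
gap> w := Fit( one_epoch_update, 500, [ 1.58, 0.1 ] : verbose := false );;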