Suppose we have a parametrised morphism (\mathbb{R}^p, f):\mathbb{R}^n \to \mathbb{R}, where \mathbb{R}^p is the parameter object and f:\mathbb{R}^{p+n} \to \mathbb{R} is a morphism in a skeletal category of smooth maps (it represents a loss function of a parameter vector in \mathbb{R}^p and an input in \mathbb{R}^n). Given a set of training examples \{X_1, \ldots, X_m\}, where each X_i \in \mathbb{R}^n, we want to fit a parameter vector \Theta \in \mathbb{R}^p such that the output of f on the training examples is minimized.
We can achieve this by creating an update lens for each training example. This update lens reads the current parameters \Theta and updates them according to the gradient of the loss function f at the example X_i. We start by substituting the training example X_i into f, resulting in a morphism f_i:\mathbb{R}^p \to \mathbb{R} defined by f_i(\Theta) = f(\Theta, X_i). By applying the reverse differential lens functor (ReverseDifferentialLensFunctor)
\mathbf{R}: \mathrm{Smooth} \to \mathrm{Lenses}(\mathrm{Smooth}),
on f_i, we obtain a lens \mathbf{R}(f_i):(\mathbb{R}^p, \mathbb{R}^p) \to (\mathbb{R}^1, \mathbb{R}^1). The get-morphism of this lens reads the current parameters \Theta and computes the loss f_i(\Theta), while the put-morphism \mathbb{R}^p \times \mathbb{R}^1 \to \mathbb{R}^p is given by (\Theta, r) \mapsto r\,J_{f_i}(\Theta), where J_{f_i}(\Theta) \in \mathbb{R}^{1 \times p} is the Jacobian matrix of f_i evaluated at \Theta; that is, it rescales the gradient of f_i by the incoming signal r.
The One-Epoch update lens for the example X_i is then obtained by precomposing an optimizer lens (e.g., gradient descent, Adam, etc.) with the composite \mathbf{R}(f_i) \cdot \varepsilon, where \varepsilon:(\mathbb{R}^1, \mathbb{R}^1) \to (\mathbb{R}^1, \mathbb{R}^0) is the lens defined by:
Get morphism: the identity morphism on \mathbb{R}^1.
Put morphism: the morphism \mathbb{R}^1 \times \mathbb{R}^0 \cong \mathbb{R}^1 \to \mathbb{R}^1 that seeds the reverse pass with the constant -1 (cf. the factor -1 in the put-morphisms displayed in the examples section below).
This lens merely negates the gradient signal.
Suppose we choose the optimizer lens to be the gradient descent optimizer with learning rate \eta = 0.01 > 0. The resulting One-Epoch update lens for the example X_i is then the lens (\mathbb{R}^p, \mathbb{R}^p) \to (\mathbb{R}^1, \mathbb{R}^0) whose get-morphism is \Theta \mapsto f_i(\Theta) and whose put-morphism is \Theta \mapsto \Theta - \eta\, J_{f_i}(\Theta)^T. Now, we can start with a random parameter vector \Theta_0 \in \mathbb{R}^p and apply the put-morphism of the One-Epoch update lens for X_1 to obtain a new parameter vector \Theta_1, then use \Theta_1 and the One-Epoch update lens for X_2 to obtain \Theta_2, and so on. After going through all training examples, we have completed one epoch of training. To perform multiple epochs of training, we simply repeat the process.
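In GAP, such an epoch-by-epoch loop might look like the following sketch (the variable update_lenses is a hypothetical list holding the One-Epoch update lens of each training example; the PutMorphism accessor is shown in the examples section below):

gap> theta := [ 0, 0 ];;  # initial parameter vector (here p = 2)
gap> for epoch in [ 1 .. 100 ] do
>        for lens in update_lenses do
>            theta := PutMorphism( lens )( theta );  # one update per training example
>        od;
>    od;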
For example, suppose we start with the parametrised morphism (\mathbb{R}^2, f):\mathbb{R}^2 \to \mathbb{R} where f:\mathbb{R}^{2+2} \to \mathbb{R} is defined by f(\theta_1, \theta_2, x_1, x_2) = (x_1-\theta_1)^2 + (x_2-\theta_2)^2, where \Theta := (\theta_1, \theta_2) \in \mathbb{R}^2 represents the parameters and x = (x_1, x_2) \in \mathbb{R}^2 is the input. Given training examples X_1 = (1,2) and X_2 = (3,4), the morphism f_1:\mathbb{R}^2 \to \mathbb{R} is defined by f_1(\theta_1, \theta_2) = (1 - \theta_1)^2 + (2 - \theta_2)^2 with Jacobian matrix J_{f_1}(\theta_1, \theta_2) = [2(\theta_1 - 1), 2(\theta_2 - 2)] \in \mathbb{R}^{1 \times 2}. Thus, the One-Epoch update lens for X_1 has put-morphism (\theta_1, \theta_2) \mapsto (0.98\,\theta_1 + 0.02,\ 0.98\,\theta_2 + 0.04), and similarly (with f_2(\theta_1, \theta_2) = (3 - \theta_1)^2 + (4 - \theta_2)^2) the One-Epoch update lens for X_2 has put-morphism (\theta_1, \theta_2) \mapsto (0.98\,\theta_1 + 0.06,\ 0.98\,\theta_2 + 0.08). Suppose we start with the parameter vector \Theta = (0,0). Then:
After applying the update lens for X_1: \Theta_1 = (0.98 \cdot 0 + 0.02, 0.98 \cdot 0 + 0.04) = (0.02, 0.04).
After applying the update lens for X_2: \Theta_2 = (0.98 \cdot 0.02 + 0.06, 0.98 \cdot 0.04 + 0.08) = (0.0796, 0.1192).
Thus, after one epoch of training, the updated parameters are \Theta_2 = (0.0796, 0.1192). Repeating this process for multiple epochs further refines the parameters. Eventually, we expect them to converge to \Theta = (2, 3), which minimizes the average loss over the training examples: the point minimizing the sum of squared distances to (1, 2) and (3, 4) is their midpoint (2, 3). See the examples section for an implementation of this process in GAP.
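Independently of the package, this arithmetic can be checked directly in plain GAP by writing out the two put-morphisms as ordinary functions:

gap> put_1 := theta -> [ 0.98 * theta[1] + 0.02, 0.98 * theta[2] + 0.04 ];;
gap> put_2 := theta -> [ 0.98 * theta[1] + 0.06, 0.98 * theta[2] + 0.08 ];;
gap> put_2( put_1( [ 0, 0 ] ) );  # one epoch starting from (0,0)
[ 0.0796, 0.1192 ]
gap> theta := [ 0, 0 ];;
gap> for i in [ 1 .. 1000 ] do theta := put_2( put_1( theta ) ); od;
gap> theta;  # approaches the fixed point near (2, 3)
[ 2.0101, 3.0101 ]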
Given a parametrised (loss) morphism (\mathbb{R}^p, f):\mathbb{R}^n \to \mathbb{R} and a set of training examples \{X_1, \ldots, X_m\} with each X_i \in \mathbb{R}^n, it may be beneficial to use mini-batches during training when the number of training examples m is large. Given a positive integer batch_size, the loss morphism is first batched using Batchify: we create a new parametrised morphism (\mathbb{R}^p, f_{batch}):\mathbb{R}^{batch\_size \cdot n} \to \mathbb{R} where f_{batch}(\Theta, X_{i_1}, \ldots, X_{i_{batch\_size}}) = \frac{1}{batch\_size} \sum_{j=1}^{batch\_size} f(\Theta, X_{i_j}). We then divide the training examples into mini-batches of size batch_size (padding the list with randomly chosen examples if necessary to make its length divisible by batch_size) and treat each mini-batch as a single training example. Now, we can repeat the training process described above using the batched loss morphism and the new training examples. For example, if the parametrised morphism is (\mathbb{R}^2, f):\mathbb{R}^2 \to \mathbb{R} where f(\theta_1, \theta_2, x_1, x_2) = (x_1-\theta_1)^2 + (x_2-\theta_2)^2, and the training examples are [[1,2], [3,4], [5,6], [7,8], [9,10]], then for batch_size = 2 the batched loss morphism is (\mathbb{R}^2, f_{batch}):\mathbb{R}^4 \to \mathbb{R} where f_{batch}(\theta_1, \theta_2, x_1, x_2, x_3, x_4) = \frac{1}{2} \left( (x_1-\theta_1)^2 + (x_2-\theta_2)^2 + (x_3-\theta_1)^2 + (x_4-\theta_2)^2 \right) (see the Batchify operation). Since the number of training examples is not divisible by batch_size, we pad the list with a randomly chosen example (say, [1,2]). The new training examples are then [[1,2,3,4], [5,6,7,8], [9,10,1,2]].
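The padding-and-chunking step can be sketched in plain GAP as follows (the variable names are illustrative; the package performs this step internally):

gap> examples := [ [ 1, 2 ], [ 3, 4 ], [ 5, 6 ], [ 7, 8 ], [ 9, 10 ] ];;
gap> batch_size := 2;;
gap> padded := ShallowCopy( examples );;
gap> while Length( padded ) mod batch_size <> 0 do
>        Add( padded, Random( examples ) );  # pad with a randomly chosen example
>    od;
gap> batched_examples := List( [ 1 .. Length( padded ) / batch_size ],
>        i -> Concatenation( padded{ [ (i-1) * batch_size + 1 .. i * batch_size ] } ) );;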
‣ OneEpochUpdateLens( parametrised_morphism, optimizer, training_examples, batch_size ) ( operation )
Returns: a morphism in a category of lenses (the epoch update lens)
Create an update lens for one epoch of training.
The argument parametrised_morphism must be a morphism in a category of parametrised morphisms whose target has rank 1 (a scalar loss).
The argument optimizer is a function which takes the number of parameters p and returns an optimizer lens in the category of lenses over Smooth. Typical examples are Lenses.GradientDescentOptimizer, Lenses.AdamOptimizer, etc.
The list training_examples must contain at least one example; each example is a dense list representing a vector in \mathbb{R}^n.
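For example, with f and optimizer constructed as in the examples section below, the One-Epoch update lens over the two training examples [1,2] and [3,4] with batch size 1 is obtained as follows:

gap> update_lens := OneEpochUpdateLens( f, optimizer, [ [ 1, 2 ], [ 3, 4 ] ], 1 );
(ℝ^2, ℝ^2) -> (ℝ^1, ℝ^0) defined by:

Get Morphism:
------------
ℝ^2 -> ℝ^1

Put Morphism:
------------
ℝ^2 -> ℝ^2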
‣ OneEpochUpdateLens( parametrised_morphism, optimizer, training_examples_path, batch_size ) ( operation )
Returns: a morphism in a category of lenses (the epoch update lens)
Same as OneEpochUpdateLens, but reads the training examples from a file. The file is evaluated using EvalString and is expected to contain a GAP expression evaluating to a dense list of examples.
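A sketch (the file name examples.g is hypothetical): one might write the examples to a file and pass its path:

gap> PrintTo( "examples.g", "[ [ 1, 2 ], [ 3, 4 ] ]" );  # file content is a GAP expression
gap> update_lens := OneEpochUpdateLens( f, optimizer, "examples.g", 1 );;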
‣ Fit( one_epoch_update_lens, nr_epochs, initial_weights ) ( operation )
Returns: a list of final weights
Perform nr_epochs epochs of training using the given one_epoch_update_lens and initial weights initial_weights.
The lens one_epoch_update_lens must have get-morphism \mathbb{R}^p \to \mathbb{R}^1 and put-morphism \mathbb{R}^p \to \mathbb{R}^p for the same p as the length of initial_weights. The option verbose controls whether to print the loss at each epoch.
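For example, with update_lens as in the examples section below, ten epochs starting from (0,0), with the per-epoch loss printing suppressed:

gap> theta := Fit( update_lens, 10, [ 0, 0 ] : verbose := false );
[ 0.668142, 1.00053 ]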
gap> Smooth := SkeletalCategoryOfSmoothMaps( );
SkeletalSmoothMaps
gap> Para := CategoryOfParametrisedMorphisms( Smooth );
CategoryOfParametrisedMorphisms( SkeletalSmoothMaps )
gap> Lenses := CategoryOfLenses( Smooth );
CategoryOfLenses( SkeletalSmoothMaps )
gap> D := [ Smooth.1, Smooth.1, Smooth.1, Smooth.1 ];
[ ℝ^1, ℝ^1, ℝ^1, ℝ^1 ]
gap> p1 := ProjectionInFactorOfDirectProduct( Smooth, D, 1 );
ℝ^4 -> ℝ^1
gap> p2 := ProjectionInFactorOfDirectProduct( Smooth, D, 2 );
ℝ^4 -> ℝ^1
gap> p3 := ProjectionInFactorOfDirectProduct( Smooth, D, 3 );
ℝ^4 -> ℝ^1
gap> p4 := ProjectionInFactorOfDirectProduct( Smooth, D, 4 );
ℝ^4 -> ℝ^1
gap> f := PreCompose( (p3 - p1), Smooth.Power(2) )
>      + PreCompose( (p4 - p2), Smooth.Power(2) );
ℝ^4 -> ℝ^1
gap> dummy_input := CreateContextualVariables( [ "theta_1", "theta_2", "x1", "x2" ] );
[ theta_1, theta_2, x1, x2 ]
gap> Display( f : dummy_input := dummy_input );
ℝ^4 -> ℝ^1
‣ (x1 + (- theta_1)) ^ 2 + (x2 + (- theta_2)) ^ 2
gap> f := MorphismConstructor( Para, Para.2, [ Smooth.2, f ], Para.1 );
ℝ^2 -> ℝ^1 defined by:

Underlying Object:
-----------------
ℝ^2

Underlying Morphism:
-------------------
ℝ^4 -> ℝ^1
gap> Display( f : dummy_input := dummy_input );
ℝ^2 -> ℝ^1 defined by:

Underlying Object:
-----------------
ℝ^2

Underlying Morphism:
-------------------
ℝ^4 -> ℝ^1
‣ (x1 + (- theta_1)) ^ 2 + (x2 + (- theta_2)) ^ 2
gap> optimizer := Lenses.GradientDescentOptimizer( :learning_rate := 0.01 );
function( n ) ... end
gap> dummy_input := CreateContextualVariables( [ "theta_1", "theta_2", "g1", "g2" ] );
[ theta_1, theta_2, g1, g2 ]
gap> Display( optimizer( 2 ) : dummy_input := dummy_input );
(ℝ^2, ℝ^2) -> (ℝ^2, ℝ^2) defined by:

Get Morphism:
------------
ℝ^2 -> ℝ^2
‣ theta_1
‣ theta_2

Put Morphism:
------------
ℝ^4 -> ℝ^2
‣ theta_1 + 0.01 * g1
‣ theta_2 + 0.01 * g2
gap> update_lens_1 := OneEpochUpdateLens( f, optimizer, [ [ 1, 2 ] ], 1 );
(ℝ^2, ℝ^2) -> (ℝ^1, ℝ^0) defined by:

Get Morphism:
------------
ℝ^2 -> ℝ^1

Put Morphism:
------------
ℝ^2 -> ℝ^2
gap> dummy_input := CreateContextualVariables( [ "theta_1", "theta_2" ] );
[ theta_1, theta_2 ]
gap> Display( update_lens_1 : dummy_input := dummy_input );
(ℝ^2, ℝ^2) -> (ℝ^1, ℝ^0) defined by:

Get Morphism:
------------
ℝ^2 -> ℝ^1
‣ ((1 + (- theta_1)) ^ 2 + (2 + (- theta_2)) ^ 2) / 1 / 1

Put Morphism:
------------
ℝ^2 -> ℝ^2
‣ theta_1 + 0.01 * (-1 * (0 + 0 + (1 * ((2 * (1 + (- theta_1)) ^ 1 * -1 + 0) * 1 + 0 + 0 + 0) * 1 + 0 + 0 + 0) * 1 + 0))
‣ theta_2 + 0.01 * (-1 * (0 + 0 + 0 + (0 + 1 * (0 + (0 + 2 * (2 + (- theta_2)) ^ 1 * -1) * 1 + 0 + 0) * 1 + 0 + 0) * 1))
gap> update_lens_1 := SimplifyMorphism( update_lens_1, infinity );
(ℝ^2, ℝ^2) -> (ℝ^1, ℝ^0) defined by:

Get Morphism:
------------
ℝ^2 -> ℝ^1

Put Morphism:
------------
ℝ^2 -> ℝ^2
gap> Display( update_lens_1 : dummy_input := dummy_input );
(ℝ^2, ℝ^2) -> (ℝ^1, ℝ^0) defined by:

Get Morphism:
------------
ℝ^2 -> ℝ^1
‣ (theta_1 - 1) ^ 2 + (theta_2 - 2) ^ 2

Put Morphism:
------------
ℝ^2 -> ℝ^2
‣ 0.98 * theta_1 + 0.02
‣ 0.98 * theta_2 + 0.04
gap> update_lens_2 := OneEpochUpdateLens( f, optimizer, [ [ 3, 4 ] ], 1 );
(ℝ^2, ℝ^2) -> (ℝ^1, ℝ^0) defined by:

Get Morphism:
------------
ℝ^2 -> ℝ^1

Put Morphism:
------------
ℝ^2 -> ℝ^2
gap> Display( update_lens_2 : dummy_input := dummy_input );
(ℝ^2, ℝ^2) -> (ℝ^1, ℝ^0) defined by:

Get Morphism:
------------
ℝ^2 -> ℝ^1
‣ ((3 + (- theta_1)) ^ 2 + (4 + (- theta_2)) ^ 2) / 1 / 1

Put Morphism:
------------
ℝ^2 -> ℝ^2
‣ theta_1 + 0.01 * (-1 * (0 + 0 + (1 * ((2 * (3 + (- theta_1)) ^ 1 * -1 + 0) * 1 + 0 + 0 + 0) * 1 + 0 + 0 + 0) * 1 + 0))
‣ theta_2 + 0.01 * (-1 * (0 + 0 + 0 + (0 + 1 * (0 + (0 + 2 * (4 + (- theta_2)) ^ 1 * -1) * 1 + 0 + 0) * 1 + 0 + 0) * 1))
gap> update_lens_2 := SimplifyMorphism( update_lens_2, infinity );
(ℝ^2, ℝ^2) -> (ℝ^1, ℝ^0) defined by:

Get Morphism:
------------
ℝ^2 -> ℝ^1

Put Morphism:
------------
ℝ^2 -> ℝ^2
gap> Display( update_lens_2 : dummy_input := dummy_input );
(ℝ^2, ℝ^2) -> (ℝ^1, ℝ^0) defined by:

Get Morphism:
------------
ℝ^2 -> ℝ^1
‣ (theta_1 - 3) ^ 2 + (theta_2 - 4) ^ 2

Put Morphism:
------------
ℝ^2 -> ℝ^2
‣ 0.98 * theta_1 + 0.06
‣ 0.98 * theta_2 + 0.08
gap> update_lens := OneEpochUpdateLens( f, optimizer, [ [ 1, 2 ], [ 3, 4 ] ], 1 );
(ℝ^2, ℝ^2) -> (ℝ^1, ℝ^0) defined by:

Get Morphism:
------------
ℝ^2 -> ℝ^1

Put Morphism:
------------
ℝ^2 -> ℝ^2
gap> Display( update_lens : dummy_input := dummy_input );
(ℝ^2, ℝ^2) -> (ℝ^1, ℝ^0) defined by:

Get Morphism:
------------
ℝ^2 -> ℝ^1
‣ ( ((1 + (- theta_1)) ^ 2 + (2 + (- theta_2)) ^ 2) / 1 + ((3 + (- theta_1)) ^ 2 + (4 + (- theta_2)) ^ 2) / 1 ) / 2

Put Morphism:
------------
ℝ^2 -> ℝ^2
‣ theta_1 + 0.01 * (-1 * (0 + 0 + (1 * ((2 * (1 + (- theta_1)) ^ 1 * -1 + 0) * 1 + 0 + 0 + 0) * 1 + 0 + 0 + 0) * 1 + 0)) + 0.01 * (-1 * (0 + 0 + (1 * ((2 * (3 + (- (theta_1 + 0.01 * (-1 * (0 + 0 + (1 * ((2 * (1 + (- theta_1)) ^ 1 * -1 + 0) * 1 + 0 + 0 + 0) * 1 + 0 + 0 + 0) * 1 + 0))))) ^ 1 * -1 + 0) * 1 + 0 + 0 + 0) * 1 + 0 + 0 + 0) * 1 + 0))
‣ theta_2 + 0.01 * (-1 * (0 + 0 + 0 + (0 + 1 * (0 + (0 + 2 * (2 + (- theta_2)) ^ 1 * -1) * 1 + 0 + 0) * 1 + 0 + 0) * 1)) + 0.01 * (-1 * (0 + 0 + 0 + (0 + 1 * (0 + (0 + 2 * (4 + (- (theta_2 + 0.01 * (-1 * (0 + 0 + 0 + (0 + 1 * (0 + (0 + 2 * (2 + (- theta_2)) ^ 1 * -1) * 1 + 0 + 0) * 1 + 0 + 0) * 1))))) ^ 1 * -1) * 1 + 0 + 0) * 1 + 0 + 0) * 1))
gap> update_lens := SimplifyMorphism( update_lens, infinity );
(ℝ^2, ℝ^2) -> (ℝ^1, ℝ^0) defined by:

Get Morphism:
------------
ℝ^2 -> ℝ^1

Put Morphism:
------------
ℝ^2 -> ℝ^2
gap> Display( update_lens : dummy_input := dummy_input );
(ℝ^2, ℝ^2) -> (ℝ^1, ℝ^0) defined by:

Get Morphism:
------------
ℝ^2 -> ℝ^1
‣ theta_1 ^ 2 - 4 * theta_1 + theta_2 ^ 2 - 6 * theta_2 + 15

Put Morphism:
------------
ℝ^2 -> ℝ^2
‣ 0.9604 * theta_1 + 0.0796
‣ 0.9604 * theta_2 + 0.1192
gap> "If we used only update_lens_1, the parameters converge to (1,2)";;
gap> theta := [ 0, 0 ];;
gap> for i in [ 1 .. 1000 ] do theta := PutMorphism( update_lens_1 )( theta ); od;
gap> theta;
[ 1., 2. ]
gap> "If we used only update_lens_2, the parameters converge to (3,4)";;
gap> theta := [ 0, 0 ];;
gap> for i in [ 1 .. 1000 ] do theta := PutMorphism( update_lens_2 )( theta ); od;
gap> theta;
[ 3., 4. ]
gap> "If we use the combined update_lens, the parameters converge to (2,3)";;
gap> theta := [ 0, 0 ];;
gap> for i in [ 1 .. 1000 ] do theta := PutMorphism( update_lens )( theta ); od;
gap> theta;
[ 2.0101, 3.0101 ]
gap> "Instead of manually applying the put-morphism, we can use the Fit operation:";;
gap> "For example, to fit theta = (0,0) using 10 epochs:";;
gap> theta := [ 0, 0 ];;
gap> theta := Fit( update_lens, 10, theta );
Epoch 0/10 - loss = 15
Epoch 1/10 - loss = 13.9869448
Epoch 2/10 - loss = 13.052687681213568
Epoch 3/10 - loss = 12.19110535502379
Epoch 4/10 - loss = 11.39655013449986
Epoch 5/10 - loss = 10.663813003077919
Epoch 6/10 - loss = 9.9880895506637923
Epoch 7/10 - loss = 9.3649485545394704
Epoch 8/10 - loss = 8.790302999738083
Epoch 9/10 - loss = 8.2603833494932317
Epoch 10/10 - loss = 7.7717128910720641
[ 0.668142, 1.00053 ]
In this example, let us find a solution of the equation \theta^3-\theta^2-4=0. We can reframe this as a minimization problem by considering the parametrised morphism (\mathbb{R}^1, f):\mathbb{R}^0 \to \mathbb{R}^1 where f(\theta) = (\theta^3-\theta^2-4)^2: a parameter \theta is a root of the equation exactly when the loss f(\theta) attains its minimum value 0.
gap> Smooth := SkeletalCategoryOfSmoothMaps( );
SkeletalSmoothMaps
gap> Para := CategoryOfParametrisedMorphisms( Smooth );
CategoryOfParametrisedMorphisms( SkeletalSmoothMaps )
gap> Lenses := CategoryOfLenses( Smooth );
CategoryOfLenses( SkeletalSmoothMaps )
gap> f := Smooth.Power( 3 ) - Smooth.Power( 2 ) - Smooth.Constant( [ 4 ] );
ℝ^1 -> ℝ^1
gap> Display( f );
ℝ^1 -> ℝ^1
‣ x1 ^ 3 + (- x1 ^ 2) + - 4
gap> f := PreCompose( f, Smooth.Power( 2 ) );
ℝ^1 -> ℝ^1
gap> Display( f );
ℝ^1 -> ℝ^1
‣ (x1 ^ 3 + (- x1 ^ 2) + - 4) ^ 2
gap> f := MorphismConstructor( Para, Para.0, [ Smooth.1, f ], Para.1 );
ℝ^0 -> ℝ^1 defined by:

Underlying Object:
-----------------
ℝ^1

Underlying Morphism:
-------------------
ℝ^1 -> ℝ^1
gap> dummy_input := CreateContextualVariables( [ "theta" ] );
[ theta ]
gap> Display( f : dummy_input := dummy_input );
ℝ^0 -> ℝ^1 defined by:

Underlying Object:
-----------------
ℝ^1

Underlying Morphism:
-------------------
ℝ^1 -> ℝ^1
‣ (theta ^ 3 + (- theta ^ 2) + -4) ^ 2
gap> optimizer := Lenses.AdamOptimizer( :learning_rate := 0.01,
>      beta1 := 0.9, beta2 := 0.999, epsilon := 1.e-7 );
function( n ) ... end
gap> dummy_input := CreateContextualVariables( [ "t", "m", "v", "theta", "g" ] );
[ t, m, v, theta, g ]
gap> Display( optimizer( 1 ) : dummy_input := dummy_input );
(ℝ^4, ℝ^4) -> (ℝ^1, ℝ^1) defined by:

Get Morphism:
------------
ℝ^4 -> ℝ^1
‣ theta

Put Morphism:
------------
ℝ^5 -> ℝ^4
‣ t + 1
‣ 0.9 * m + 0.1 * g
‣ 0.999 * v + 0.001 * g ^ 2
‣ theta + 0.01 / (1 - 0.999 ^ t) * ((0.9 * m + 0.1 * g) / (1.e-07 + Sqrt( (0.999 * v + 0.001 * g ^ 2) / (1 - 0.999 ^ t) )))
gap> update_lens := OneEpochUpdateLens( f, optimizer, [ [ ] ], 1 );
(ℝ^4, ℝ^4) -> (ℝ^1, ℝ^0) defined by:

Get Morphism:
------------
ℝ^4 -> ℝ^1

Put Morphism:
------------
ℝ^4 -> ℝ^4
gap> dummy_input := CreateContextualVariables( [ "t", "m", "v", "theta" ] );
[ t, m, v, theta ]
gap> Display( update_lens : dummy_input := dummy_input );
(ℝ^4, ℝ^4) -> (ℝ^1, ℝ^0) defined by:

Get Morphism:
------------
ℝ^4 -> ℝ^1
‣ (theta ^ 3 + (- theta ^ 2) + -4) ^ 2 / 1 / 1

Put Morphism:
------------
ℝ^4 -> ℝ^4
‣ t + 1
‣ 0.9 * m + 0.1 * (-1 * (1 * (2 * (theta ^ 3 + (- theta ^ 2) + -4) ^ 1 * (3 * theta ^ 2 + (- 2 * theta ^ 1)) * 1) * 1 * 1))
‣ 0.999 * v + 0.001 * (-1 * (1 * (2 * (theta ^ 3 + (- theta ^ 2) + -4) ^ 1 * (3 * theta ^ 2 + (- 2 * theta ^ 1)) * 1) * 1 * 1)) ^ 2
‣ theta + 0.01 / (1 - 0.999 ^ t) * ((0.9 * m + 0.1 * (-1 * (1 * (2 * (theta ^ 3 + (- theta ^ 2) + -4) ^ 1 * (3 * theta ^ 2 + (- 2 * theta ^ 1)) * 1) * 1 * 1))) / (1.e-07 + Sqrt( (0.999 * v + 0.001 * (-1 * (1 * (2 * (theta ^ 3 + (- theta ^ 2) + -4) ^ 1 * (3 * theta ^ 2 + (- 2 * theta ^ 1)) * 1) * 1 * 1)) ^ 2) / (1 - 0.999 ^ t) )))
gap> Fit( update_lens, 10000, [ 1, 0, 0, 8 ] : verbose := false );
[ 10001, 4.11498e-13, 1463.45, 2. ]
gap> UnderlyingMorphism( f )( [ 2. ] );
[ 0. ]