A neural network can be viewed as a composition of parametrised affine transformations and non-linear activation functions (such as ReLU, Sigmoid, or Softmax).
‣ NeuralNetworkLogitsMorphism( Para, s, hidden_layers_dims, t ) ( operation )
Returns: a parametrised morphism
The arguments are Para, a category of parametrised morphisms, s, a positive integer giving the input dimension, hidden_layers_dims, a list of positive integers giving the sizes of the hidden layers in order, and t, a positive integer giving the output dimension. This operation constructs a parametrised morphism that computes the logits (pre-activation outputs) of a fully-connected feed-forward neural network. The parametrised morphism has signature \mathbb{R}^s \to \mathbb{R}^t and is parametrised by the network's weights and biases. More specifically, it represents the function that maps an input vector x \in \mathbb{R}^s and a parameter vector p \in \mathbb{R}^d to the output vector y \in \mathbb{R}^t, where d is the total number of weights and biases in the network defined by the given architecture.
For a layer with input dimension m_i and output dimension m_{i+1}, the parameter object has dimension (m_i + 1) \times m_{i+1}, accounting for both the m_i \times m_{i+1} weight matrix and the m_{i+1} biases; the total parameter dimension d is the sum of these contributions over all layers.
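For example, for the architecture s = 2, hidden_layers_dims = [ 1 ], t = 3 used in the example sessions below, the total parameter dimension is (2 + 1) \cdot 1 + (1 + 1) \cdot 3 = 9, matching the underlying object \mathbb{R}^9 shown there. The following sketch recomputes this count using only standard GAP list operations (it is not part of this package's interface):

gap> dims := Concatenation( [ 2 ], [ 1 ], [ 3 ] );;  # [ s ], hidden_layers_dims, [ t ]
gap> Sum( [ 1 .. Length( dims ) - 1 ], i -> ( dims[i] + 1 ) * dims[i + 1] );
9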
Each hidden layer applies the ReLU nonlinearity to the output of its affine transformation. The final layer is affine with no activation, so the returned morphism produces logits suitable for the subsequent application of a loss or a classification activation.
‣ NeuralNetworkPredictionMorphism( Para, s, hidden_layers_dims, t, activation ) ( operation )
Returns: a parametrised morphism
This operation composes the logits morphism (as returned by NeuralNetworkLogitsMorphism) with the specified activation function to obtain a parametrised morphism representing the predictions of a neural network, as illustrated in the sketch after the list below. The network has the architecture specified by s, hidden_layers_dims, and t, i.e., the source and target of the parametrised morphism are \mathbb{R}^{s} and \mathbb{R}^{t}, respectively. The activation argument selects the final activation function:
\mathbf{Softmax}: applies the softmax activation to turn logits into probabilities for multi-class classification.
\mathbf{Sigmoid}: applies the sigmoid activation to turn logits into probabilities for binary classification.
\mathbf{IdFunc}: applies the identity function (no activation) for regression tasks.
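Concretely, the prediction morphism coincides with the post-composition of the logits morphism with the corresponding activation morphism of Para. A minimal sketch, assuming Smooth and Para have been constructed as in the example sessions below:

gap> N := NeuralNetworkLogitsMorphism( Para, 2, [ 1 ], 3 );;
gap> NeuralNetworkPredictionMorphism( Para, 2, [ 1 ], 3, "Softmax" )
>      = PreCompose( N, Para.Softmax( 3 ) );
true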
‣ NeuralNetworkLossMorphism( Para, s, hidden_layers_dims, t, activation ) ( operation )
Returns: a parametrised morphism
Constructs a parametrised morphism representing the training loss of a fully-connected feed-forward neural network with the architecture given by s, hidden_layers_dims and t. The returned parametrised morphism is parametrised by the network weights and biases and maps a pair (input, target) to a scalar loss: its source is \mathbb{R}^s \times \mathbb{R}^t (an input vector x and a target vector y) and its target is \mathbb{R} (the scalar loss).
The behaviour of the loss depends on the activation argument:
\mathbf{Softmax}:
Used for multi-class classification.
Softmax is applied to the logits to convert them into a probability distribution.
The loss is the cross-entropy between the predicted probabilities and the target distribution, averaged over the t classes (see the expression after this list).
Targets y may be one-hot vectors or probability distributions over classes.
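In terms of the logit vector z \in \mathbb{R}^t produced by the network, this loss evaluates (as displayed in expanded form in the third example session below) to \mathrm{loss} = \frac{1}{t} \sum_{i=1}^{t} y_i \big( \log \sum_{j=1}^{t} e^{z_j} - z_i \big) = - \frac{1}{t} \sum_{i=1}^{t} y_i \log( \mathrm{softmax}(z)_i ).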
\mathbf{Sigmoid}:
Used for binary classification. Requires t = 1.
Applies the logistic sigmoid to the single logit to obtain a probability \hat{y} in [0,1].
The loss is binary cross-entropy: \mathrm{loss} = - ( y\log(\hat{y}) + (1-y)\log(1-\hat{y}) ).
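Writing z for the single logit, so that \hat{y} = \sigma(z) = 1/(1 + e^{-z}), the binary cross-entropy can be rewritten in the numerically stable form that appears in the second example session below: - ( y\log(\hat{y}) + (1-y)\log(1-\hat{y}) ) = \log(1 + e^{-z}) + (1 - y)\, z.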
\mathbf{IdFunc}:
Used for regression.
No final activation is applied. The loss is the mean squared error (MSE).
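Explicitly, with \hat{y} \in \mathbb{R}^t denoting the predictions (equal to the logits here), \mathrm{loss} = \frac{1}{t} \sum_{i=1}^{t} (\hat{y}_i - y_i)^2, as can be seen in the expanded expression of the first example session below (with t = 3).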
gap> Smooth := SkeletalCategoryOfSmoothMaps( );
SkeletalSmoothMaps
gap> Para := CategoryOfParametrisedMorphisms( Smooth );
CategoryOfParametrisedMorphisms( SkeletalSmoothMaps )
gap> N213_Logits := NeuralNetworkLogitsMorphism( Para, 2, [ 1 ], 3 );
ℝ^2 -> ℝ^3 defined by:
Underlying Object:
-----------------
ℝ^9
Underlying Morphism:
-------------------
ℝ^11 -> ℝ^3
gap> dummy_input := DummyInputForNeuralNetwork( 2, [ 1 ], 3 );
[ w2_1_1, b2_1, w2_1_2, b2_2, w2_1_3, b2_3, w1_1_1, w1_2_1, b1_1, z1, z2 ]
gap> Display( N213_Logits : dummy_input := dummy_input );
ℝ^2 -> ℝ^3 defined by:
Underlying Object:
-----------------
ℝ^9
Underlying Morphism:
-------------------
ℝ^11 -> ℝ^3
‣ w2_1_1 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_1
‣ w2_1_2 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_2
‣ w2_1_3 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_3
gap> N213_Pred := NeuralNetworkPredictionMorphism( Para, 2, [ 1 ], 3, "IdFunc" );
ℝ^2 -> ℝ^3 defined by:
Underlying Object:
-----------------
ℝ^9
Underlying Morphism:
-------------------
ℝ^11 -> ℝ^3
gap> N213_Pred = N213_Logits;
true
gap> N213_Loss := NeuralNetworkLossMorphism( Para, 2, [ 1 ], 3, "IdFunc" );
ℝ^5 -> ℝ^1 defined by:
Underlying Object:
-----------------
ℝ^9
Underlying Morphism:
-------------------
ℝ^14 -> ℝ^1
gap> vars := Concatenation(
>      DummyInputStringsForNeuralNetwork( 2, [ 1 ], 3 ),
>      DummyInputStrings( "y", 3 ) );
[ "w2_1_1", "b2_1", "w2_1_2", "b2_2", "w2_1_3", "b2_3", "w1_1_1", "w1_2_1", "b1_1", "z1", "z2", "y1", "y2", "y3" ]
gap> dummy_input := CreateContextualVariables( vars );
[ w2_1_1, b2_1, w2_1_2, b2_2, w2_1_3, b2_3, w1_1_1, w1_2_1, b1_1, z1, z2, y1, y2, y3 ]
gap> Display( N213_Loss : dummy_input := dummy_input );
ℝ^5 -> ℝ^1 defined by:
Underlying Object:
-----------------
ℝ^9
Underlying Morphism:
-------------------
ℝ^14 -> ℝ^1
‣ ((w2_1_1 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_1 - y1) ^ 2
   + (w2_1_2 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_2 - y2) ^ 2
   + (w2_1_3 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_3 - y3) ^ 2) / 3
gap> Smooth := SkeletalCategoryOfSmoothMaps( );
SkeletalSmoothMaps
gap> Para := CategoryOfParametrisedMorphisms( Smooth );
CategoryOfParametrisedMorphisms( SkeletalSmoothMaps )
gap> N213_Logits := NeuralNetworkLogitsMorphism( Para, 1, [ ], 1 );
ℝ^1 -> ℝ^1 defined by:
Underlying Object:
-----------------
ℝ^2
Underlying Morphism:
-------------------
ℝ^3 -> ℝ^1
gap> dummy_input := DummyInputForNeuralNetwork( 1, [ ], 1 );
[ w1_1_1, b1_1, z1 ]
gap> Display( N213_Logits : dummy_input := dummy_input );
ℝ^1 -> ℝ^1 defined by:
Underlying Object:
-----------------
ℝ^2
Underlying Morphism:
-------------------
ℝ^3 -> ℝ^1
‣ w1_1_1 * z1 + b1_1
gap> N213_Pred := PreCompose( N213_Logits, Para.Sigmoid( 1 ) );
ℝ^1 -> ℝ^1 defined by:
Underlying Object:
-----------------
ℝ^2
Underlying Morphism:
-------------------
ℝ^3 -> ℝ^1
gap> N213_Pred = NeuralNetworkPredictionMorphism( Para, 1, [ ], 1, "Sigmoid" );
true
gap> Display( N213_Pred : dummy_input := dummy_input );
ℝ^1 -> ℝ^1 defined by:
Underlying Object:
-----------------
ℝ^2
Underlying Morphism:
-------------------
ℝ^3 -> ℝ^1
‣ 1 / (1 + Exp( - (w1_1_1 * z1 + b1_1) ))
gap> N213_Loss := NeuralNetworkLossMorphism( Para, 1, [ ], 1, "Sigmoid" );
ℝ^2 -> ℝ^1 defined by:
Underlying Object:
-----------------
ℝ^2
Underlying Morphism:
-------------------
ℝ^4 -> ℝ^1
gap> vars := Concatenation(
>      DummyInputStringsForNeuralNetwork( 1, [ ], 1 ),
>      [ "y1" ] );
[ "w1_1_1", "b1_1", "z1", "y1" ]
gap> dummy_input := CreateContextualVariables( vars );
[ w1_1_1, b1_1, z1, y1 ]
gap> Display( N213_Loss : dummy_input := dummy_input );
ℝ^2 -> ℝ^1 defined by:
Underlying Object:
-----------------
ℝ^2
Underlying Morphism:
-------------------
ℝ^4 -> ℝ^1
‣ Log( 1 + Exp( - (w1_1_1 * z1 + b1_1) ) ) + (1 - y1) * (w1_1_1 * z1 + b1_1)
gap> Smooth := SkeletalCategoryOfSmoothMaps( );
SkeletalSmoothMaps
gap> Para := CategoryOfParametrisedMorphisms( Smooth );
CategoryOfParametrisedMorphisms( SkeletalSmoothMaps )
gap> N213_Logits := NeuralNetworkLogitsMorphism( Para, 2, [ 1 ], 3 );
ℝ^2 -> ℝ^3 defined by:
Underlying Object:
-----------------
ℝ^9
Underlying Morphism:
-------------------
ℝ^11 -> ℝ^3
gap> dummy_input := DummyInputForNeuralNetwork( 2, [ 1 ], 3 );
[ w2_1_1, b2_1, w2_1_2, b2_2, w2_1_3, b2_3, w1_1_1, w1_2_1, b1_1, z1, z2 ]
gap> Display( N213_Logits : dummy_input := dummy_input );
ℝ^2 -> ℝ^3 defined by:
Underlying Object:
-----------------
ℝ^9
Underlying Morphism:
-------------------
ℝ^11 -> ℝ^3
‣ w2_1_1 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_1
‣ w2_1_2 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_2
‣ w2_1_3 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_3
gap> N213_Pred := PreCompose( N213_Logits, Para.Softmax( 3 ) );
ℝ^2 -> ℝ^3 defined by:
Underlying Object:
-----------------
ℝ^9
Underlying Morphism:
-------------------
ℝ^11 -> ℝ^3
gap> N213_Pred = NeuralNetworkPredictionMorphism( Para, 2, [ 1 ], 3, "Softmax" );
true
gap> Display( N213_Pred : dummy_input := dummy_input );
ℝ^2 -> ℝ^3 defined by:
Underlying Object:
-----------------
ℝ^9
Underlying Morphism:
-------------------
ℝ^11 -> ℝ^3
‣ Exp( w2_1_1 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_1 )
   / (Exp( w2_1_1 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_1 )
      + Exp( w2_1_2 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_2 )
      + Exp( w2_1_3 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_3 ))
‣ Exp( w2_1_2 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_2 )
   / (Exp( w2_1_1 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_1 )
      + Exp( w2_1_2 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_2 )
      + Exp( w2_1_3 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_3 ))
‣ Exp( w2_1_3 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_3 )
   / (Exp( w2_1_1 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_1 )
      + Exp( w2_1_2 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_2 )
      + Exp( w2_1_3 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_3 ))
gap> N213_Loss := NeuralNetworkLossMorphism( Para, 2, [ 1 ], 3, "Softmax" );
ℝ^5 -> ℝ^1 defined by:
Underlying Object:
-----------------
ℝ^9
Underlying Morphism:
-------------------
ℝ^14 -> ℝ^1
gap> vars := Concatenation(
>      DummyInputStringsForNeuralNetwork( 2, [ 1 ], 3 ),
>      DummyInputStrings( "y", 3 ) );
[ "w2_1_1", "b2_1", "w2_1_2", "b2_2", "w2_1_3", "b2_3", "w1_1_1", "w1_2_1", "b1_1", "z1", "z2", "y1", "y2", "y3" ]
gap> dummy_input := CreateContextualVariables( vars );
[ w2_1_1, b2_1, w2_1_2, b2_2, w2_1_3, b2_3, w1_1_1, w1_2_1, b1_1, z1, z2, y1, y2, y3 ]
gap> Display( N213_Loss : dummy_input := dummy_input );
ℝ^5 -> ℝ^1 defined by:
Underlying Object:
-----------------
ℝ^9
Underlying Morphism:
-------------------
ℝ^14 -> ℝ^1
‣ ( ( Log( Exp( w2_1_1 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_1 )
          + Exp( w2_1_2 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_2 )
          + Exp( w2_1_3 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_3 ) )
      - (w2_1_1 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_1) ) * y1
   + ( Log( Exp( w2_1_1 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_1 )
          + Exp( w2_1_2 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_2 )
          + Exp( w2_1_3 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_3 ) )
      - (w2_1_2 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_2) ) * y2
   + ( Log( Exp( w2_1_1 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_1 )
          + Exp( w2_1_2 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_2 )
          + Exp( w2_1_3 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_3 ) )
      - (w2_1_3 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_3) ) * y3 ) / 3