A neural network can be viewed as a composition of parametrised affine transformations and non-linear activation functions (such as ReLU, Sigmoid, or Softmax).
‣ NeuralNetworkLogitsMorphism( Para, s, hidden_layers_dims, t )  ( operation )
Returns: a parametrised morphism
The arguments are Para, a category of parametrised morphisms, s, a positive integer giving the input dimension, hidden_layers_dims, a list of positive integers giving the sizes of the hidden layers in order, and t, a positive integer giving the output dimension. This operation constructs a parametrised morphism that computes the logits (pre-activation outputs) of a fully-connected feed-forward neural network. The signature of the parametrised morphism is \(\mathbb{R}^s \to \mathbb{R}^t\), and it is parametrised by the network weights and biases. More specifically, the parametrised morphism represents the function that maps an input vector \(x \in \mathbb{R}^s\) and a parameter vector \(p \in \mathbb{R}^d\) to an output vector \(y \in \mathbb{R}^t\), where \(d\) is the total number of weights and biases in the network defined by the given architecture.
For a layer with input dimension \(m_i\) and output dimension \(m_{i+1}\), the parameter object has dimension \((m_i + 1) \times m_{i+1}\), accounting for both the \(m_i \times m_{i+1}\) weight matrix and the \(m_{i+1}\) biases.
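For example, for the architecture used in the first example session below (\(s = 2\), hidden_layers_dims = [ 1 ], \(t = 3\)), the total parameter dimension is \(d = (2+1)\cdot 1 + (1+1)\cdot 3 = 9\), matching the underlying object \(\mathbb{R}^9\) shown there.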
A ReLU nonlinearity is applied after each hidden layer. The final layer is linear (no activation), so the returned morphism produces logits suitable for a subsequent loss or classification activation.
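For instance, with a single hidden layer the returned morphism computes \[ (x, p) \mapsto W_2\,\mathrm{ReLU}(W_1 x + b_1) + b_2, \] where the parameter vector \(p\) collects the entries of the weight matrices \(W_1, W_2\) and the bias vectors \(b_1, b_2\) of the two affine layers; this is exactly the expression displayed in the first example session below.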
‣ NeuralNetworkPredictionMorphism( Para, s, hidden_layers_dims, t, activation )  ( operation )
Returns: a parametrised morphism
This operation composes the logits morphism with the specified activation function to create a parametrised morphism representing the predictions of a fully-connected feed-forward neural network (see the sketch after this list). The network has the architecture specified by s, hidden_layers_dims, and t, i.e., the source and target of the parametrised morphism are \(\mathbb{R}^{s}\) and \(\mathbb{R}^{t}\), respectively. The activation argument determines the final activation function:
\(\mathbf{Softmax}\): applies the softmax activation to turn logits into probabilities for multi-class classification.
\(\mathbf{Sigmoid}\): applies the sigmoid activation to turn logits into probabilities for binary classification.
\(\mathbf{IdFunc}\): applies the identity function (no activation) for regression tasks.
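A minimal sketch (restating the Softmax case from the example sessions below): the prediction morphism coincides with the post-composition of the logits morphism with the corresponding activation morphism in Para.

gap> Smooth := SkeletalCategoryOfSmoothMaps( );;
gap> Para := CategoryOfParametrisedMorphisms( Smooth );;
gap> logits := NeuralNetworkLogitsMorphism( Para, 2, [ 1 ], 3 );;
gap> NeuralNetworkPredictionMorphism( Para, 2, [ 1 ], 3, "Softmax" )
>      = PreCompose( logits, Para.Softmax( 3 ) );
true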
‣ NeuralNetworkLossMorphism( Para, s, hidden_layers_dims, t, activation )  ( operation )
Returns: a parametrised morphism
This operation constructs a parametrised morphism representing the training loss of a fully-connected feed-forward neural network with architecture given by s, hidden_layers_dims, and t. The returned parametrised morphism is parametrised by the network weights and biases and maps a pair (input, target) to a scalar loss: its source is \(\mathbb{R}^s \times \mathbb{R}^t\) (an input vector \(x\) and a target vector \(y\)) and its target is \(\mathbb{R}\) (the scalar loss).
The behaviour of the loss depends on the activation argument:
\(\mathbf{Softmax}\):
Used for multi-class classification.
Softmax is applied to the logits to convert them into a probability distribution.
The loss is the cross-entropy (negative log-likelihood) between the predicted probabilities and the target distribution.
Targets y may be one-hot vectors or probability distributions over classes.
\(\mathbf{Sigmoid}\):
Used for binary classification. Requires \(t = 1\).
Applies the logistic sigmoid to the single logit to obtain a probability \(\hat{y}\) in \([0,1]\).
The loss is binary cross-entropy: \(\mathrm{loss} = - ( y\log(\hat{y}) + (1-y)\log(1-\hat{y}) )\).
\(\mathbf{IdFunc}\):
Used for regression.
No final activation is applied. The loss is the mean squared error (MSE).
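Concretely, writing \(\ell \in \mathbb{R}^t\) for the logits produced by the network on input \(x\), the expressions displayed in the example sessions below amount to the following formulas (read off from those sessions; note that the Softmax and IdFunc losses are averaged over the \(t\) output components):
\(\mathbf{Softmax}\): \(\mathrm{loss} = \frac{1}{t} \sum_{k=1}^{t} y_k \bigl( \log \sum_{j=1}^{t} e^{\ell_j} - \ell_k \bigr)\), the cross-entropy computed directly from the logits.
\(\mathbf{Sigmoid}\) (\(t = 1\)): \(\mathrm{loss} = \log(1 + e^{-\ell}) + (1 - y)\,\ell\), a numerically stable reformulation of the binary cross-entropy above.
\(\mathbf{IdFunc}\): \(\mathrm{loss} = \frac{1}{t} \sum_{k=1}^{t} (\ell_k - y_k)^2\), the mean squared error.
The three example sessions below illustrate the \(\mathbf{IdFunc}\), \(\mathbf{Sigmoid}\), and \(\mathbf{Softmax}\) cases in turn.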
gap> Smooth := SkeletalCategoryOfSmoothMaps( );
SkeletalSmoothMaps
gap> Para := CategoryOfParametrisedMorphisms( Smooth );
CategoryOfParametrisedMorphisms( SkeletalSmoothMaps )
gap> N213_Logits := NeuralNetworkLogitsMorphism( Para, 2, [ 1 ], 3 );
ℝ^2 -> ℝ^3 defined by:
Underlying Object:
-----------------
ℝ^9
Underlying Morphism:
-------------------
ℝ^11 -> ℝ^3
gap> dummy_input := DummyInputForNeuralNetwork( 2, [ 1 ], 3 );
[ w2_1_1, b2_1, w2_1_2, b2_2, w2_1_3, b2_3, w1_1_1, w1_2_1, b1_1, z1, z2 ]
gap> Display( N213_Logits : dummy_input := dummy_input );
ℝ^2 -> ℝ^3 defined by:
Underlying Object:
-----------------
ℝ^9
Underlying Morphism:
-------------------
ℝ^11 -> ℝ^3
‣ w2_1_1 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_1
‣ w2_1_2 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_2
‣ w2_1_3 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_3
gap> N213_Pred := NeuralNetworkPredictionMorphism( Para, 2, [ 1 ], 3, "IdFunc" );
ℝ^2 -> ℝ^3 defined by:
Underlying Object:
-----------------
ℝ^9
Underlying Morphism:
-------------------
ℝ^11 -> ℝ^3
gap> N213_Pred = N213_Logits;
true
gap> N213_Loss := NeuralNetworkLossMorphism( Para, 2, [ 1 ], 3, "IdFunc" );
ℝ^5 -> ℝ^1 defined by:
Underlying Object:
-----------------
ℝ^9
Underlying Morphism:
-------------------
ℝ^14 -> ℝ^1
gap> vars := Concatenation(
>    DummyInputStringsForNeuralNetwork( 2, [ 1 ], 3 ),
>    DummyInputStrings( "y", 3 ) );
[ "w2_1_1", "b2_1", "w2_1_2", "b2_2", "w2_1_3", "b2_3", "w1_1_1", "w1_2_1", "b1_1", "z1", "z2", "y1", "y2", "y3" ]
gap> dummy_input := CreateContextualVariables( vars );
[ w2_1_1, b2_1, w2_1_2, b2_2, w2_1_3, b2_3, w1_1_1, w1_2_1, b1_1, z1, z2, y1, y2, y3 ]
gap> Display( N213_Loss : dummy_input := dummy_input );
ℝ^5 -> ℝ^1 defined by:
Underlying Object:
-----------------
ℝ^9
Underlying Morphism:
-------------------
ℝ^14 -> ℝ^1
‣ ((w2_1_1 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_1 - y1) ^ 2
   + (w2_1_2 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_2 - y2) ^ 2
   + (w2_1_3 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_3 - y3) ^ 2) / 3
gap> Smooth := SkeletalCategoryOfSmoothMaps( );
SkeletalSmoothMaps
gap> Para := CategoryOfParametrisedMorphisms( Smooth );
CategoryOfParametrisedMorphisms( SkeletalSmoothMaps )
gap> N213_Logits := NeuralNetworkLogitsMorphism( Para, 1, [ ], 1 );
ℝ^1 -> ℝ^1 defined by:
Underlying Object:
-----------------
ℝ^2
Underlying Morphism:
-------------------
ℝ^3 -> ℝ^1
gap> dummy_input := DummyInputForNeuralNetwork( 1, [ ], 1 );
[ w1_1_1, b1_1, z1 ]
gap> Display( N213_Logits : dummy_input := dummy_input );
ℝ^1 -> ℝ^1 defined by:
Underlying Object:
-----------------
ℝ^2
Underlying Morphism:
-------------------
ℝ^3 -> ℝ^1
‣ w1_1_1 * z1 + b1_1
gap> N213_Pred := PreCompose( N213_Logits, Para.Sigmoid( 1 ) );
ℝ^1 -> ℝ^1 defined by:
Underlying Object:
-----------------
ℝ^2
Underlying Morphism:
-------------------
ℝ^3 -> ℝ^1
gap> N213_Pred = NeuralNetworkPredictionMorphism( Para, 1, [ ], 1, "Sigmoid" );
true
gap> Display( N213_Pred : dummy_input := dummy_input );
ℝ^1 -> ℝ^1 defined by:
Underlying Object:
-----------------
ℝ^2
Underlying Morphism:
-------------------
ℝ^3 -> ℝ^1
‣ 1 / (1 + Exp( - (w1_1_1 * z1 + b1_1) ))
gap> N213_Loss := NeuralNetworkLossMorphism( Para, 1, [ ], 1, "Sigmoid" );
ℝ^2 -> ℝ^1 defined by:
Underlying Object:
-----------------
ℝ^2
Underlying Morphism:
-------------------
ℝ^4 -> ℝ^1
gap> vars := Concatenation(
>    DummyInputStringsForNeuralNetwork( 1, [ ], 1 ),
>    [ "y1" ] );
[ "w1_1_1", "b1_1", "z1", "y1" ]
gap> dummy_input := CreateContextualVariables( vars );
[ w1_1_1, b1_1, z1, y1 ]
gap> Display( N213_Loss : dummy_input := dummy_input );
ℝ^2 -> ℝ^1 defined by:
Underlying Object:
-----------------
ℝ^2
Underlying Morphism:
-------------------
ℝ^4 -> ℝ^1
‣ Log( 1 + Exp( - (w1_1_1 * z1 + b1_1) ) ) + (1 - y1) * (w1_1_1 * z1 + b1_1)
gap> Smooth := SkeletalCategoryOfSmoothMaps( );
SkeletalSmoothMaps
gap> Para := CategoryOfParametrisedMorphisms( Smooth );
CategoryOfParametrisedMorphisms( SkeletalSmoothMaps )
gap> N213_Logits := NeuralNetworkLogitsMorphism( Para, 2, [ 1 ], 3 );
ℝ^2 -> ℝ^3 defined by:
Underlying Object:
-----------------
ℝ^9
Underlying Morphism:
-------------------
ℝ^11 -> ℝ^3
gap> dummy_input := DummyInputForNeuralNetwork( 2, [ 1 ], 3 );
[ w2_1_1, b2_1, w2_1_2, b2_2, w2_1_3, b2_3, w1_1_1, w1_2_1, b1_1, z1, z2 ]
gap> Display( N213_Logits : dummy_input := dummy_input );
ℝ^2 -> ℝ^3 defined by:
Underlying Object:
-----------------
ℝ^9
Underlying Morphism:
-------------------
ℝ^11 -> ℝ^3
‣ w2_1_1 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_1
‣ w2_1_2 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_2
‣ w2_1_3 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_3
gap> N213_Pred := PreCompose( N213_Logits, Para.Softmax( 3 ) );
ℝ^2 -> ℝ^3 defined by:
Underlying Object:
-----------------
ℝ^9
Underlying Morphism:
-------------------
ℝ^11 -> ℝ^3
gap> N213_Pred = NeuralNetworkPredictionMorphism( Para, 2, [ 1 ], 3, "Softmax" );
true
gap> Display( N213_Pred : dummy_input := dummy_input );
ℝ^2 -> ℝ^3 defined by:
Underlying Object:
-----------------
ℝ^9
Underlying Morphism:
-------------------
ℝ^11 -> ℝ^3
‣ Exp( w2_1_1 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_1 )
  / (Exp( w2_1_1 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_1 )
     + Exp( w2_1_2 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_2 )
     + Exp( w2_1_3 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_3 ))
‣ Exp( w2_1_2 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_2 )
  / (Exp( w2_1_1 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_1 )
     + Exp( w2_1_2 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_2 )
     + Exp( w2_1_3 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_3 ))
‣ Exp( w2_1_3 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_3 )
  / (Exp( w2_1_1 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_1 )
     + Exp( w2_1_2 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_2 )
     + Exp( w2_1_3 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_3 ))
gap> N213_Loss := NeuralNetworkLossMorphism( Para, 2, [ 1 ], 3, "Softmax" );
ℝ^5 -> ℝ^1 defined by:
Underlying Object:
-----------------
ℝ^9
Underlying Morphism:
-------------------
ℝ^14 -> ℝ^1
gap> vars := Concatenation(
>    DummyInputStringsForNeuralNetwork( 2, [ 1 ], 3 ),
>    DummyInputStrings( "y", 3 ) );
[ "w2_1_1", "b2_1", "w2_1_2", "b2_2", "w2_1_3", "b2_3", "w1_1_1", "w1_2_1", "b1_1", "z1", "z2", "y1", "y2", "y3" ]
gap> dummy_input := CreateContextualVariables( vars );
[ w2_1_1, b2_1, w2_1_2, b2_2, w2_1_3, b2_3, w1_1_1, w1_2_1, b1_1, z1, z2, y1, y2, y3 ]
gap> Display( N213_Loss : dummy_input := dummy_input );
ℝ^5 -> ℝ^1 defined by:
Underlying Object:
-----------------
ℝ^9
Underlying Morphism:
-------------------
ℝ^14 -> ℝ^1
‣ ( ( Log( Exp( w2_1_1 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_1 )
         + Exp( w2_1_2 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_2 )
         + Exp( w2_1_3 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_3 ) )
    - (w2_1_1 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_1) ) * y1
  + ( Log( Exp( w2_1_1 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_1 )
         + Exp( w2_1_2 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_2 )
         + Exp( w2_1_3 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_3 ) )
    - (w2_1_2 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_2) ) * y2
  + ( Log( Exp( w2_1_1 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_1 )
         + Exp( w2_1_2 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_2 )
         + Exp( w2_1_3 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_3 ) )
    - (w2_1_3 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_3) ) * y3 ) / 3