A neural network can be viewed as a composition of parametrised affine transformations and non-linear activation functions (such as ReLU, Sigmoid, or Softmax).
‣ NeuralNetworkLogitsMorphism( Para, s, hidden_layers_dims, t )  ( operation )
Returns: a parametrised morphism
The arguments are Para, a category of parametrised morphisms, s, a positive integer giving the input dimension, hidden_layers_dims, a list of positive integers giving the sizes of the hidden layers in order, and t, a positive integer giving the output dimension. This operation constructs a parametrised morphism that computes the logits (pre-activation outputs) of a fully-connected feed-forward neural network. The signature of the parametrised morphism is \(\mathbb{R}^s \to \mathbb{R}^t\), and it is parametrised by the network weights and biases. More specifically, the parametrised morphism represents the function that maps an input vector \(x \in \mathbb{R}^s\) and a parameter vector \(p \in \mathbb{R}^d\) to an output vector \(y \in \mathbb{R}^t\), where \(d\) is the total number of weights and biases in the network defined by the given architecture.
For a layer with input dimension \(m_i\) and output dimension \(m_{i+1}\), the parameter object has dimension \((m_i + 1) \times m_{i+1}\), accounting for both the \(m_i \times m_{i+1}\) weight matrix and the \(m_{i+1}\) biases.
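For example, for the architecture used in the first example session below (\(s = 2\), hidden_layers_dims = [ 1 ], \(t = 3\)), the total parameter dimension is \(d = (2+1)\cdot 1 + (1+1)\cdot 3 = 9\), matching the underlying object \(\mathbb{R}^9\) shown there.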
A ReLU nonlinearity is applied after each hidden layer. The final layer is linear (no activation), so the returned morphism produces logits suitable for a subsequent loss or classification activation.
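For instance, with a single hidden layer the returned morphism computes \[ (x, p) \mapsto W_2\,\mathrm{ReLU}(W_1 x + b_1) + b_2, \] where the parameter vector \(p\) collects the entries of the weight matrices \(W_1, W_2\) and the bias vectors \(b_1, b_2\) of the two affine layers; this is exactly the expression displayed in the first example session below.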
‣ NeuralNetworkPredictionMorphism( Para, s, hidden_layers_dims, t, activation )  ( operation )
Returns: a parametrised morphism
This operation composes the logits morphism with the specified activation function to create a parametrised morphism representing the predictions of a fully-connected feed-forward neural network (see the sketch after this list). The network has the architecture specified by s, hidden_layers_dims, and t, i.e., the source and target of the parametrised morphism are \(\mathbb{R}^{s}\) and \(\mathbb{R}^{t}\), respectively. The activation argument determines the final activation function:
\(\mathbf{Softmax}\): applies the softmax activation to turn logits into probabilities for multi-class classification.
\(\mathbf{Sigmoid}\): applies the sigmoid activation to turn logits into probabilities for binary classification.
\(\mathbf{IdFunc}\): applies the identity function (no activation) for regression tasks.
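A minimal sketch (restating the Softmax case from the example sessions below): the prediction morphism coincides with the post-composition of the logits morphism with the corresponding activation morphism in Para.

gap> Smooth := SkeletalCategoryOfSmoothMaps( );;
gap> Para := CategoryOfParametrisedMorphisms( Smooth );;
gap> logits := NeuralNetworkLogitsMorphism( Para, 2, [ 1 ], 3 );;
gap> NeuralNetworkPredictionMorphism( Para, 2, [ 1 ], 3, "Softmax" )
>      = PreCompose( logits, Para.Softmax( 3 ) );
true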
‣ NeuralNetworkLossMorphism( Para, s, hidden_layers_dims, t, activation )  ( operation )
Returns: a parametrised morphism
This operation constructs a parametrised morphism representing the training loss of a fully-connected feed-forward neural network with architecture given by s, hidden_layers_dims, and t. The returned parametrised morphism is parametrised by the network weights and biases and maps a pair (input, target) to a scalar loss: its source is \(\mathbb{R}^s \times \mathbb{R}^t\) (an input vector \(x\) and a target vector \(y\)) and its target is \(\mathbb{R}\) (the scalar loss).
The behaviour of the loss depends on the activation argument:
\(\mathbf{Softmax}\):
Used for multi-class classification.
Softmax is applied to the logits to convert them into a probability distribution.
The loss is the cross-entropy (negative log-likelihood) between the predicted probabilities and the target distribution.
Targets y may be one-hot vectors or probability distributions over classes.
\(\mathbf{Sigmoid}\):
Used for binary classification. Requires \(t = 1\).
Applies the logistic sigmoid to the single logit to obtain a probability \(\hat{y}\) in \([0,1]\).
The loss is binary cross-entropy: \(\mathrm{loss} = - ( y\log(\hat{y}) + (1-y)\log(1-\hat{y}) )\).
\(\mathbf{IdFunc}\):
Used for regression.
No final activation is applied. The loss is the mean squared error (MSE).
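Concretely, writing \(\ell \in \mathbb{R}^t\) for the logits produced by the network on input \(x\), the expressions displayed in the example sessions below amount to the following formulas (read off from those sessions; note that the Softmax and IdFunc losses are averaged over the \(t\) output components):
\(\mathbf{Softmax}\): \(\mathrm{loss} = \frac{1}{t} \sum_{k=1}^{t} y_k \bigl( \log \sum_{j=1}^{t} e^{\ell_j} - \ell_k \bigr)\), the cross-entropy computed directly from the logits.
\(\mathbf{Sigmoid}\) (\(t = 1\)): \(\mathrm{loss} = \log(1 + e^{-\ell}) + (1 - y)\,\ell\), a numerically stable reformulation of the binary cross-entropy above.
\(\mathbf{IdFunc}\): \(\mathrm{loss} = \frac{1}{t} \sum_{k=1}^{t} (\ell_k - y_k)^2\), the mean squared error.
The three example sessions below illustrate the \(\mathbf{IdFunc}\), \(\mathbf{Sigmoid}\), and \(\mathbf{Softmax}\) cases in turn.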
gap> Smooth := SkeletalCategoryOfSmoothMaps( );
SkeletalSmoothMaps
gap> Para := CategoryOfParametrisedMorphisms( Smooth );
CategoryOfParametrisedMorphisms( SkeletalSmoothMaps )
gap> N213_Logits := NeuralNetworkLogitsMorphism( Para, 2, [ 1 ], 3 );
ℝ^2 -> ℝ^3 defined by:
Underlying Object:
-----------------
ℝ^9
Underlying Morphism:
-------------------
ℝ^11 -> ℝ^3
gap> dummy_input := DummyInputForNeuralNetwork( 2, [ 1 ], 3 );
[ w2_1_1, b2_1, w2_1_2, b2_2, w2_1_3, b2_3, w1_1_1, w1_2_1, b1_1, z1, z2 ]
gap> Display( N213_Logits : dummy_input := dummy_input );
ℝ^2 -> ℝ^3 defined by:
Underlying Object:
-----------------
ℝ^9
Underlying Morphism:
-------------------
ℝ^11 -> ℝ^3
‣ w2_1_1 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_1
‣ w2_1_2 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_2
‣ w2_1_3 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_3
gap> N213_Pred := NeuralNetworkPredictionMorphism( Para, 2, [ 1 ], 3, "IdFunc" );
ℝ^2 -> ℝ^3 defined by:
Underlying Object:
-----------------
ℝ^9
Underlying Morphism:
-------------------
ℝ^11 -> ℝ^3
gap> N213_Pred = N213_Logits;
true
gap> N213_Loss := NeuralNetworkLossMorphism( Para, 2, [ 1 ], 3, "IdFunc" );
ℝ^5 -> ℝ^1 defined by:
Underlying Object:
-----------------
ℝ^9
Underlying Morphism:
-------------------
ℝ^14 -> ℝ^1
gap> vars := Concatenation(
>    DummyInputStringsForNeuralNetwork( 2, [ 1 ], 3 ),
>    DummyInputStrings( "y", 3 ) );
[ "w2_1_1", "b2_1", "w2_1_2", "b2_2", "w2_1_3", "b2_3", "w1_1_1", "w1_2_1", "b1_1", "z1", "z2", "y1", "y2", "y3" ]
gap> dummy_input := CreateContextualVariables( vars );
[ w2_1_1, b2_1, w2_1_2, b2_2, w2_1_3, b2_3, w1_1_1, w1_2_1, b1_1, z1, z2, y1, y2, y3 ]
gap> Display( N213_Loss : dummy_input := dummy_input );
ℝ^5 -> ℝ^1 defined by:
Underlying Object:
-----------------
ℝ^9
Underlying Morphism:
-------------------
ℝ^14 -> ℝ^1
‣ ((w2_1_1 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_1 - y1) ^ 2
   + (w2_1_2 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_2 - y2) ^ 2
   + (w2_1_3 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_3 - y3) ^ 2) / 3
gap> Smooth := SkeletalCategoryOfSmoothMaps( );
SkeletalSmoothMaps
gap> Para := CategoryOfParametrisedMorphisms( Smooth );
CategoryOfParametrisedMorphisms( SkeletalSmoothMaps )
gap> N213_Logits := NeuralNetworkLogitsMorphism( Para, 1, [ ], 1 );
ℝ^1 -> ℝ^1 defined by:
Underlying Object:
-----------------
ℝ^2
Underlying Morphism:
-------------------
ℝ^3 -> ℝ^1
gap> dummy_input := DummyInputForNeuralNetwork( 1, [ ], 1 );
[ w1_1_1, b1_1, z1 ]
gap> Display( N213_Logits : dummy_input := dummy_input );
ℝ^1 -> ℝ^1 defined by:
Underlying Object:
-----------------
ℝ^2
Underlying Morphism:
-------------------
ℝ^3 -> ℝ^1
‣ w1_1_1 * z1 + b1_1
gap> N213_Pred := PreCompose( N213_Logits, Para.Sigmoid( 1 ) );
ℝ^1 -> ℝ^1 defined by:
Underlying Object:
-----------------
ℝ^2
Underlying Morphism:
-------------------
ℝ^3 -> ℝ^1
gap> N213_Pred = NeuralNetworkPredictionMorphism( Para, 1, [ ], 1, "Sigmoid" );
true
gap> Display( N213_Pred : dummy_input := dummy_input );
ℝ^1 -> ℝ^1 defined by:
Underlying Object:
-----------------
ℝ^2
Underlying Morphism:
-------------------
ℝ^3 -> ℝ^1
‣ 1 / (1 + Exp( - (w1_1_1 * z1 + b1_1) ))
gap> N213_Loss := NeuralNetworkLossMorphism( Para, 1, [ ], 1, "Sigmoid" );
ℝ^2 -> ℝ^1 defined by:
Underlying Object:
-----------------
ℝ^2
Underlying Morphism:
-------------------
ℝ^4 -> ℝ^1
gap> vars := Concatenation(
>    DummyInputStringsForNeuralNetwork( 1, [ ], 1 ),
>    [ "y1" ] );
[ "w1_1_1", "b1_1", "z1", "y1" ]
gap> dummy_input := CreateContextualVariables( vars );
[ w1_1_1, b1_1, z1, y1 ]
gap> Display( N213_Loss : dummy_input := dummy_input );
ℝ^2 -> ℝ^1 defined by:
Underlying Object:
-----------------
ℝ^2
Underlying Morphism:
-------------------
ℝ^4 -> ℝ^1
‣ Log( 1 + Exp( - (w1_1_1 * z1 + b1_1) ) ) + (1 - y1) * (w1_1_1 * z1 + b1_1)
gap> Smooth := SkeletalCategoryOfSmoothMaps( );
SkeletalSmoothMaps
gap> Para := CategoryOfParametrisedMorphisms( Smooth );
CategoryOfParametrisedMorphisms( SkeletalSmoothMaps )
gap> N213_Logits := NeuralNetworkLogitsMorphism( Para, 2, [ 1 ], 3 );
ℝ^2 -> ℝ^3 defined by:
Underlying Object:
-----------------
ℝ^9
Underlying Morphism:
-------------------
ℝ^11 -> ℝ^3
gap> dummy_input := DummyInputForNeuralNetwork( 2, [ 1 ], 3 );
[ w2_1_1, b2_1, w2_1_2, b2_2, w2_1_3, b2_3, w1_1_1, w1_2_1, b1_1, z1, z2 ]
gap> Display( N213_Logits : dummy_input := dummy_input );
ℝ^2 -> ℝ^3 defined by:
Underlying Object:
-----------------
ℝ^9
Underlying Morphism:
-------------------
ℝ^11 -> ℝ^3
‣ w2_1_1 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_1
‣ w2_1_2 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_2
‣ w2_1_3 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_3
gap> N213_Pred := PreCompose( N213_Logits, Para.Softmax( 3 ) );
ℝ^2 -> ℝ^3 defined by:
Underlying Object:
-----------------
ℝ^9
Underlying Morphism:
-------------------
ℝ^11 -> ℝ^3
gap> N213_Pred = NeuralNetworkPredictionMorphism( Para, 2, [ 1 ], 3, "Softmax" );
true
gap> Display( N213_Pred : dummy_input := dummy_input );
ℝ^2 -> ℝ^3 defined by:
Underlying Object:
-----------------
ℝ^9
Underlying Morphism:
-------------------
ℝ^11 -> ℝ^3
‣ Exp( w2_1_1 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_1 )
  / (Exp( w2_1_1 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_1 )
     + Exp( w2_1_2 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_2 )
     + Exp( w2_1_3 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_3 ))
‣ Exp( w2_1_2 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_2 )
  / (Exp( w2_1_1 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_1 )
     + Exp( w2_1_2 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_2 )
     + Exp( w2_1_3 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_3 ))
‣ Exp( w2_1_3 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_3 )
  / (Exp( w2_1_1 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_1 )
     + Exp( w2_1_2 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_2 )
     + Exp( w2_1_3 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_3 ))
gap> N213_Loss := NeuralNetworkLossMorphism( Para, 2, [ 1 ], 3, "Softmax" );
ℝ^5 -> ℝ^1 defined by:
Underlying Object:
-----------------
ℝ^9
Underlying Morphism:
-------------------
ℝ^14 -> ℝ^1
gap> vars := Concatenation(
>    DummyInputStringsForNeuralNetwork( 2, [ 1 ], 3 ),
>    DummyInputStrings( "y", 3 ) );
[ "w2_1_1", "b2_1", "w2_1_2", "b2_2", "w2_1_3", "b2_3", "w1_1_1", "w1_2_1", "b1_1", "z1", "z2", "y1", "y2", "y3" ]
gap> dummy_input := CreateContextualVariables( vars );
[ w2_1_1, b2_1, w2_1_2, b2_2, w2_1_3, b2_3, w1_1_1, w1_2_1, b1_1, z1, z2, y1, y2, y3 ]
gap> Display( N213_Loss : dummy_input := dummy_input );
ℝ^5 -> ℝ^1 defined by:
Underlying Object:
-----------------
ℝ^9
Underlying Morphism:
-------------------
ℝ^14 -> ℝ^1
‣ ( ( Log( Exp( w2_1_1 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_1 )
         + Exp( w2_1_2 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_2 )
         + Exp( w2_1_3 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_3 ) )
    - (w2_1_1 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_1) ) * y1
  + ( Log( Exp( w2_1_1 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_1 )
         + Exp( w2_1_2 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_2 )
         + Exp( w2_1_3 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_3 ) )
    - (w2_1_2 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_2) ) * y2
  + ( Log( Exp( w2_1_1 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_1 )
         + Exp( w2_1_2 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_2 )
         + Exp( w2_1_3 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_3 ) )
    - (w2_1_3 * Relu( w1_1_1 * z1 + w1_2_1 * z2 + b1_1 ) + b2_3) ) * y3 ) / 3