Fri. Jan 10th, 2025
Woman in AI

In the world of AI, transformers have revolutionized the way we approach natural language processing (NLP) tasks. At the heart of these advancements is the GPT (Generative Pre-trained Transformer) architecture, which powers some of the most impressive AI applications today. While many implementations rely on Python-based libraries like TensorFlow or PyTorch, it’s entirely possible—and exciting—to build a GPT-based language model using C# and .NET.

This guide outlines a step-by-step approach to building your own GPT-like language model in C#. Whether you’re a seasoned .NET developer or just curious about diving deeper into machine learning, this process demonstrates the power and flexibility of C# in tackling complex AI challenges.


Why Build a GPT Model in C#?

While Python dominates the AI landscape, there are several compelling reasons to use C# for building machine learning models:

  1. Familiar Ecosystem: For developers working in the Microsoft ecosystem, C# integrates seamlessly with tools like Visual Studio and Azure.
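  2. High Performance: C# is compiled to intermediate language and then JIT-compiled to native code at run time (or compiled ahead of time), and .NET exposes SIMD types such as System.Numerics.Vector<T> that help with numeric loops.
  3. Cross-Platform Compatibility: With modern .NET (formerly .NET Core), C# applications can run on Windows, macOS, and Linux.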
  4. Learning Opportunity: Building a GPT model from scratch in C# is not just a technical challenge—it’s a rewarding way to understand the inner workings of modern AI systems.

Step 1: Building the Tensor Library

At the core of any machine learning system lies the tensor—a multi-dimensional array used for storing and processing data. In this step, we’ll create a Tensor library in C# that provides the foundation for all subsequent computations.

Creating the Tensor Class

A Tensor class in C# stores a multi-dimensional array as one flat buffer and uses strides to map indices to positions in that buffer. The class below covers storage and indexing; broadcasting and mathematical operations such as matrix multiplication, a key operation in GPT models, are built on top of it.

using System;
using System.Linq;

public class Tensor
{
    public float[] Data;  // Flattened data in row-major order
    public int[] Shape;   // Dimensions of the tensor
    public int[] Strides; // Elements to skip per step along each dimension

    public int Size => Data.Length; // Total number of elements

    public Tensor(int[] shape)
    {
        Shape = shape;
        Strides = ComputeStrides(Shape);
        Data = new float[Shape.Aggregate(1, (a, b) => a * b)];
    }

    // Row-major strides: the last dimension is contiguous (stride 1).
    private static int[] ComputeStrides(int[] shape)
    {
        int[] strides = new int[shape.Length];
        int stride = 1;
        for (int i = shape.Length - 1; i >= 0; i--)
        {
            strides[i] = stride;
            stride *= shape[i];
        }
        return strides;
    }

    // Maps multi-dimensional indices to an offset into Data.
    private int FlatIndex(int[] indices)
    {
        if (indices.Length != Shape.Length)
            throw new ArgumentException("Expected one index per dimension.");
        int flatIndex = 0;
        for (int i = 0; i < indices.Length; i++)
        {
            flatIndex += indices[i] * Strides[i];
        }
        return flatIndex;
    }

    public float this[params int[] indices]
    {
        get => Data[FlatIndex(indices)];
        set => Data[FlatIndex(indices)] = value;
    }
}

This Tensor library enables efficient mathematical operations like matrix multiplication, which powers the self-attention mechanism in transformers.
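The Tensor class shown in this step does not include matrix multiplication itself. As a minimal sketch of what such an operation could look like, the following works on flat, row-major arrays like the Data field. The MatMulSketch class and its parameter names are illustrative assumptions, not part of the original library.

```csharp
using System;

// Hypothetical helper for illustration; not part of the Tensor class above.
public static class MatMulSketch
{
    // Multiplies an [m, k] matrix by a [k, n] matrix.
    // Both inputs and the result are flat, row-major arrays.
    public static float[] MatMul(float[] a, float[] b, int m, int k, int n)
    {
        if (a.Length != m * k || b.Length != k * n)
            throw new ArgumentException("Array lengths do not match the given shapes.");

        float[] result = new float[m * n];
        for (int i = 0; i < m; i++)
        {
            for (int p = 0; p < k; p++)
            {
                float aip = a[i * k + p];
                // The innermost loop walks b and result contiguously in memory.
                for (int j = 0; j < n; j++)
                {
                    result[i * n + j] += aip * b[p * n + j];
                }
            }
        }
        return result;
    }
}
```

For example, multiplying the 2x2 matrices {1, 2, 3, 4} and {5, 6, 7, 8} with MatMulSketch.MatMul(a, b, 2, 2, 2) gives {19, 22, 43, 50}. The i-p-j loop order is one common choice for row-major data; a tuned implementation would likely add blocking or SIMD.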


Step 2: The Transformer Architecture

Transformers, the building blocks of GPT, use a combination of multi-head self-attention, feedforward networks, and positional encodings. Let’s break this down.

Positional Encoding

Since transformers lack inherent sequence awareness, positional encodings are added to input embeddings to represent the order of words.

public class PositionalEncoding
{
    // Adds sinusoidal positional encodings to a [seqLength, embedSize] input.
    public static Tensor AddPositionalEncoding(Tensor input)
    {
        int seqLength = input.Shape[0];
        int embedSize = input.Shape[1];
        Tensor output = new Tensor(new int[] { seqLength, embedSize });
        for (int pos = 0; pos < seqLength; pos++)
        {
            for (int i = 0; i < embedSize; i++)
            {
                // Dimensions 2k and 2k+1 share one frequency: 1 / 10000^(2k / embedSize).
                int pairIndex = i / 2;
                double angle = pos / Math.Pow(10000, 2.0 * pairIndex / embedSize);
                // Even dimensions use sine, odd dimensions use cosine.
                double encoding = (i % 2 == 0) ? Math.Sin(angle) : Math.Cos(angle);
                output[pos, i] = input[pos, i] + (float)encoding;
            }
        }
        return output;
    }
}


Multi-Head Self-Attention

Self-attention is the heart of transformers, enabling them to focus on relevant parts of the input sequence.

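public class MultiHeadAttention
{
    // One projection per head, each of shape [embedSize, headDim].
    private readonly Tensor[] QueryWeights, KeyWeights, ValueWeights;
    private readonly Tensor OutputWeights; // [embedSize, embedSize]
    private readonly int NumHeads, HeadDim;

    public MultiHeadAttention(int embedSize, int numHeads)
    {
        if (embedSize % numHeads != 0)
            throw new ArgumentException("embedSize must be divisible by numHeads.");
        NumHeads = numHeads;
        HeadDim = embedSize / numHeads;
        QueryWeights = new Tensor[numHeads];
        KeyWeights = new Tensor[numHeads];
        ValueWeights = new Tensor[numHeads];
        for (int h = 0; h < numHeads; h++)
        {
            QueryWeights[h] = new Tensor(new[] { embedSize, HeadDim });
            KeyWeights[h] = new Tensor(new[] { embedSize, HeadDim });
            ValueWeights[h] = new Tensor(new[] { embedSize, HeadDim });
        }
        // Weights start at zero here; a real model would initialize them randomly.
        OutputWeights = new Tensor(new[] { embedSize, embedSize });
    }

    // Tensor.MatMul, Tensor.Transpose, Tensor.Softmax (row-wise), and the scalar
    // division operator are assumed to exist on Tensor; they are not shown in Step 1.
    public Tensor Forward(Tensor query, Tensor key, Tensor value)
    {
        int seqLength = query.Shape[0];
        Tensor concat = new Tensor(new[] { seqLength, NumHeads * HeadDim });
        for (int h = 0; h < NumHeads; h++)
        {
            Tensor q = Tensor.MatMul(query, QueryWeights[h]);
            Tensor k = Tensor.MatMul(key, KeyWeights[h]);
            Tensor v = Tensor.MatMul(value, ValueWeights[h]);
            // Scaled dot-product scores of shape [seqLength, seqLength].
            Tensor scores = Tensor.MatMul(q, Tensor.Transpose(k)) / (float)Math.Sqrt(HeadDim);
            Tensor attentionWeights = Tensor.Softmax(scores);
            Tensor context = Tensor.MatMul(attentionWeights, v); // [seqLength, headDim]
            // Copy this head's output into its slice of the concatenated tensor.
            for (int pos = 0; pos < seqLength; pos++)
                for (int d = 0; d < HeadDim; d++)
                    concat[pos, h * HeadDim + d] = context[pos, d];
        }
        return Tensor.MatMul(concat, OutputWeights);
    }
}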

This implementation computes attention scores, normalizes them using softmax, and applies them to the value vectors.
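The attention class calls Tensor.Softmax without showing it, and GPT-style models typically also mask future positions so that each token attends only to itself and earlier tokens. The following is a minimal sketch of a row-wise softmax with an optional causal mask, written against flat, row-major arrays. The AttentionSketch class and its signature are assumptions for illustration, not the original library's API.

```csharp
using System;

// Hypothetical helper for illustration; not the Tensor.Softmax used above.
public static class AttentionSketch
{
    // Row-wise softmax over a square [n, n] score matrix (flat, row-major).
    // When causal is true, row i only attends to columns 0..i.
    public static float[] Softmax(float[] scores, int n, bool causal)
    {
        float[] weights = new float[n * n];
        for (int i = 0; i < n; i++)
        {
            int limit = causal ? i + 1 : n; // number of columns visible to row i

            float max = float.NegativeInfinity;
            for (int j = 0; j < limit; j++)
                max = Math.Max(max, scores[i * n + j]);

            float sum = 0f;
            for (int j = 0; j < limit; j++)
            {
                // Subtracting the row maximum keeps Exp from overflowing.
                weights[i * n + j] = (float)Math.Exp(scores[i * n + j] - max);
                sum += weights[i * n + j];
            }

            for (int j = 0; j < limit; j++)
                weights[i * n + j] /= sum;
            // Masked columns (j >= limit) keep their default value of 0.
        }
        return weights;
    }
}
```

With all-zero scores for a sequence of length 2 and causal set to true, the first row becomes {1, 0} and the second {0.5, 0.5}: the first token can only attend to itself, while the second splits its attention evenly.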


Feedforward Networks

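Feedforward networks in transformers are applied to each position independently: a linear layer expands the embedding to a larger hidden size, a non-linearity is applied, and a second linear layer projects the result back to the embedding size.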

public class FeedForwardNetwork
{
    private readonly Tensor Weights1, Weights2, Bias1, Bias2;

    public FeedForwardNetwork(int embedSize, int hiddenSize)
    {
        Weights1 = new Tensor(new[] { embedSize, hiddenSize }); // expands to the hidden size
        Weights2 = new Tensor(new[] { hiddenSize, embedSize }); // projects back to the embedding size
        Bias1 = new Tensor(new[] { hiddenSize });
        Bias2 = new Tensor(new[] { embedSize });
    }

    // Tensor.MatMul, Tensor.Relu, and a + operator that broadcasts the bias
    // across rows are assumed to exist on Tensor; they are not shown in Step 1.
    public Tensor Forward(Tensor input)
    {
        Tensor hidden = Tensor.Relu(Tensor.MatMul(input, Weights1) + Bias1);
        return Tensor.MatMul(hidden, Weights2) + Bias2;
    }
}
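The feedforward class relies on a bias addition that broadcasts across rows and on a ReLU helper, neither of which is shown in the Tensor class. As a rough sketch of those two pieces on flat, row-major arrays, under the assumption that the bias is a plain vector with one entry per output column (the FeedForwardSketch name and signatures are illustrative only):

```csharp
using System;

// Hypothetical helpers for illustration; not part of the Tensor class above.
public static class FeedForwardSketch
{
    // Computes x * w + bias for x of shape [rows, inDim] and w of shape [inDim, outDim].
    // The bias vector (length outDim) is broadcast across all rows.
    public static float[] Linear(float[] x, float[] w, float[] bias, int rows, int inDim, int outDim)
    {
        float[] y = new float[rows * outDim];
        for (int r = 0; r < rows; r++)
        {
            for (int o = 0; o < outDim; o++)
            {
                float sum = bias[o]; // the same bias entry is used for every row
                for (int i = 0; i < inDim; i++)
                    sum += x[r * inDim + i] * w[i * outDim + o];
                y[r * outDim + o] = sum;
            }
        }
        return y;
    }

    // Element-wise ReLU: negative values become zero.
    public static float[] Relu(float[] values)
    {
        float[] result = new float[values.Length];
        for (int i = 0; i < values.Length; i++)
            result[i] = Math.Max(0f, values[i]);
        return result;
    }
}
```

Passing a single row {1, -2} through an identity weight matrix with bias {0.5, 0.5} gives {1.5, -1.5}, and ReLU then turns that into {1.5, 0}.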


Step 3: Training the Model

Loss Function

The cross-entropy loss function measures how well the model’s predicted probabilities match the ground truth: it is the negative log of the probability assigned to the correct token, averaged over positions.

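// Expects probabilities to hold softmax outputs of shape [positions, vocabSize]
// (not raw logits) and targets to be one-hot rows of the same shape.
public static float CrossEntropyLoss(Tensor probabilities, Tensor targets)
{
    const float epsilon = 1e-9f; // avoids Log(0)
    float loss = 0f;
    for (int i = 0; i < probabilities.Data.Length; i++)
    {
        loss -= targets.Data[i] * (float)Math.Log(probabilities.Data[i] + epsilon);
    }
    // Average over positions (rows), not over every element.
    int positions = probabilities.Shape[0];
    return loss / positions;
}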
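Taking the logarithm only makes sense for probabilities, so a model that outputs raw scores needs a softmax step first. A common alternative is to fold the softmax into the loss. The following is a minimal sketch of that approach, assuming flat, row-major scores and integer token IDs as targets; the LossSketch class and its signature are illustrative, not part of the original code.

```csharp
using System;

// Hypothetical helper for illustration; an alternative to the loss shown above.
public static class LossSketch
{
    // Mean cross-entropy over positions, computed directly from raw scores.
    // logits: flat [positions, vocabSize]; targetIds[p] is the correct token ID at position p.
    public static float CrossEntropyFromLogits(float[] logits, int[] targetIds, int vocabSize)
    {
        int positions = targetIds.Length;
        double total = 0.0;
        for (int p = 0; p < positions; p++)
        {
            int offset = p * vocabSize;

            double max = double.NegativeInfinity;
            for (int v = 0; v < vocabSize; v++)
                max = Math.Max(max, logits[offset + v]);

            double sumExp = 0.0;
            for (int v = 0; v < vocabSize; v++)
                sumExp += Math.Exp(logits[offset + v] - max);

            // -log softmax(target) = log(sum of exp(z - max)) + max - z_target
            total += Math.Log(sumExp) + max - logits[offset + targetIds[p]];
        }
        return (float)(total / positions);
    }
}
```

When every score is equal, the model is effectively guessing uniformly, so with a vocabulary of 4 the loss comes out as ln(4), roughly 1.386. That value is a handy sanity check for an untrained model.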


Training Loop

The training loop encodes each sample, runs a forward pass, calculates the loss, computes gradients through backpropagation, and lets the optimizer update the model parameters.

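// Tokenizer, Model, and Optimizer are placeholders for components this guide
// does not define. Tokenizer.EncodeTargets is a hypothetical helper that returns
// the one-hot next-token targets matching the model output.
public void Train(List<string> dataset, int epochs)
{
    for (int epoch = 0; epoch < epochs; epoch++)
    {
        foreach (string sample in dataset)
        {
            Tensor input = Tokenizer.Encode(sample);
            Tensor target = Tokenizer.EncodeTargets(sample);
            Tensor output = Model.Forward(input);
            // CrossEntropyLoss returns a scalar, so loss is a float rather than a Tensor.
            float loss = CrossEntropyLoss(output, target);
            Model.Backpropagate(loss);
            Optimizer.Step();
        }
    }
}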
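A GPT-style model is trained to predict the next token, so the target for each position is simply the token that follows it. As a small sketch of how one tokenized sample could be split into inputs and targets, assuming the tokenizer yields an array of integer token IDs (the NextTokenSketch class is illustrative and not part of the original code):

```csharp
using System;

// Hypothetical helper for illustration; assumes tokens are integer IDs.
public static class NextTokenSketch
{
    // Splits a token sequence into model inputs and next-token targets:
    // inputs are all tokens except the last, targets are all tokens except the first.
    public static (int[] Inputs, int[] Targets) MakePair(int[] tokens)
    {
        if (tokens.Length < 2)
            throw new ArgumentException("Need at least two tokens to form a training pair.");

        int[] inputs = new int[tokens.Length - 1];
        int[] targets = new int[tokens.Length - 1];
        Array.Copy(tokens, 0, inputs, 0, inputs.Length);
        Array.Copy(tokens, 1, targets, 0, targets.Length);
        return (inputs, targets);
    }
}
```

For the token IDs {5, 6, 7, 8}, the inputs are {5, 6, 7} and the targets are {6, 7, 8}, so position 0 learns to predict 6, position 1 learns to predict 7, and so on.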


A Bright Future for C# in AI

By building a GPT-like model in C#, we unlock exciting opportunities for innovation in the .NET ecosystem. While the process requires time and effort, it provides unparalleled insights into the mechanics of AI. With its performance advantages and seamless integration into existing enterprise solutions, C# is poised to play a key role in advancing AI capabilities.

Whether you’re experimenting for fun or creating production-grade applications, this guide demonstrates that the sky’s the limit with C# and .NET. Let’s embrace the challenge and push the boundaries of what’s possible. This article is powered by AGImageAI and AlbertAGPT.