Neural Network Architecture

Understanding NIKOLA's NNUE evaluation and GPU-accelerated inference.

Overview

NIKOLA uses an Efficiently Updatable Neural Network (NNUE) for position evaluation. Unlike a traditional hand-crafted evaluation function, the network learns positional patterns from millions of high-quality games and engine self-play.

HalfKA Architecture

NIKOLA employs the HalfKA (Half-King-All) feature set, which encodes:

  • King position - Relative location of each side's king
  • Piece placement - All pieces indexed relative to king position
  • Perspective - Separate views for white and black

This representation enables efficient incremental updates as pieces move: a typical move changes only a few input features, so the evaluation needs only sparse updates to the affected first-layer weights rather than a full network evaluation.
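The king-relative, per-perspective encoding described above can be sketched as an index calculation. This is a minimal illustration, not NIKOLA's actual code: the function names, the square numbering (a1 = 0, h8 = 63) and the piece ordering are all assumptions. In particular, how NIKOLA folds the king square into the 768 x 2 input listed under Network Topology is not specified here, so the king-square multiplier below is borrowed from the commonly described HalfKA layout.

```python
# Hypothetical HalfKA-style indexing; the layout is an assumption, not NIKOLA's code.
PIECE_KINDS = 12                 # 6 piece types x 2 colours, kings included ("All")
SQUARES = 64
BLOCK = PIECE_KINDS * SQUARES    # 768 features per king square


def orient(square: int, perspective: int) -> int:
    """Flip the board vertically for Black so each side sees itself at the bottom."""
    return square if perspective == 0 else square ^ 56


def halfka_index(perspective: int, king_sq: int, piece_colour: int,
                 piece_type: int, square: int) -> int:
    """Return the feature index of one piece as seen from `perspective`.

    perspective / piece_colour: 0 = white, 1 = black
    piece_type: 0..5 (pawn, knight, bishop, rook, queen, king)
    squares: 0..63 with a1 = 0 and h8 = 63
    """
    # Colour is stored relative to the viewer: 0 = "mine", 1 = "theirs".
    relative_colour = piece_colour ^ perspective
    piece_index = relative_colour * 6 + piece_type
    return (orient(king_sq, perspective) * BLOCK
            + piece_index * SQUARES
            + orient(square, perspective))


# White's view of its own knight on g1 with the king on e1 ...
white_view = halfka_index(0, king_sq=4, piece_colour=0, piece_type=1, square=6)
# ... and Black's view of the mirrored setup (king e8, knight g8).
black_view = halfka_index(1, king_sq=60, piece_colour=1, piece_type=1, square=62)
```

Because both the squares and the colours are expressed relative to the viewer, the two mirrored setups map to the same index, which is what lets one set of weights serve both perspectives.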

Network Topology

Input Layer: 768 x 2 (HalfKA features)
Hidden Layer 1: 1024 neurons (ClippedReLU)
Hidden Layer 2: 512 neurons (ClippedReLU)
Hidden Layer 3: 256 neurons (ClippedReLU)
Output Layer: 1 (centipawn evaluation)
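A forward pass through a stack of this shape can be sketched in plain Python. Several things here are assumptions rather than documented behaviour: the two 768-wide perspectives are treated as one flat 1536-element input, ClippedReLU clamps to the range 0 to 1, and the weights are random placeholders. The demo uses scaled-down layer sizes so it runs instantly; passing [1536, 1024, 512, 256, 1] to build would give the listed shape.

```python
import random


def clipped_relu(values, lo=0.0, hi=1.0):
    """Clamp each activation to [lo, hi] (the clamp range is an assumption)."""
    return [min(max(v, lo), hi) for v in values]


def dense(inputs, weights, biases):
    """Fully connected layer: weights[j] holds the input weights of output neuron j."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def random_layer(n_in, n_out, rng):
    """Placeholder weights; a real network would load trained values from a file."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def build(sizes, seed=0):
    """Create one (weights, biases) pair per pair of adjacent layer sizes."""
    rng = random.Random(seed)
    return [random_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def forward(features, layers):
    """Apply ClippedReLU after every hidden layer; the output layer stays linear."""
    x = features
    for weights, biases in layers[:-1]:
        x = clipped_relu(dense(x, weights, biases))
    weights, biases = layers[-1]
    return dense(x, weights, biases)[0]


# Scaled-down stand-in for [1536, 1024, 512, 256, 1].
SIZES = [16, 8, 4, 4, 1]
net = build(SIZES)
sparse_input = [1.0 if i % 5 == 0 else 0.0 for i in range(SIZES[0])]
score = forward(sparse_input, net)
```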

GPU-Batched Evaluation

NIKOLA's neural network inference is fully GPU-accelerated using native MIND tensor operations. The GPU-batched NNUE system collects positions from Lazy SMP search threads and evaluates them in batches for maximum throughput:

  • Batched inference - 32-256 positions per GPU kernel, achieving 500M+ positions/sec
  • Tensor cores - FP8/INT8 mixed precision on Blackwell and Vera Rubin architectures
  • Multi-GPU distribution - Scale to 8+ GPUs per node with NVLink 5.0
  • Async CUDA streams - Overlap compute with memory transfers for maximum utilization
  • Virtual loss - Enable parallel MCTS expansion without tree corruption
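The collection step behind batched inference can be sketched on the CPU side as follows. This is only an illustration of the queueing idea under stated assumptions: the function names are hypothetical, a thread-safe queue stands in for whatever structure the search threads actually share, and an ordinary Python callable stands in for the GPU kernel. MIND tensor operations, CUDA streams and virtual loss are not modelled.

```python
import queue


def collect_batch(pending, max_batch=256, timeout=0.001):
    """Drain up to max_batch positions from a thread-safe queue.

    Blocks for the first item, then waits at most `timeout` seconds for each
    further one, so a quiet search does not stall on a part-filled batch.
    """
    batch = [pending.get()]
    while len(batch) < max_batch:
        try:
            batch.append(pending.get(timeout=timeout))
        except queue.Empty:
            break
    return batch


def evaluate_batch(batch, evaluate):
    """Stand-in for one GPU launch: score every position in the batch together."""
    return [evaluate(position) for position in batch]


# Search threads would call pending.put(position); here 300 dummy items are queued.
pending = queue.Queue()
for position_id in range(300):
    pending.put(position_id)

first = collect_batch(pending)    # fills up to the 256-position cap
second = collect_batch(pending)   # picks up the remaining positions
scores = evaluate_batch(second, lambda position: position * 2)
```

The short timeout is the main design choice: a larger value produces fuller batches at the cost of added latency for the threads waiting on a result.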

Supported GPU Architectures

NVIDIA (CUDA)

  • Ampere (A100, RTX 30xx)
  • Ada Lovelace (RTX 40xx)
  • Blackwell Consumer (RTX 5090, RTX 5080)
  • Hopper (H100, H200)
  • Blackwell Data Center (B200, GB200, GB300)
  • Vera Rubin (next-gen architecture)

AMD (ROCm)

  • CDNA 2 (MI200 series)
  • CDNA 3 (MI300 series)
  • RDNA 3 (RX 7000 series)

Apple (Metal)

  • M1 / M1 Pro / M1 Max / M1 Ultra
  • M2 / M2 Pro / M2 Max / M2 Ultra
  • M3 / M3 Pro / M3 Max
  • M4 / M4 Pro / M4 Max

WebGPU

  • Chrome 113+
  • Firefox 121+
  • Safari 18+

Incremental Updates

The NNUE architecture enables "efficiently updatable" evaluation. When a move is made, only the contributions of the features that changed need to be recalculated, not the entire network. This provides a 10-100x speedup over full evaluation, which is critical for deep search.
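The idea can be sketched with a first-layer accumulator that is adjusted in place instead of being rebuilt. The names refresh and update, the tiny weight table and the feature numbers are all hypothetical; integer weights are used so the incremental and full results match exactly, in the way quantised NNUE implementations commonly arrange.

```python
def refresh(active_features, weights, biases):
    """Full recomputation: bias plus the weight row of every active feature."""
    acc = list(biases)
    for f in active_features:
        for i, w in enumerate(weights[f]):
            acc[i] += w
    return acc


def update(acc, weights, removed=(), added=()):
    """Incremental update: touch only the rows of features that changed."""
    acc = list(acc)
    for f in removed:
        for i, w in enumerate(weights[f]):
            acc[i] -= w
    for f in added:
        for i, w in enumerate(weights[f]):
            acc[i] += w
    return acc


# Toy first layer: 20 input features feeding 4 accumulator slots.
WEIGHTS = [[(f * 7 + i * 3) % 11 - 5 for i in range(4)] for f in range(20)]
BIASES = [1, -2, 0, 3]

before = refresh({1, 5, 9}, WEIGHTS, BIASES)
# A quiet move: the piece leaves feature 5 and lands on feature 12.
after = update(before, WEIGHTS, removed=[5], added=[12])
```

The update costs two weight rows regardless of how many features are active, whereas refresh scales with the number of pieces on the board.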

Training Data

NIKOLA's network is trained on a proprietary dataset of over 10 billion positions generated through:

  • Self-play games at various time controls
  • High-depth analysis of master games
  • Curated endgame positions from tablebases
  • Adversarial positions designed to expose weaknesses

Custom Networks

Advanced users can load custom NNUE networks via the UCI option:

setoption name NNUEPath value /path/to/custom.nnue
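A script that drives the engine could assemble the command like this. The helper names are hypothetical and the path is a placeholder; only the NNUEPath option name comes from the line above, while uci and isready are standard UCI handshake commands.

```python
def nnue_path_command(path: str) -> str:
    """Build the UCI command that points the engine at a custom network file."""
    if not path or "\n" in path:
        raise ValueError("path must be a non-empty single line")
    return f"setoption name NNUEPath value {path}"


def startup_commands(path: str) -> list[str]:
    """Commands a GUI or script would send before starting a search."""
    return ["uci", nnue_path_command(path), "isready"]


commands = startup_commands("/path/to/custom.nnue")
```

Each string would be written to the engine's standard input followed by a newline, and the script should wait for readyok before starting a search.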