Neural Network Architecture
Understanding NIKOLA's NNUE evaluation and GPU-accelerated inference.
Overview
NIKOLA uses a state-of-the-art Efficiently Updatable Neural Network (NNUE) for position evaluation. Unlike traditional hand-crafted evaluation functions, NNUE learns complex positional patterns from millions of high-quality games and engine self-play.
HalfKA Architecture
NIKOLA employs the HalfKA (Half-King-All) feature set, which encodes:
- King position - Relative location of each side's king
- Piece placement - All pieces indexed relative to king position
- Perspective - Separate views for white and black
This representation enables efficient incremental updates as pieces move: the first-layer accumulator is adjusted with sparse additions and subtractions instead of being recomputed from scratch on every move.
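The exact indexing scheme NIKOLA uses is not specified here, but a common HalfKA-style layout maps each (king square, piece plane, piece square) triple to one slot in a sparse binary feature vector. The sketch below is illustrative; the constants and function name are assumptions, not NIKOLA's actual code.

```python
# Illustrative HalfKA-style feature indexing (not NIKOLA's actual scheme).
# Each (king square, piece plane, piece square) triple activates exactly
# one index in a sparse binary feature vector, per perspective.

NUM_SQUARES = 64
NUM_PIECE_PLANES = 12  # 6 piece types x 2 colors

def halfka_index(king_sq: int, piece_plane: int, piece_sq: int) -> int:
    """Index into the per-perspective feature vector (0..49151)."""
    return (king_sq * NUM_PIECE_PLANES + piece_plane) * NUM_SQUARES + piece_sq

# A position activates one feature per piece, seen from each side's king.
# Example: white king on e1 (square 4), white pawn (plane 0) on e2 (square 12):
idx = halfka_index(4, 0, 12)
```

Because each move changes only a handful of these triples, the set of active indices differs by just a few entries between parent and child positions, which is what makes the accumulator cheap to keep in sync.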
Network Topology
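This page does not list NIKOLA's layer sizes, so the sketch below shows the general shape of an NNUE forward pass: a wide, sparse feature transformer feeding a few small dense layers with clipped-ReLU activations. All dimensions and weight values here are illustrative placeholders, not NIKOLA's actual network.

```python
import numpy as np

# Representative NNUE topology (dimensions are placeholders, not NIKOLA's).
FEATURES = 64 * 12 * 64   # HalfKA-style input size per perspective
ACC = 256                 # accumulator (feature-transformer) width
rng = np.random.default_rng(0)

W_ft = rng.standard_normal((FEATURES, ACC)).astype(np.float32) * 0.01
b_ft = np.zeros(ACC, dtype=np.float32)
W1 = rng.standard_normal((2 * ACC, 32)).astype(np.float32) * 0.1
W2 = rng.standard_normal((32, 1)).astype(np.float32) * 0.1

def clipped_relu(x):
    return np.clip(x, 0.0, 1.0)

def evaluate(white_feats, black_feats):
    # Accumulators: sum of the active feature columns (a sparse gather,
    # not a dense matmul) -- this is the part incremental updates maintain.
    acc_w = W_ft[white_feats].sum(axis=0) + b_ft
    acc_b = W_ft[black_feats].sum(axis=0) + b_ft
    x = clipped_relu(np.concatenate([acc_w, acc_b]))
    x = clipped_relu(x @ W1)
    return float(x @ W2)  # scalar evaluation score
```

The asymmetry is deliberate: almost all of the network's weights live in the first layer, which is only ever touched through sparse updates, while the dense layers after it are tiny and cheap to run in full.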
GPU-Batched Evaluation
NIKOLA's neural network inference is fully GPU-accelerated using native MIND tensor operations. The GPU-batched NNUE system collects positions from Lazy SMP search threads and evaluates them in batches for maximum throughput:
- Batched inference - 32-256 positions per GPU kernel, achieving 500M+ positions/sec
- Tensor cores - FP8/INT8 mixed precision on Blackwell and Vera Rubin architectures
- Multi-GPU distribution - Scale to 8+ GPUs per node with NVLink 5.0
- Async CUDA streams - Overlap compute with memory transfers for maximum utilization
- Virtual loss - Enable parallel MCTS expansion without tree corruption
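The collection step described above can be sketched in CPU-side pseudocode: search threads submit positions to a shared queue, and a dispatcher drains up to the batch limit before launching one evaluation. This is a minimal sketch of the pattern only; the class and function names are assumptions, and the `eval_fn` callback stands in for the actual GPU kernel launch.

```python
import queue
import threading

# Sketch of batch collection for GPU evaluation (illustrative names).
BATCH_MAX = 256  # upper bound per kernel launch, per the range above

class BatchEvaluator:
    def __init__(self, eval_fn):
        self.pending = queue.Queue()
        self.eval_fn = eval_fn  # stand-in for the GPU kernel launch

    def submit(self, position):
        """Called by search threads; returns a result dict and a done event."""
        result = {}
        done = threading.Event()
        self.pending.put((position, result, done))
        return result, done

    def drain_once(self):
        """Collect up to BATCH_MAX pending positions, evaluate them together."""
        batch = []
        while len(batch) < BATCH_MAX:
            try:
                batch.append(self.pending.get_nowait())
            except queue.Empty:
                break
        if not batch:
            return 0
        scores = self.eval_fn([pos for pos, _, _ in batch])  # one "kernel"
        for (_, result, done), score in zip(batch, scores):
            result["score"] = score
            done.set()  # wake the waiting search thread
        return len(batch)
```

In a real pipeline the drain loop runs per CUDA stream, so one batch's memory transfer overlaps with another batch's compute, as described above.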
Supported GPU Architectures
NVIDIA (CUDA)
- Ampere (A100, RTX 30xx)
- Ada Lovelace (RTX 40xx)
- Hopper (H100, H200)
- Blackwell Consumer (RTX 5090, RTX 5080)
- Blackwell Data Center (B200, GB200, GB300)
- Vera Rubin (next-gen architecture)
AMD (ROCm)
- CDNA 2 (MI200 series)
- CDNA 3 (MI300 series)
- RDNA 3 (RX 7000 series)
Apple (Metal)
- M1 / M1 Pro / M1 Max / M1 Ultra
- M2 / M2 Pro / M2 Max / M2 Ultra
- M3 / M3 Pro / M3 Max
- M4 / M4 Pro / M4 Max
WebGPU
- Chrome 113+
- Firefox 121+
- Safari 18+
Incremental Updates
The NNUE architecture enables "efficiently updatable" evaluation. When a move is made, only the affected features need recalculation rather than the entire network. This provides 10-100x speedup compared to full evaluation, critical for deep search.
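The update described above can be sketched as follows: the engine keeps a per-perspective accumulator and, when a move is made, subtracts the weight columns of features that disappeared and adds the columns of features that appeared. The function and variable names below are illustrative, not NIKOLA's actual API.

```python
# Sketch of an incremental accumulator update (illustrative). The
# accumulator holds the feature-transformer output; a move only touches
# the columns of the few features that appeared or disappeared.

def update_accumulator(acc, weights, removed, added):
    """acc: list of floats; weights: dict mapping feature index -> column."""
    for f in removed:
        col = weights[f]
        for i in range(len(acc)):
            acc[i] -= col[i]
    for f in added:
        col = weights[f]
        for i in range(len(acc)):
            acc[i] += col[i]
    return acc
```

A quiet move removes one feature and adds one, so the cost is two column operations regardless of network size, which is where the 10-100x speedup over a full first-layer recomputation comes from.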
Training Data
NIKOLA's network is trained on a proprietary dataset of over 10 billion positions generated through:
- Self-play games at various time controls
- High-depth analysis of master games
- Curated endgame positions from tablebases
- Adversarial positions designed to expose weaknesses
Custom Networks
Advanced users can load custom NNUE networks via the UCI option:
```
setoption name NNUEPath value /path/to/custom.nnue
```
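In a full UCI session, the option is set after the handshake and before the first search. The exchange below is a generic UCI example; the exact identification lines and option list NIKOLA prints will differ by build.

```
uci
...
uciok
setoption name NNUEPath value /path/to/custom.nnue
isready
readyok
```

If the engine cannot load the file, most UCI engines report the failure via an `info string` line, so checking the output after `isready` is a reasonable sanity check.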