
Chess Engine for the AI Era
NIKOLA is a supercomputer-class chess engine featuring SPTT hybrid search (alpha-beta + GPU MCTS), CNN-based fortress detection, and GPU-batched NNUE evaluation achieving 500M+ positions/sec. It is written entirely in the MIND language and scales from a single RTX 5090 to GB300 and Vera Rubin HPC clusters.
curl -fsSL https://nikolachess.com/install.sh | bash
irm https://nikolachess.com/install.ps1 | iex
Or manually:
git clone https://github.com/star-ga/NikolaChess && cd NikolaChess && make setup
Core Technology
Built on cutting-edge GPU architectures and advanced AI algorithms
Artificial Intelligence
Numerical Analytics
Blackwell Architecture
Dynamic Parallelism
100% MIND Language
NikolaChess is written entirely in MIND—a next-generation systems language built for AI and HPC. No Python glue code, no C++ dependencies, no Rust interop. Pure MIND from board representation to GPU kernels. The result: a chess engine that compiles to a single binary with native performance on any platform.
Native Tensor Types
First-class tensor types with compile-time dimension checking. Define neural network weights as `tensor<i16, (45056, 1024)>` and the compiler ensures type safety across all operations. Automatic vectorization to AVX-512, NEON, or GPU warps.
Seamless GPU Offloading
Write `on(gpu0) { parallel for ... }` and code executes on GPU automatically. No manual memory transfers, no CUDA boilerplate. Same code runs on NVIDIA CUDA, AMD ROCm, Apple Metal, Intel oneAPI, and WebGPU—MIND Runtime handles translation.
Hardware Intrinsics
Direct access to AVX-512, AVX-VNNI, AMX (Intel), NEON, SVE (ARM), and GPU tensor cores. NNUE inference uses INT8/FP8 quantization on supported hardware. The compiler auto-vectorizes loops and fuses memory operations.
Zero-Cost Abstractions
Rust-level performance with Python-level ergonomics. No garbage collection, no runtime overhead. Compile-time memory safety without sacrificing speed. NikolaChess achieves 500M+ NNUE evaluations/sec on a single RTX 5090.
Memory Safety
Ownership and borrowing system prevents data races and memory leaks at compile time. Safe concurrency primitives for multi-threaded search. No undefined behavior, no segfaults—critical for 24/7 tournament operation.
Multi-Platform Compilation
Single codebase compiles to Linux x64/ARM64, macOS Intel/Apple Silicon, Windows x64/ARM64. GPU backends: CUDA (Ampere→Vera Rubin), ROCm (MI200→MI300X), Metal (M1→M4), DirectX 12, WebGPU. Write once, deploy everywhere.
Key Features
Everything you need for world-class chess computation
GPU-Batched NNUE
HalfKAv2 neural network with GPU-batched evaluation across Lazy SMP threads. Achieves 500M+ positions/sec throughput by batching 32-256 positions per GPU kernel. Supports multi-GPU distribution (up to 8 GPUs) with asynchronous CUDA streams.
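The batching idea can be illustrated with a minimal Python sketch. It is not NIKOLA's implementation: `evaluate_in_batches` and the `eval_batch` callback are hypothetical names, the loop is single-threaded, and the only detail taken from the text above is the upper batch size of 256. A real engine would gather positions from many search threads and hand each batch to a GPU kernel.

```python
from typing import Callable, List, Sequence


def evaluate_in_batches(
    positions: Sequence[str],
    eval_batch: Callable[[Sequence[str]], List[int]],
    max_batch: int = 256,
) -> List[int]:
    """Evaluate positions in chunks of at most `max_batch`.

    `eval_batch` stands in for one batched network call (hypothetical);
    it must return one score per position, in the same order.
    """
    if max_batch < 1:
        raise ValueError("max_batch must be at least 1")
    scores: List[int] = []
    for start in range(0, len(positions), max_batch):
        batch = positions[start:start + max_batch]
        scores.extend(eval_batch(batch))  # one call per batch, not per position
    return scores
```

With 600 queued positions and the default limit, the callback is invoked three times (256, 256, and 88 positions), which is where the throughput gain over per-position calls would come from.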
SPTT Hybrid Search
Superparallel Tree Traversal (SPTT) dynamically switches between alpha-beta and Monte Carlo Tree Search based on position characteristics. High branching → MCTS with PUCT selection. Tactical/endgame → classical alpha-beta with perfect tablebase integration.
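As a rough illustration of the switching rule described above, the following Python sketch picks a search mode from a few position features. The feature set, the `choose_search` name, and every threshold except the 7-piece tablebase limit are assumptions made for this example; the page does not state how SPTT actually decides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PositionInfo:
    legal_moves: int    # number of legal moves (branching factor)
    piece_count: int    # total pieces on the board, kings included
    in_check: bool      # side to move is in check
    capture_moves: int  # how many of the legal moves are captures


def choose_search(
    info: PositionInfo,
    branching_threshold: int = 35,         # assumed cutoff for "high branching"
    tablebase_pieces: int = 7,             # 7-man tablebase range
    tactical_capture_ratio: float = 0.25,  # assumed cutoff for "tactical"
) -> str:
    """Return "alpha_beta" or "mcts" for the given position features."""
    if info.piece_count <= tablebase_pieces:
        return "alpha_beta"  # endgame: exact tablebase probes are available
    if info.in_check:
        return "alpha_beta"  # forcing position: treat as tactical
    if info.legal_moves and info.capture_moves / info.legal_moves >= tactical_capture_ratio:
        return "alpha_beta"  # many captures: treat as tactical
    if info.legal_moves >= branching_threshold:
        return "mcts"        # wide, quiet position
    return "alpha_beta"
```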
Fortress Detection
CNN-based fortress detection identifies unbreakable defensive structures with 95%+ accuracy. Detects rook + wrong-color bishop fortresses, blocked pawn chains, perpetual check opportunities, and stalemate traps. Designed to hold drawable positions instead of drifting into a loss.
Advanced Search
History-modulated LMR, ProbCut pruning with statistical beta-cutoff prediction, killer moves, countermove heuristic, and continuation history. GPU MCTS with PUCT selection and virtual loss for parallel expansion.
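PUCT selection with virtual loss can be sketched in a few lines of Python. This uses the widely published AlphaZero-style PUCT formula, Q + c * P * sqrt(N_parent) / (1 + N_child); the `Edge` layout, the constant `c_puct = 1.5`, and the virtual-loss penalty of 1.0 are assumptions for the example, not values taken from NIKOLA.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass
class Edge:
    prior: float             # policy prior P for this move
    visits: int = 0          # completed visits N
    value_sum: float = 0.0   # sum of backed-up values
    virtual_losses: int = 0  # in-flight visits not yet backed up


def puct_score(edge: Edge, parent_visits: int, c_puct: float = 1.5, vloss: float = 1.0) -> float:
    # In-flight visits count as visits that each returned a loss of `vloss`.
    n = edge.visits + edge.virtual_losses
    q = (edge.value_sum - vloss * edge.virtual_losses) / n if n > 0 else 0.0
    u = c_puct * edge.prior * math.sqrt(parent_visits) / (1 + n)
    return q + u


def select(edges: Dict[str, Edge]) -> str:
    """Pick the best move by PUCT and mark it as in flight."""
    parent_visits = max(1, sum(e.visits + e.virtual_losses for e in edges.values()))
    best = max(edges, key=lambda m: puct_score(edges[m], parent_visits))
    edges[best].virtual_losses += 1
    return best


def backup(edge: Edge, value: float) -> None:
    """Replace one virtual loss with the real evaluation result."""
    edge.virtual_losses -= 1
    edge.visits += 1
    edge.value_sum += value
```

The virtual loss makes a move look temporarily worse while a worker is still evaluating it, so a second worker selecting from the same node tends to explore a different move instead of duplicating the first one's work.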
GPU Acceleration
Native support for NVIDIA GPUs, from the RTX 5090 to the A100, H100, Blackwell (B200/GB300), and Vera Rubin architectures. Leverages FP8/INT8 tensor cores for NNUE inference and policy-value network evaluation. Scales to 1024 GPUs with NVLink 5.0.
Opening Book
100M+ positions from grandmaster games in Polyglot format. Weighted move selection based on win rates, dynamic book depth based on position complexity, and anti-computer preparation lines.
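Weighted move selection can be shown with a small Python sketch. Polyglot book entries carry an integer weight per move, and a common way to use it is to pick moves with probability proportional to that weight. How NIKOLA derives weights from win rates is not described here, so the `(move, weight)` pairs and the `pick_book_move` name are illustrative assumptions.

```python
import random
from typing import List, Optional, Tuple


def pick_book_move(
    entries: List[Tuple[str, int]],
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Pick a move with probability proportional to its integer weight.

    `entries` holds (move, weight) pairs for one position. Returns None
    when the position has no entries or every weight is zero.
    """
    rng = rng or random.Random()
    total = sum(weight for _, weight in entries)
    if total <= 0:
        return None
    r = rng.randrange(total)  # uniform integer in [0, total)
    cumulative = 0
    for move, weight in entries:
        cumulative += weight
        if r < cumulative:
            return move
    return None  # unreachable when all weights are non-negative
```

For example, with entries `[("e2e4", 3), ("d2d4", 1)]` the first move is chosen about three times out of four.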
Tablebase Coverage
7-man DTZ/WDL (16.7TB) with perfect endgame play. 8-man WDL (1.6PB) for complete coverage. Gaviota support and KPK bitbase. Specialized endgame evaluation patterns for positions beyond tablebase coverage.
SPRT Framework
Built-in Sequential Probability Ratio Test for Elo measurement. A/B testing framework with 30 curated benchmark positions. Automated regression testing with statistical significance verification.
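The core of an SPRT is a running log-likelihood ratio compared against two stopping bounds. The Python sketch below is a simplified Wald SPRT that counts only wins and losses and ignores draws; engine-testing frameworks normally use a richer model, and nothing here is taken from NIKOLA's own code. The hypotheses `elo0 = 0` and `elo1 = 10` and the error rates of 0.05 are example values.

```python
import math


def elo_to_score(elo: float) -> float:
    """Expected score for a given Elo advantage (logistic model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))


def sprt_llr(wins: int, losses: int, elo0: float, elo1: float) -> float:
    """Log-likelihood ratio of H1 (elo1) against H0 (elo0), draws ignored."""
    p0, p1 = elo_to_score(elo0), elo_to_score(elo1)
    return wins * math.log(p1 / p0) + losses * math.log((1.0 - p1) / (1.0 - p0))


def sprt_decision(
    wins: int,
    losses: int,
    elo0: float = 0.0,
    elo1: float = 10.0,
    alpha: float = 0.05,  # tolerated false-positive rate
    beta: float = 0.05,   # tolerated false-negative rate
) -> str:
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    llr = sprt_llr(wins, losses, elo0, elo1)
    if llr >= upper:
        return "accept H1"
    if llr <= lower:
        return "accept H0"
    return "continue"
```

The test keeps playing games while the result is "continue" and stops as soon as either bound is crossed, which is what lets an SPRT finish early on clear improvements or regressions.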
UCI Protocol
Full UCI protocol support for Arena, ChessBase, CuteChess, and Fritz. Dynamic contempt based on opponent strength. Lichess Bot API integration. Multi-PV analysis mode and aggressive time management.
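For readers new to UCI, the handshake and the `position` command can be sketched in Python as follows. The command names (`uci`, `uciok`, `isready`, `readyok`, `position startpos|fen ... moves ...`) come from the UCI specification; the function names and the `id author` placeholder are assumptions for the example and say nothing about NIKOLA's internals.

```python
from typing import List, Tuple

START_FEN = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"


def parse_position(line: str) -> Tuple[str, List[str]]:
    """Parse 'position [startpos | fen <fen>] [moves m1 m2 ...]'."""
    tokens = line.split()
    if not tokens or tokens[0] != "position":
        raise ValueError("not a position command")
    rest = tokens[1:]
    if "moves" in rest:
        idx = rest.index("moves")
        setup, moves = rest[:idx], rest[idx + 1:]
    else:
        setup, moves = rest, []
    if setup[:1] == ["startpos"]:
        fen = START_FEN
    elif setup[:1] == ["fen"] and len(setup) > 1:
        fen = " ".join(setup[1:])
    else:
        raise ValueError("expected 'startpos' or 'fen <fen>'")
    return fen, moves


def handle_command(line: str) -> List[str]:
    """Return the engine's reply lines for the two handshake commands."""
    command = line.strip()
    if command == "uci":
        # The author string is a placeholder for this sketch.
        return ["id name NIKOLA", "id author unknown", "uciok"]
    if command == "isready":
        return ["readyok"]
    return []  # other commands are out of scope for this sketch
```

A GUI such as CuteChess sends `uci`, waits for `uciok`, then alternates `position ...` and `go ...` commands for each move.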
Technology Stack
NVIDIA CUDA
- GPUDirect RDMA
- NVLink 5.0/6.0
- CUDA 12.x + cuBLAS
- Blackwell + Vera Rubin
- FP8/INT8 Tensor Cores
- cuDNN 9.x
MIND Language
- Native Tensor Types
- GPU-Batched Inference
- CUDA/Metal/ROCm/oneAPI
- AVX-512/AMX/NEON
- Zero-cost Abstractions
- Rust-level Safety
Chess Engine
- GPU-Batched NNUE
- SPTT Hybrid Search
- GPU MCTS + PUCT
- 7-man/8-man Syzygy
- UCI + Lichess Bot
- 500M+ positions/sec
Infrastructure
- Spectrum-X Networking
- InfiniBand NDR800
- HBM3e/HBM4 Memory
- GB300 NVL72 Ready
- 1024 GPU Scaling
- Container-ready
Documentation
Everything you need to get started and master NIKOLA
Get in Touch
Have questions about NIKOLA or want to contribute to the project? We'd love to hear from you. Join our community of chess enthusiasts and AI researchers.