NIKOLA Chess Engine
Supercomputer Chess Engine

Chess Engine for the AI Era

NIKOLA is a supercomputer-class chess engine featuring SPTT hybrid search (alpha-beta + GPU MCTS), CNN-based fortress detection, and GPU-batched NNUE evaluation achieving 500M+ positions/sec. Written entirely in MIND Language—scales from RTX 5090 to GB300 and Vera Rubin HPC clusters.

Linux/macOS
curl -fsSL https://nikolachess.com/install.sh | bash
Windows
irm https://nikolachess.com/install.ps1 | iex

Or manually:

git clone https://github.com/star-ga/NikolaChess && cd NikolaChess && make setup
  • 3900+ Elo Rating (SPRT-verified)
  • 100B+ Training Positions (master games + self-play)
  • 100M+ Opening Book positions (grandmaster repertoire)
  • GB300 / Vera Rubin Ready (multi-GPU clusters)
  • UCI Protocol + Lichess Bot API

Core Technology

Built on cutting-edge GPU architectures and advanced AI algorithms

Artificial Intelligence

Alexander Kronrod, a pioneering Russian AI researcher, famously stated, "Chess is the Drosophila of AI." For humans, the "self" acts as a profound, unifying symbol, encompassing the player, their goal-driven system, and the interplay of mind and body. To replicate expert human strategy, we developed an AI engine that prioritizes the strategic essence of chess. It employs deep analytical evaluation, drawing on a vast dataset of recorded games and the styles of legendary players. During play, the engine continuously evaluates the current game state and all algorithms running in parallel, then selects the optimal strategy. Our goal is perfection, which is a much harder problem than merely being unbeatable.

Numerical Analytics

Recent breakthroughs in unified memory architectures have revolutionized data access, enabling CPUs and GPUs to share ultra-high-speed memory seamlessly. NVIDIA's Blackwell and upcoming Vera Rubin architectures push this integration further with stacked HBM3e/HBM4 modules achieving bandwidths exceeding 8 terabytes per second. NVIDIA data-center GPUs from the A100 through GB300 and Vera Rubin feature up to 288GB of HBM memory with NVLink 5.0/6.0 connectivity for optimized cluster operations. The consumer RTX 5090 brings Blackwell to the desktop with 32GB of GDDR7. In collaboration with Dell, Supermicro, and NVIDIA, modern clusters harness Spectrum-X networking, NVLink 5.0, and InfiniBand NDR800 to achieve GPU throughput efficiencies of up to 98%.

Blackwell Architecture

Next-generation supercomputers integrate hundreds of thousands of NVIDIA Blackwell GPUs. This hybrid CPU-GPU architecture represents a massive leap from earlier systems, enabling distributed chess computation at unprecedented scale. NVIDIA GPUDirect facilitates direct GPU communication, while Remote Direct Memory Access (RDMA) ensures high-throughput, low-latency transfers without OS overhead. Leveraging Spectrum-X networking, NVLink 5.0, and InfiniBand NDR800, modern clusters sustain 95%+ throughput across thousands of GPUs.

Dynamic Parallelism

Dynamic parallelism marks a leap in GPU computing, allowing on-demand kernel spawning on the GPU without CPU involvement. Embedded in NVIDIA's CUDA framework, it empowers threads within a grid to configure, launch, and synchronize new grids. The hardware-based SPAWN framework addresses scheduling challenges, optimizing dynamically generated kernels. By controlling resource allocation, SPAWN cuts launch overheads and queuing delays. Within systems powered by Blackwell GPUs—with HBM3e memory exceeding 8 TB/s aggregate bandwidth—integrating SPAWN optimizes workloads like chess game-tree traversal.

100% MIND Language

NikolaChess is written entirely in MIND—a next-generation systems language built for AI and HPC. No Python glue code, no C++ dependencies, no Rust interop. Pure MIND from board representation to GPU kernels. The result: a chess engine that compiles to a single binary with native performance on any platform.

TENSORS

Native Tensor Types

First-class tensor types with compile-time dimension checking. Define neural network weights as `tensor<i16, (45056, 1024)>` and the compiler ensures type safety across all operations. Automatic vectorization to AVX-512, NEON, or GPU warps.
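Since MIND itself is not publicly available, here is a rough Python analogue of the idea: a tiny tensor type whose shape check mirrors what MIND's compiler would enforce statically. The class, the `matmul` helper, and the toy shapes are illustrative sketches, not MIND or NIKOLA APIs; the real `(45056, 1024)` layer is shrunk to toy dimensions.

```python
class Tensor:
    """Runtime stand-in for MIND's compile-time-checked tensor types,
    e.g. tensor<i16, (45056, 1024)>. MIND rejects shape mismatches at
    compile time; plain Python can only assert them when code runs."""

    def __init__(self, shape, data=None):
        self.shape = tuple(shape)
        rows, cols = self.shape
        self.data = data or [[0] * cols for _ in range(rows)]

def matmul(a, b):
    """a (m, k) @ b (k, n) -> (m, n), with the inner-dimension check
    that MIND's type system would perform statically."""
    (m, k), (k2, n) = a.shape, b.shape
    assert k == k2, f"inner dimensions differ: {a.shape} vs {b.shape}"
    out = Tensor((m, n))
    for i in range(m):
        for j in range(n):
            out.data[i][j] = sum(a.data[i][t] * b.data[t][j] for t in range(k))
    return out

# A toy stand-in for the (45056, 1024) NNUE layer, shrunk to (4, 3).
w = Tensor((4, 3), [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]])
x = Tensor((1, 4), [[1, 2, 3, 4]])
y = matmul(x, w)
assert y.shape == (1, 3) and y.data[0] == [5, 6, 7]
```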

GPU

Seamless GPU Offloading

Write `on(gpu0) { parallel for ... }` and code executes on GPU automatically. No manual memory transfers, no CUDA boilerplate. Same code runs on NVIDIA CUDA, AMD ROCm, Apple Metal, Intel oneAPI, and WebGPU—MIND Runtime handles translation.

SIMD

Hardware Intrinsics

Direct access to AVX-512, AVX-VNNI, AMX (Intel), NEON, SVE (ARM), and GPU tensor cores. NNUE inference uses INT8/FP8 quantization on supported hardware. The compiler auto-vectorizes loops and fuses memory operations.

PERF

Zero-Cost Abstractions

Rust-level performance with Python-level ergonomics. No garbage collection, no runtime overhead. Compile-time memory safety without sacrificing speed. NikolaChess achieves 500M+ NNUE evaluations/sec on a single RTX 5090.

SAFE

Memory Safety

Ownership and borrowing system prevents data races and memory leaks at compile time. Safe concurrency primitives for multi-threaded search. No undefined behavior, no segfaults—critical for 24/7 tournament operation.

TARGETS

Multi-Platform Compilation

Single codebase compiles to Linux x64/ARM64, macOS Intel/Apple Silicon, Windows x64/ARM64. GPU backends: CUDA (Ampere→Vera Rubin), ROCm (MI200→MI300X), Metal (M1→M4), DirectX 12, WebGPU. Write once, deploy everywhere.

Key Features

Everything you need for world-class chess computation

GPU-Batched NNUE

HalfKAv2 neural network with GPU-batched evaluation across Lazy SMP threads. Achieves 500M+ positions/sec throughput by batching 32-256 positions per GPU kernel. Supports multi-GPU distribution (up to 8 GPUs) with asynchronous CUDA streams.
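The batching pattern described above can be sketched in Python: search threads queue positions, and a worker flushes them to the GPU in batches of 32-256 per kernel launch. The "kernel" here is a CPU stand-in, and names such as `nnue_forward` and the toy feature vectors are assumptions for illustration, not NIKOLA's actual API.

```python
from queue import Queue, Empty

MIN_BATCH, MAX_BATCH = 32, 256  # batch-size window from the text

def nnue_forward(batch):
    # Stand-in for one GPU kernel launch over the whole batch;
    # a real engine would run the HalfKAv2 network here.
    return [sum(features) % 2048 - 1024 for features in batch]

def batch_worker(pending: Queue, results: dict):
    """Drain up to MAX_BATCH queued (id, features) pairs, evaluate
    them in one call, and publish scores keyed by position id."""
    batch, ids = [], []
    while len(batch) < MAX_BATCH:
        try:
            pos_id, features = pending.get_nowait()
        except Empty:
            break
        ids.append(pos_id)
        batch.append(features)
    if batch:  # a real engine would wait for MIN_BATCH under load
        for pos_id, score in zip(ids, nnue_forward(batch)):
            results[pos_id] = score

pending, results = Queue(), {}
for i in range(100):
    pending.put((i, [i, i + 1, i + 2]))  # toy "feature" vectors
batch_worker(pending, results)
assert len(results) == 100
```

The point of the pattern is amortization: one kernel launch per batch instead of one per position is what turns thousands of small evaluations into the sustained throughput the section claims.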

SPTT Hybrid Search

Superparallel Tree Traversal (SPTT) dynamically switches between alpha-beta and Monte Carlo Tree Search based on position characteristics. High branching → MCTS with PUCT selection. Tactical/endgame → classical alpha-beta with perfect tablebase integration.
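Two ingredients of that dispatch can be sketched as follows: a branching heuristic that picks the search mode, and the standard PUCT score used on the MCTS side. The 30-move threshold and the `c_puct` constant are assumptions for illustration, not NIKOLA's tuned values.

```python
import math

BRANCHING_THRESHOLD = 30  # assumed cutoff between "tactical" and "wide"

def choose_search_mode(legal_moves: int, is_tactical: bool) -> str:
    """High branching -> MCTS; tactical or narrow -> alpha-beta."""
    if is_tactical or legal_moves <= BRANCHING_THRESHOLD:
        return "alpha-beta"
    return "mcts"

def puct_score(q, prior, parent_visits, child_visits, c_puct=1.5):
    """Standard PUCT: exploitation term Q plus an exploration bonus
    scaled by the policy prior and the parent visit count."""
    return q + c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)

# An unvisited child with a strong prior outscores a well-explored,
# mediocre one, which is what drives MCTS expansion toward new lines.
fresh = puct_score(q=0.0, prior=0.6, parent_visits=100, child_visits=0)
stale = puct_score(q=0.1, prior=0.1, parent_visits=100, child_visits=50)
assert fresh > stale
assert choose_search_mode(legal_moves=45, is_tactical=False) == "mcts"
```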

Fortress Detection

CNN-based fortress detection identifies unbreakable defensive structures with 95%+ accuracy. It detects rook plus wrong-color-bishop fortresses, blocked pawn chains, perpetual-check opportunities, and stalemate traps, and is designed never to lose from a drawable position.

Advanced Search

History-modulated LMR, ProbCut pruning with statistical beta-cutoff prediction, killer moves, countermove heuristic, and continuation history. GPU MCTS with PUCT selection and virtual loss for parallel expansion.
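History-modulated LMR can be sketched with the logarithmic reduction formula common to alpha-beta engines; the constants and the history scaling below are illustrative assumptions, not NIKOLA's tuned parameters.

```python
import math

def lmr_reduction(depth: int, move_number: int, history_score: int) -> int:
    """Reduce late, quiet moves more at higher depths; a good history
    score (the "history modulation") cancels part of the reduction."""
    if depth < 3 or move_number < 4:
        return 0  # too shallow, or too early in the move list, to reduce
    r = math.log(depth) * math.log(move_number) / 2.25  # common base formula
    r -= history_score / 8192  # assumed scaling for the history term
    return max(0, int(r))

# Later moves at the same depth are reduced more aggressively...
assert lmr_reduction(depth=12, move_number=4, history_score=0) <= \
       lmr_reduction(depth=12, move_number=20, history_score=0)
# ...while a strong history score shrinks the reduction.
assert lmr_reduction(12, 20, history_score=16384) < lmr_reduction(12, 20, 0)
```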

GPU Acceleration

Native support for NVIDIA GPUs from RTX 5090 through A100, H100, Blackwell (B200/GB300), and Vera Rubin architectures. Leverages FP8/INT8 tensor cores for NNUE inference and policy-value network evaluation. Scales to 1024 GPUs with NVLink 5.0.

Opening Book

100M+ positions from grandmaster games in Polyglot format. Weighted move selection based on win rates, dynamic book depth based on position complexity, and anti-computer preparation lines.
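Weighted selection from a Polyglot-style book can be sketched in a few lines: each entry carries a weight, and drawing moves proportionally to weight reproduces the repertoire's frequencies. The move list and weights below are made up for illustration.

```python
import random

def pick_book_move(entries, rng=random):
    """entries: list of (uci_move, weight) pairs, as a Polyglot book
    stores them. Returns one move, sampled proportionally to weight."""
    moves = [m for m, _ in entries]
    weights = [w for _, w in entries]
    return rng.choices(moves, weights=weights, k=1)[0]

book = [("e2e4", 620), ("d2d4", 310), ("c2c4", 60), ("g1f3", 10)]
rng = random.Random(42)  # seeded for reproducibility
picks = [pick_book_move(book, rng) for _ in range(1000)]
# e2e4 carries 62% of the total weight, so it dominates the sample.
assert picks.count("e2e4") > picks.count("d2d4") > picks.count("c2c4")
```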

Tablebase Coverage

7-man DTZ/WDL (16.7TB) with perfect endgame play. 8-man WDL (1.6PB) for complete coverage. Gaviota support and KPK bitbase. Specialized endgame evaluation patterns for positions beyond tablebase coverage.

SPRT Framework

Built-in Sequential Probability Ratio Test for Elo measurement. A/B testing framework with 30 curated benchmark positions. Automated regression testing with statistical significance verification.
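The core of such a framework is the SPRT log-likelihood ratio; the sketch below uses the BayesElo trinomial model popularized by Fishtest-style engine testing. The `drawelo=240` default and the 0.05 error bounds are conventional values, not NIKOLA's settings.

```python
import math

def bayeselo_probs(elo, drawelo=240.0):
    """Win/draw/loss probabilities under the BayesElo draw model."""
    win = 1.0 / (1.0 + 10.0 ** ((-elo + drawelo) / 400.0))
    loss = 1.0 / (1.0 + 10.0 ** ((elo + drawelo) / 400.0))
    return win, 1.0 - win - loss, loss

def sprt_llr(wins, draws, losses, elo0, elo1):
    """Log-likelihood ratio of H1 (elo >= elo1) vs H0 (elo <= elo0)."""
    w0, d0, l0 = bayeselo_probs(elo0)
    w1, d1, l1 = bayeselo_probs(elo1)
    return (wins * math.log(w1 / w0)
            + draws * math.log(d1 / d0)
            + losses * math.log(l1 / l0))

def sprt_decision(llr, alpha=0.05, beta=0.05):
    lower = math.log(beta / (1.0 - alpha))   # about -2.94
    upper = math.log((1.0 - beta) / alpha)   # about +2.94
    if llr >= upper:
        return "accept H1"   # the change gains Elo; stop the test
    if llr <= lower:
        return "accept H0"   # no measurable gain; stop the test
    return "continue"        # keep playing games

# A clearly positive result crosses the upper bound.
llr = sprt_llr(wins=500, draws=500, losses=200, elo0=0, elo1=5)
assert sprt_decision(llr) == "accept H1"
```

The appeal of the sequential test is that it stops as soon as either bound is crossed, so clear regressions and clear gains both resolve in far fewer games than a fixed-length match.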

UCI Protocol

Full UCI protocol support for Arena, ChessBase, CuteChess, Fritz. Dynamic contempt based on opponent strength. Lichess Bot API integration. Multi-PV analysis mode and aggressive time management.
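The UCI handshake a GUI performs against any engine can be sketched as a small command handler. The identification strings and the fixed `bestmove` reply below are placeholders, not NIKOLA's actual output.

```python
def handle_uci_command(line: str) -> list[str]:
    """Map one GUI command to the engine's reply lines, per the UCI
    protocol: 'uci' -> id + 'uciok', 'isready' -> 'readyok',
    'go' -> a 'bestmove' after searching."""
    cmd = line.strip().split()
    if not cmd:
        return []
    if cmd[0] == "uci":
        return ["id name NIKOLA (sketch)", "id author (placeholder)", "uciok"]
    if cmd[0] == "isready":
        return ["readyok"]
    if cmd[0] == "position":
        return []  # a real engine parses the FEN / move list here
    if cmd[0] == "go":
        return ["bestmove e2e4"]  # a real engine searches first
    if cmd[0] == "quit":
        raise SystemExit
    return []  # UCI says to ignore unknown tokens

out = []
for line in ["uci", "isready", "position startpos moves e2e4", "go depth 10"]:
    out.extend(handle_uci_command(line))
assert "uciok" in out and "readyok" in out
assert any(reply.startswith("bestmove") for reply in out)
```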

Technology Stack

NVIDIA CUDA

  • GPUDirect RDMA
  • NVLink 5.0/6.0
  • CUDA 12.x + cuBLAS
  • Blackwell + Vera Rubin
  • FP8/INT8 Tensor Cores
  • cuDNN 9.x
MIND Language

  • Native Tensor Types
  • GPU-Batched Inference
  • CUDA/Metal/ROCm/oneAPI
  • AVX-512/AMX/NEON
  • Zero-cost Abstractions
  • Rust-level Safety

Chess Engine

  • GPU-Batched NNUE
  • SPTT Hybrid Search
  • GPU MCTS + PUCT
  • 7-man/8-man Syzygy
  • UCI + Lichess Bot
  • 500M+ positions/sec

Infrastructure

  • Spectrum-X Networking
  • InfiniBand NDR800
  • HBM3e/HBM4 Memory
  • GB300 NVL72 Ready
  • 1024 GPU Scaling
  • Container-ready

Get in Touch

Have questions about NIKOLA or want to contribute to the project? We'd love to hear from you. Join our community of chess enthusiasts and AI researchers.