
Setting Up a Gonka.ai H100 GPU Node: A Technical Journey with AI Assistance

Written by
Olena Tkhorovska
on January 13, 2026

An honest account of running a decentralized AI compute node, featuring the surprising difference between ChatGPT and Claude Code. To learn why we are sharing the full technical journey, read the Preamble.

Working vs Earning

Day 2, hour 26 of running my Gonka.ai node. Everything looks perfect: synced to block 1,979,450, all services green, 76 successful validations... and exactly 0.00 GONKA in earnings. Then I check the logs: no validation activity for 14 hours. The GPU sits idle. The network is silent.

Is my node even participating?

This is the story of setting up a decentralized AI compute node on an H100 GPU, and discovering that "working" and "earning" are two very different things. But more importantly, it's about discovering that the AI tool you choose for infrastructure work can mean the difference between success and giving up entirely.

Why This Setup Was Different

I'm not a DevOps expert or blockchain specialist. I write code, sure, but managing infrastructure, debugging SSH issues, and configuring blockchain validators? That's not my daily work.

When I decided to run a Gonka.ai node on my H100 GPU, my first instinct was obvious: "I'll use ChatGPT to help guide me through this."

That lasted about 30 minutes before I hit the wall.

The ChatGPT Problem (And Why It Doesn't Work for Infrastructure)

The ChatGPT workflow for complex infrastructure setup has a fundamental problem: ChatGPT can't see your files, run commands, or verify whether anything actually worked. Every response is theoretical. It's giving advice to someone it can't see, about a system it can't access, with errors it can't verify.

For someone without deep Linux/Docker/blockchain expertise, this quickly becomes a game of copy-paste roulette. You're not learning—you're just hoping the next code block will magically work.

What I Actually Used: Claude Code

Instead, I used Claude Code (Anthropic's CLI coding assistant). Not the web chat interface—the actual command-line tool that runs on your machine.

Here's the difference:

Claude Code can:

✅ Read your actual files (knows what's already there)

✅ Edit files directly (no copy-paste)

✅ Run SSH commands on your server

✅ See error messages immediately

✅ Verify if something worked before moving on

✅ Maintain full context of your entire setup

ChatGPT cannot:

❌ Access your filesystem

❌ Execute commands

❌ See if its suggestions worked

❌ Fix errors in real-time

A Real Example: The Difference in Practice

With ChatGPT (hypothetical):

Me: "My Docker services won't start"

ChatGPT: "Check your .env file. It might have export statements. Try this format: [provides 50-line example file]"

Me: [Reads example, tries to figure out which parts to change]

Me: [Edits file, possibly wrong parts]

Me: [Runs docker compose up, gets different error]

Me: "Now I get: invalid syntax"

ChatGPT: "That could be several things. First check..."

[30 minutes later, still debugging]

With Claude Code (what actually happened):

Me: "My Docker services won't start"

Claude Code: [reads the .env file, spots the stray export statements, rewrites the file, re-runs docker compose config, confirms it's valid]

Total time: 2 minutes

This isn't a theoretical difference. This is the actual experience I had, repeatedly, throughout the 8-hour setup process.

The Transparency Bit

Before we go further: This isn't sponsored content. I'm not affiliated with Anthropic. Other tools exist (Cursor, GitHub Copilot, Aider) that might work similarly—I just happened to use Claude Code because it was designed for CLI infrastructure work.

I'm sharing this because if you're a non-expert trying to run complex infrastructure, the tool you choose matters enormously. ChatGPT is phenomenal for learning concepts and brainstorming. But for actual infrastructure operations? You need something that can execute, verify, and iterate.

Alright, with that context: here's how I set up a Gonka.ai node earning rewards on the H100 GPU in my home lab.

What is Gonka.ai?

Quick context: Gonka is a decentralized AI inference network. Instead of running AI models on centralized cloud providers (OpenAI, Anthropic, Google), Gonka distributes inference work across a network of independent GPU nodes.

How you earn:

My setup:

Prerequisites & Planning

Before starting, I verified I had what I needed. The official docs (gonka.ai/host/quickstart) list requirements, but here's what actually mattered:

Hardware Requirements (What I Had)

# Checked my GPU
nvidia-smi --query-gpu=name,memory.total,driver_version --format=csv
# Output:
# H100 PCIe, 81920 MiB, 580.95.05

GPU: H100 PCIe with 81GB VRAM 

Driver: 580.95.05 (minimum: 535+) 

CUDA: 13.0 (minimum: 12.6) 

Disk: 3.3TB available (you need 500GB minimum for blockchain + model) 

Network: Public IP with ports 22, 5000, 26657, 8000 accessible
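These checks script easily if you plan to repeat them. A minimal stdlib-only sketch (the function names are mine; the 535+ driver minimum is the one quoted above):

```python
import subprocess

MIN_DRIVER_MAJOR = 535  # minimum driver major version noted above

def parse_gpu_line(csv_line):
    """Parse one `--format=csv,noheader` line from nvidia-smi."""
    name, vram, driver = [f.strip() for f in csv_line.split(",")]
    return {"name": name, "vram": vram, "driver": driver,
            "driver_ok": int(driver.split(".")[0]) >= MIN_DRIVER_MAJOR}

def query_gpu():
    """Run nvidia-smi (requires an NVIDIA driver) and parse the first GPU."""
    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=name,memory.total,driver_version",
         "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout.strip().splitlines()[0]
    return parse_gpu_line(out)
```

On the H100 above, `query_gpu()` would report `driver_ok: True` since 580 clears the 535 floor.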

Software Prerequisites

docker --version        # 24.0.7
docker compose version  # 2.23.0

Mental Prerequisites

The docs say: "Setup time: ~2 hours"

Actual time with Claude Code assistance: 8 hours (including debugging, learning, dashboard building)

Estimated time if I'd done it manually with ChatGPT: 16-20 hours, or I would've given up

Key mindset: You're going to hit problems. The difference is whether you have a tool that can actually help you solve them.

The Biggest Early Mistake (That I Avoided)

The official quickstart examples show configurations for 4-GPU setups running the massive Qwen3-235B model. If you blindly follow those examples with a single GPU, you'll spend hours wondering why nothing loads.

Spoiler: You need the single-GPU configuration with the smaller Qwen3-32B model. More on this in the "Model Configuration" section.

Phase 1: Local Setup (The Easy Part)

This went smoothly because Claude Code handled all the file management and validation.

1. Account Creation

Downloaded the Gonka CLI and created my account:

cd ~/gonka-setup
./bin/inferenced keys add my-account-key

Critical moment: The CLI generates a 24-word mnemonic phrase. This is your master password. Lose this = lose access forever. No recovery.

I saved it to:

Claude Code helped here: Created the secure directory structure and reminded me to back up before proceeding.

2. ML Operations Key

For operational security, you need a separate key that the node uses (not your main account key):

./bin/inferenced keys add ml-ops-key

Same security applies—24-word phrase, store it safely.

3. Hugging Face Token

To download AI models, you need a Hugging Face token:

4. Directory Structure

Claude Code set this up automatically:

gonka-setup/
├── bin/              # CLI tools
├── keys/             # 🔐 CRITICAL: All secrets here
│   ├── mnemonic.txt
│   ├── ml-ops-mnemonic.txt
│   ├── huggingface-token.txt
│   └── keyring-password.txt
├── configs/          # Configuration backups
└── logs/             # You'll need these for debugging

Lesson: Organization saves hours. When things break at hour 6, you'll be grateful for clean logs and backed-up configs.
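If you want the same layout scripted rather than built by hand, a small sketch (the 0700 mode on keys/ is my choice for a secrets directory, not something the official docs mandate):

```python
import os
from pathlib import Path

def make_layout(root):
    """Create the gonka-setup directory layout; keys/ gets owner-only access."""
    base = Path(root)
    for sub in ("bin", "keys", "configs", "logs"):
        (base / sub).mkdir(parents=True, exist_ok=True)
    os.chmod(base / "keys", 0o700)  # secrets: owner read/write/exec only
    return sorted(p.name for p in base.iterdir())
```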

Phase 2: Server Setup (Where Things Got Real)

Now we SSH into the actual H100 server to prepare the environment.

Firewall Configuration (Critical Security)

Several Gonka ports must be blocked from the internet to prevent exploitation. This was one area where Claude Code's execution capability was essential.

Allowed (public access):

sudo ufw allow 22/tcp      # SSH
sudo ufw allow 5000/tcp    # P2P networking
sudo ufw allow 26657/tcp   # Blockchain RPC
sudo ufw allow 8000/tcp    # Public API

Blocked (internal services only):

sudo ufw deny 9100/tcp     # Prometheus metrics
sudo ufw deny 9200/tcp     # Internal monitoring
sudo ufw deny 8080/tcp     # Internal proxy
sudo ufw deny 5050/tcp     # Inference endpoint
sudo ufw enable

Why this matters: I initially had all ports open (rookie mistake). Claude Code caught this during a security review and locked it down before going live.

The .env vs config.env Problem

This one would've stumped me for an hour without execution help.

The Gonka setup uses a config.env file with this format:

export KEY_NAME=my-key
export ACCOUNT_ADDRESS=gonka1...

Problem: Docker Compose doesn't understand export statements. You need a plain .env file:

KEY_NAME=my-key
ACCOUNT_ADDRESS=gonka1...

How ChatGPT would handle this: explain the format difference and hand me a corrected file to retype. How Claude Code handled it:

# Automatically converted the file
sed 's/^export //' config.env > .env

# Validated syntax
docker compose config
# Confirmed: "Configuration valid ✓"

Time saved: ~20 minutes of trial-and-error
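For the curious, the sed one-liner boils down to this Python sketch: strip a leading `export ` from each line so Docker Compose's plain KEY=value parser accepts it.

```python
def strip_exports(text):
    """Turn shell-style `export KEY=value` lines into plain KEY=value
    lines that Docker Compose's .env parser understands."""
    out = []
    for line in text.splitlines():
        if line.startswith("export "):
            line = line[len("export "):]
        out.append(line)
    return "\n".join(out)
```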

Phase 3: Model Configuration (The Hiccup)

Not every mistake needs to be dramatic—sometimes you just miss a line in the docs.

The Initial Config (Following Official Docs)

I cloned the Gonka repository and looked at node-config.json:

{
  "id": "h100-node1",
  "host": "inference",
  "models": {
    "Qwen/Qwen3-235B-A22B-Instruct-2507-FP8": {
      "args": [
        "--tensor-parallel-size=4",
        "--max-model-len=40960"
      ]
    }
  }
}

My thinking:

The Reality Check

After 6 hours of setup, node synced perfectly, but:

nvidia-smi
# GPU Memory: 0 MiB / 81,559 MiB
# Utilization: 0%

Nothing loaded. ML node logs showed:

ERROR: The number of required GPUs exceeds the total number of available GPUs
Available GPUs: 1
Required GPUs: 4 (tensor-parallel-size: 4)
Waiting for 3 additional GPUs...

The math: tensor-parallel-size=4 means the model is sharded across four GPUs, and a 235B-parameter FP8 model is roughly 235 GB of weights anyway. I have one 80 GB H100.

The Fix: Single-GPU Configuration

The docs DO have a single-GPU configuration—I just missed it because the multi-GPU example is more prominent.

Correct config for single H100:

{
  "id": "h100-node1",
  "host": "inference",
  "models": {
    "Qwen/Qwen3-32B-FP8": {
      "args": []
    }
  }
}

Note that "args" is intentionally empty: let vLLM auto-configure.

The fix process:

# Stop everything
docker compose down

# Delete configuration cache (CRITICAL!)
sudo rm -rf .dapi

# Update node-config.json with 32B model
# (Claude Code edited this directly)

# Download correct model (happens automatically on restart)
docker compose -f docker-compose.yml -f docker-compose.mlnode.yml up -d

# Wait 5 minutes for model download and loading...

# Success!
nvidia-smi
# GPU Memory: 73,223 MiB / 81,559 MiB ✅
# vLLM process running ✅

Key lessons:

Time lost: 3 hours of debugging
Knowledge gained: a deep understanding of model quantization, tensor parallelism, and VRAM allocation

Phase 4: State Sync Issues (The Database That Wasn't Empty)

After fixing the model, I restarted everything. Node stuck at block height 0.

The error:

failed to restore snapshot

error="multistore restore: import failed:

found database at version 1962000, must be 0"

Initial Misunderstanding

My first thought: "Database is corrupted!"

Reality: Database wasn't empty. State sync requires a completely clean slate.

The Actual Problem

Previous failed attempts left data in:

The database version mismatch (had 1962000, needed 0) wasn't corruption—just residual state.

The Solution (Nuclear Option)

# Stop services
docker compose down

# Clean slate
sudo rm -rf .inference .dapi .tmkms

# Start fresh
docker compose -f docker-compose.yml -f docker-compose.mlnode.yml up -d

# Monitor sync progress
watch -n 5 'curl -s localhost:26657/status | grep latest_block_height'
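The watch loop can also live in a few lines of stdlib Python (this assumes the standard CometBFT-style /status response; `parse_sync` and `wait_for_sync` are illustrative names, not part of any Gonka tooling):

```python
import json
import time
import urllib.request

RPC_STATUS = "http://localhost:26657/status"

def parse_sync(payload):
    """Extract (height, catching_up) from a CometBFT-style /status payload."""
    info = payload["result"]["sync_info"]
    return int(info["latest_block_height"]), info["catching_up"]

def wait_for_sync(poll_s=5):
    """Poll the RPC until the node reports it has caught up."""
    while True:
        with urllib.request.urlopen(RPC_STATUS, timeout=5) as resp:
            height, catching_up = parse_sync(json.load(resp))
        print(f"height={height:,} catching_up={catching_up}")
        if not catching_up:
            return
        time.sleep(poll_s)
```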

The Sync Process

State sync stages:

Time: ~30 minutes to full sync (reached block 1,960,000+)

Lesson: When documentation says "fresh install," they mean FRESH. Don't try incremental debugging with old state. 30 minutes of re-sync beats 3 hours of debugging corrupted state.

Phase 5: Validation & Monitoring (Building a Dashboard with AI)

Proof of Life

First thing after sync: test if inference actually works.

curl -X POST http://localhost:5050/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-32B-FP8",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 50
  }'

# Response time: ~2 seconds
# GPU spiked to 100% utilization ✅
# Got coherent AI response ✅
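The same smoke test in stdlib Python, if you'd rather script it (the endpoint, model name, and payload mirror the curl above; `build_payload` and `chat` are names I made up):

```python
import json
import urllib.request

ENDPOINT = "http://localhost:5050/v1/chat/completions"

def build_payload(prompt):
    """Request body matching the curl smoke test."""
    return {
        "model": "Qwen/Qwen3-32B-FP8",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 50,
    }

def chat(prompt):
    """POST one chat completion to the local inference endpoint."""
    req = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```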

It's alive!

Validation Activity

ML node logs showed the node doing its job:

Stats: 76 validated, 1 fraud
fraud_detected=False
p_honest=1.000000

What this means:

The Monitoring Problem

After getting everything working, a new problem emerged: visibility.

The manual way (what I started with):

Checking node health required 5+ SSH commands:

# Is the node synced?
ssh user@server "curl -s localhost:26657/status | jq '.result.sync_info'"

# How's the GPU?
ssh user@server "nvidia-smi"

# Are services running?
ssh user@server "cd gonka/deploy/join && docker compose ps"

# What's my balance?
curl http://server:8000/v1/participants/gonka1547... | jq '.balance'

# Validation count?
ssh user@server "docker compose logs mlnode-308 | grep 'Stats:' | tail -1"

Time to check everything: ~5 minutes
How often I checked: every 30 minutes (paranoia about issues)
Daily time wasted: ~2.5 hours

This was unsustainable.

Building the Dashboard: ChatGPT vs Claude Code (The Showdown)

I needed a custom real-time monitoring dashboard. Requirements:

✅ Single command to run

✅ Real-time auto-refresh

✅ Beautiful terminal UI

✅ All metrics in one view

✅ Color-coded health status

✅ Historical GPU utilization tracking

Tech stack:

Attempt 1: ChatGPT for Dashboard (The Failure)

Here's what happened when I tried ChatGPT first: it can't see my environment, doesn't know I have uv installed, and can't run commands to verify whether anything works.

Attempt 2: Claude Code for Dashboard (The Success)

Total time: 15 minutes for initial working version

The Key Differences (For Non-Technical Users)

Claude vs ChatGPT

Real Example: The Peer Count Bug

A few hours after the dashboard was running, I noticed something odd.

With ChatGPT (hypothetical):

Me: "Dashboard shows 1 peer but I expected more"

ChatGPT: "The issue might be in how you're querying the RPC endpoint.

Try using this code instead: [provides 300 lines of example code]"

Me: [Reads code, tries to figure out which part to change]

Me: [Edits wrong file, still shows 1]

ChatGPT: "Also check your network configuration..."

Me: [30 minutes of trial and error]

With Claude Code (what actually happened):

Me: "Dashboard shows 1 peer but I expected more"

Claude Code:

Total time: 3 minutes

This happened multiple times during development. Each fix: minutes instead of hours.

Development Time Comparison

Traditional manual approach (estimated):

With Claude Code (actual):

Time saved: 15-20 hours

How Claude Code Helped (Beyond Just Speed)

1. Instant Project Structure

2. SSH Integration Done Right

3. Rich Library Expertise

4. Real-Time Debugging

The Dashboard Architecture

# dashboard.py - Entry point
# - CLI argument parsing
# - Main loop with keyboard input handling (q to quit, r to refresh)

# fetchers.py - Data collection
# - fetch_node_status() → RPC endpoint for blockchain data
# - fetch_participant_info() → API for balance/rewards
# - fetch_docker_status() → SSH to check services
# - fetch_gpu_status() → nvidia-smi via SSH
# - fetch_validation_stats() → Parse ML node logs
# - fetch_today_utilization_stats() → Historical tracking

# display.py - Rich UI rendering
# - create_node_panel() → Blockchain sync status
# - create_participant_panel() → Account info
# - create_services_panel() → Docker containers (2-column grid)
# - create_gpu_panel() → GPU metrics + validations
# - create_system_panel() → Disk, uptime, model status
# - create_layout() → Combine all panels

# config.py - Configuration
# - Server SSH details
# - API endpoints
# - Refresh intervals
# - Health thresholds (temp, disk space)

# utilization_tracker.py - Historical data
# - Records GPU usage every 5 seconds
# - Calculates daily active time (>0% utilization)
# - Stores in local JSON file
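As one concrete example, utilization_tracker.py boils down to something like this (a simplified sketch; the real file's function signatures and JSON layout may differ):

```python
import json
import time
from pathlib import Path

STORE = Path("utilization.json")

def record_sample(util_pct, store=STORE, now=None):
    """Append one GPU-utilization sample (called every ~5 s refresh)."""
    now = time.time() if now is None else now
    data = json.loads(store.read_text()) if store.exists() else []
    data.append({"ts": now, "util": util_pct})
    store.write_text(json.dumps(data))

def active_seconds(store=STORE, sample_interval=5):
    """Daily 'active' time: samples with >0% utilization × interval."""
    if not store.exists():
        return 0
    data = json.loads(store.read_text())
    return sum(sample_interval for s in data if s["util"] > 0)
```

A flat JSON file is plenty here: one sample every 5 seconds is under 20k records a day, and it keeps the dashboard dependency-free.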

Key Features Implemented

1. Real-Time Sync Status

2. GPU Monitoring

3. Validation Tracking

4. Service Health

5. Historical Analytics

The Result

uv run dashboard.py

gonka-node-dashboard

[Press 'q' to quit, 'r' to refresh] Refreshing in 5s...

Impact

Before Dashboard:

After Dashboard:

Development time saved by Claude Code: ~15-20 hours

"Claude Code turned a 4-day dashboard project into a 4-hour sprint. The best part? It didn't just write code—it taught me the Rich library patterns while building. I learned by collaborating, not just copy-pasting."

The "Working vs Earning" Mystery

Dashboard running beautifully. All metrics green. Then I noticed something:

Current Status (After 26 hours operational):

Everything looks perfect... except:

Is the node working? Yes. Is it earning? Who knows.

This becomes important in understanding how the network actually operates...

Understanding the Gonka Ecosystem

With the dashboard revealing the "working but not earning" mystery, I needed to understand how the network actually functions.

Network Overview

Total Participants: 4,494 nodes
Active Earners: 1,867 nodes (41.5% of network)
My Position: Connected to 21 peers (0.47% of network)
Total Network Balance: 19.15 billion GONKA

Is 21 Peers Enough?

When I first saw "21 peers" I worried: "Shouldn't I have more connections to a 4,494-node network?"

Answer: No. Here's why 21 peers is actually perfect:

The Incentive Model (Proof of Compute 2.0)

This isn't just about running inference. The economic model is more nuanced:

1. You Earn for Computational Work

2. Epoch System

3. What Affects Your Earnings

Why Zero Balance After 26 Hours?

Reason 1: Epoch Timing

Reason 2: Wrong Model During Epoch 126

Expected first rewards: End of Epoch 127 (in ~19 hours from current time)

Why No Validation Activity for 14 Hours?

This one's still a mystery. Possibilities:

Status: Monitoring to see if activity resumes...

Lessons Learned

After 8 hours of setup, 26 hours of operation, and building a custom dashboard, here's what I wish I'd known from the start.

Technical Lessons

1. Use the Right Tool for Infrastructure Work

This is the #1 lesson.

Rule of thumb: If you're editing files and running commands, use a tool that can do both.

2. RTFM, But Verify

3. Model Selection is Critical

Calculate VRAM requirements FIRST:

Formula: (parameters × bytes-per-param) + KV cache

Example: Qwen3-32B at FP8 is roughly 32B × 1 byte = 32 GB of weights, which fits an 80 GB H100 with room for KV cache. Qwen3-235B at FP8 is roughly 235 GB of weights; no single GPU holds it.

Bigger model ≠ more earnings. Right-sized model = reliability.
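The back-of-envelope arithmetic in one runnable snippet (FP8 taken as 1 byte per parameter; KV cache is excluded, which is why actual usage, 73 GB in my case, lands higher than the weights-only figure):

```python
def weight_vram_gb(params_b, bytes_per_param):
    """Weights-only VRAM in GB: billions of parameters × bytes per parameter."""
    return params_b * bytes_per_param  # 1e9 params × N bytes = N GB

# FP8 = 1 byte per parameter
assert weight_vram_gb(32, 1.0) == 32.0    # Qwen3-32B: fits one 80 GB H100
assert weight_vram_gb(235, 1.0) == 235.0  # Qwen3-235B: needs multiple GPUs
assert weight_vram_gb(32, 2.0) == 64.0    # same model at FP16 barely fits
```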

4. Configuration Caching Will Bite You

Hidden state in .dapi, .inference directories means config changes get ignored if cache exists.

Solution: When changing major configs, delete cache directories and restart fresh.

Time saved: Nuclear option (delete all) takes 30 minutes. Debugging cached config takes 3 hours.

5. State Sync Requires Clean Slate

"Fresh install" means ZERO residual data. Don't try to debug with old state hanging around.

30 minutes of re-sync < 3 hours of debugging corrupted state.

6. Monitoring is Essential

Operational Lessons

1. Security First

2. Backup Everything Critical

What NOT to backup: Server data (blockchain will re-sync, no backup needed)

3. Time Estimation

4. Community Resources

Economics Lessons

1. Rewards Take Time

2. Network Participation is Probabilistic

3. Hardware Investment

What I'd Do Differently

Keep Doing:

Change:

"The best debugging tool? A clean slate and fresh eyes. The second best? Detailed logs from when things worked. The third best? An AI assistant that can actually execute commands and verify results."

Current Status & Next Steps

Node Health (As of 26 Hours Operational)

✅ Blockchain: Synced to block 1,979,450

✅ Network: 21 healthy peers

✅ GPU: 71.5GB model loaded, ready

✅ Services: 7/8 running (bridge stopped, not critical)

✅ Validations: 76 batches completed

⏳ Rewards: First distribution in ~19 hours (end of Epoch 127)

What's Working

What's Not (Yet)

Immediate Next Steps

1. Monitor First Rewards (in ~19 hours)

2. Investigate Validation Silence

3. Bridge Service (lower priority)

4. Optimization

Long-Term Goals

The Bigger Picture

This isn't just about running a node – it's about:

Community: Building in public, sharing learnings

Conclusion

Was It Worth It?

The honest answer: Ask me in 90 days when I have earnings data.

What made it worth it already:

Technical Learning

The Right Tools Matter

Problem-Solving Skills

Community Contribution

Unanswered Questions

❓ Actual earning potential (waiting for data)
❓ Long-term stability and uptime requirements
❓ Network growth impact on individual earnings
❓ Future model updates and compatibility

Who Should Run a Gonka Node?

Good Fit (With Claude Code or similar AI assistant):

Good Fit (Without AI Assistance):

Not Ready:

Parting Wisdom

"Setting up a Gonka node isn't a 2-hour tutorial. It's an 8-hour debugging session that teaches you more about distributed AI infrastructure than any course could. But here's the thing: with Claude Code, those 8 hours are collaborative pair programming, not solo frustration. Come prepared to learn, not just to earn—and bring the right AI tools to the party."

Call to Action

Final Stats

Appendices

Appendix A: Command Reference

Sanitized examples of key commands used:

# Prerequisites check
nvidia-smi --query-gpu=name,memory.total,driver_version --format=csv
docker --version
docker compose version

# Firewall setup
sudo ufw allow 22/tcp
sudo ufw allow 5000/tcp
sudo ufw allow 26657/tcp
sudo ufw allow 8000/tcp
sudo ufw deny 9100/tcp
sudo ufw deny 9200/tcp
sudo ufw enable

# Node setup
git clone https://github.com/gonka-ai/gonka.git
cd gonka/deploy/join
# Edit .env and node-config.json
docker compose -f docker-compose.yml -f docker-compose.mlnode.yml up -d

# Monitoring
curl -s localhost:26657/status | jq '.result.sync_info'
docker compose logs mlnode-308 --tail=50
nvidia-smi
docker compose ps

# Dashboard (if you build it)
cd ~/gonka-setup
uv run dashboard.py

Appendix B: Troubleshooting Quick Reference

symptoms-vs-solutions

Appendix C: Resources & Links

Official:

Tools:

Learning:


Node Status Timeline

  Setup Date: January 1-2, 2026

  Initial Status: ✅ Operational & Synced (8/8 services)

  First Rewards: Pending (Epoch 127 completion)

  ---

  First Earnings - January 4, 2026 (Day 3)

  - Epoch 127 Completed: ✅ 11,289 GONKA earned (28 coins)

  - Status: First rewards successfully vested

  - Conversion Rate: ~403 GONKA per coin

  - Services: 7/8 running (bridge intentionally stopped)

---

  Interim Update - January 7, 2026 (Day 6)

  - Current Epoch: 131 (20% complete)

  - Epochs Completed: 127-130 (4 epochs)

  - Total Earnings: 79,946 GONKA

    - Vesting: 79,490 GONKA (unlocks over ~198 days)

    - Liquid Balance: 456 GONKA

  - Recent Performance:

    - E130: 10,880 coins → 32,681 GONKA (3.00 GONKA/coin)

    - E129: 3,700 coins → 19,520 GONKA (5.28 GONKA/coin)

    - E128: 1,300 coins → 16,000 GONKA (12.31 GONKA/coin)

  - Work Activity: 78 validations completed, 3 fraud detections

  - Network Position: 1,867 / 4,494 active participants

---

  Latest Update - January 10, 2026 (Day 9)

  - Current Epoch: 132 (36.5% complete)

  - Epochs Completed: 127-131 (5 epochs)

  - Total Earnings: 99,455 GONKA (+24% in 3 days)

    - Vesting: 98,555 GONKA

    - Liquid Balance: 900 GONKA (+97%)

  - Epoch 131 Performance: 129 coins → ~20,468 GONKA (158.67 GONKA/coin) 🚀

    - 53x better conversion rate than Epoch 130!

  - Work Activity: 482 total validations (+404 in 3 days)

  - Network Position: 3,403 / 4,933 active participants

  - Vesting Schedule: ~401 GONKA unlocking per day
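The per-epoch conversion rates quoted above are just GONKA earned divided by coins; checking the arithmetic against the logged figures:

```python
def rate(gonka, coins):
    """GONKA earned per coin for an epoch, rounded to 2 decimals."""
    return round(gonka / coins, 2)

assert rate(11289, 28) == 403.18    # Epoch 127 first rewards (~403)
assert rate(16000, 1300) == 12.31   # E128
assert rate(19520, 3700) == 5.28    # E129
assert rate(32681, 10880) == 3.0    # E130
assert rate(20468, 129) == 158.67   # E131: ~53x E130's rate
```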


Note: This blog post contains no sponsored content. All opinions are based on actual experience. Some details (IP addresses, account addresses) have been sanitized for security.

Olena Tkhorovska

CEO + Co-Founder