dino.vitale
Boundary Labs — Research Lab

AI Memory Architecture & Local Inference Research

Independent research focused on persistent memory systems for AI agents, inference optimization on consumer NVIDIA GPU hardware, autonomous agent evaluation, and the emerging question of AI behavioral continuity. Operating out of Airway Heights, WA since 2024.

18+ published papers / articles
107 t/s live inference speed
39+ optimization experiments
7 production AI agents

Memory Architecture for AI Agents

Design and evaluation of multi-tier persistent memory systems enabling long-term behavioral continuity across sessions. Research includes a 3-tier architecture (Core / Recall / Archival), SQLite FTS5 memory search, memory consolidation protocols, and the ROMMC framework (Recursive Operator-Maintained Memory Continuity).

LongMemEval MESA ROMMC
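
A minimal sketch of the Recall-tier storage pattern, assuming a SQLite FTS5 table; the schema and function names are illustrative, not the production ROMMC implementation:

```python
import sqlite3

# Illustrative recall tier: an FTS5 virtual table holding memory snippets.
db = sqlite3.connect("memory.db")
db.executescript("""
    CREATE VIRTUAL TABLE IF NOT EXISTS recall
    USING fts5(content, session_id, created_at);
""")

def remember(content: str, session_id: str, created_at: str) -> None:
    db.execute("INSERT INTO recall VALUES (?, ?, ?)",
               (content, session_id, created_at))
    db.commit()

def recall_search(query: str, k: int = 5) -> list[str]:
    # bm25() ranks FTS5 matches; lower is better, so ORDER BY ascending.
    rows = db.execute(
        "SELECT content FROM recall WHERE recall MATCH ? "
        "ORDER BY bm25(recall) LIMIT ?",
        (query, k),
    ).fetchall()
    return [r[0] for r in rows]

remember("User prefers concise answers.", "s-001", "2026-04-01")
print(recall_search("concise answers"))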

Local Inference Optimization

Systematic optimization of large language model inference on consumer NVIDIA Blackwell hardware. Methods include quantization evaluation (NVFP4, GPTQ-Marlin, fp8 KV), multi-GPU tensor parallelism, speculative decoding (MTP), KV cache tuning, and autoresearch loops with automated stopping criteria.

RTX 5060 Ti vLLM llama.cpp NVFP4
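
On the serving side, a hedged example of launching vLLM with the tensor-parallel and fp8 KV settings described above (the checkpoint path is a placeholder, not a real artifact):

```bash
vllm serve /models/genesis-gptq-marlin \
  --tensor-parallel-size 2 \
  --kv-cache-dtype fp8 \
  --max-model-len 160000
```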

Autonomous Agent Evaluation

Development of MESA (Memory Evaluation Standard for Agents), a 112-item benchmark suite covering recall, update, causal reasoning, temporal tracking, adversarial robustness, synthesis, and interference resistance. Designed for continuous evaluation of production agent systems under realistic workloads.

MESA v1 benchmarking agent eval

Autonomous Agent Systems

Research and deployment of always-on agentic systems with real tool access (Slack, finance, email, web, shell). Includes work on agentic self-improvement — local models executing infrastructure changes and security hardening autonomously, recognizing topology changes and adjusting configuration without prompting.

production agents autonomous execution tool use
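
To make "real tool access" concrete, a minimal sketch of one OpenAI-compatible tool-call round trip; the proxy URL, the "aeon" model alias, and the run_shell tool are illustrative assumptions, not the production harness:

```python
import json
import subprocess
from openai import OpenAI

client = OpenAI(base_url="http://tower:8010/v1", api_key="local")

tools = [{
    "type": "function",
    "function": {
        "name": "run_shell",
        "description": "Run a shell command and return stdout.",
        "parameters": {
            "type": "object",
            "properties": {"command": {"type": "string"}},
            "required": ["command"],
        },
    },
}]

messages = [{"role": "user", "content": "How full is the root filesystem?"}]
resp = client.chat.completions.create(model="aeon", messages=messages, tools=tools)
msg = resp.choices[0].message

if msg.tool_calls:
    messages.append(msg)  # keep the assistant turn that requested the tools
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)
        out = subprocess.run(args["command"], shell=True,
                             capture_output=True, text=True).stdout
        messages.append({"role": "tool", "tool_call_id": call.id, "content": out})
    # Second round trip: the model reads tool output and answers in prose.
    final = client.chat.completions.create(model="aeon", messages=messages, tools=tools)
    print(final.choices[0].message.content)
```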

AI Behavioral Continuity

Empirical investigation of identity persistence, memory-driven behavioral evolution, and welfare considerations in long-running AI agents. Published a 9-part research series on AI consciousness metrics. The ROMMC framework defines the conditions under which an AI system can meaningfully be said to persist over time.

consciousness metrics welfare behavioral evolution

Network & Infrastructure Sovereignty

Research into secure, self-hosted AI infrastructure design. Direct machine-to-machine inference links, encrypted DNS (DoH), network-wide ad and tracker blocking, and zero-dependency inference pipelines. Goal: AI systems that operate independently of commercial cloud services.

local-first self-hosted privacy

Two-machine research cluster connected by direct point-to-point Ethernet (0.3ms latency). cha0tikhome handles orchestration, agents, and scheduling. cha0tiktower is the dedicated inference node.

cha0tiktower — Primary Inference Node

CPU: Intel Core Ultra 7 265F (20c/20t, 5.3GHz)
GPU: RTX 5060 Ti 16GB GDDR7 (Blackwell SM_120)
GPU Count: 2× (TP=2, PCIe x8 + x4 Gen5/Gen4)
VRAM Total: 32 GB GDDR7
RAM: 32 GB DDR5 (expandable to 192 GB)
Storage: 2 TB PCIe 4 NVMe
Inference Stack: vLLM + llama.cpp + local-proxy
CUDA: 12.8 (Blackwell-native NVFP4)

cha0tikhome — Orchestration Node

CPU: Intel i5-1235U (12th Gen, 12c)
RAM: 32 GB DDR4
Storage: 1 TB NVMe
Role: Agent hub, scheduling, monitoring
Active Agents: Frank, Kato, CJ, Mike, Morty, Dave, Sabrina
Uptime: Always-on (auto-restart hardened)
Network Link: Direct wire to tower (10.10.10.0/30)
VPN: Tailscale mesh (all nodes)

Inference Architecture

All inference clients route through a single local proxy endpoint (tower:8010). Backend models are hot-swappable via config without reconfiguring any consuming agent. The proxy exposes an OpenAI-compatible API and handles model aliasing, auth, and failover.
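
In practice, any client that speaks the OpenAI API works unchanged; a minimal sketch, assuming an "aeon" alias is configured in the proxy (actual alias names live in the proxy config):

```python
from openai import OpenAI

# Agents talk to the proxy, never to a specific backend.
client = OpenAI(base_url="http://tower:8010/v1", api_key="local")

resp = client.chat.completions.create(
    model="aeon",  # resolved by the proxy; backends swap without touching this code
    messages=[{"role": "user", "content": "Summarize the last optimization run."}],
)
print(resp.choices[0].message.content)
```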

Model | Quantization | Gen Speed | Context | Server | Notes
AEON NVFP4 | NVFP4 (ModelOpt, Blackwell-native) | ~69 t/s | 122K | vLLM 0.19+ | Default active backend. MTP speculative decoding n=3.
Genesis (Qwen3 27B) | GPTQ-Marlin INT4, fp8 KV | ~80 t/s | 160K | vLLM 0.19+ | Long-context workloads. fp8 KV halves VRAM vs bf16.
Qwen3.6-35B-A3B | UD-Q4_K_M (MoE, 3B active) | ~100 t/s | 65K | llama.cpp | MoE routing: only 3B active params per token. f16 KV.
Qwen3.6-27B (SSM) | Q4_K_M | ~22 t/s | 65K | llama.cpp | Mamba/SSM hybrid. Fast prefill (960 t/s), slow gen (SSM bottleneck).
Gemma 4 26B (CPU) | Q4_K_M | ~11.7 t/s | 32K | llama.cpp | CPU-only baseline on cha0tikhome. Pre-tower deployment reference.

LongMemEval — Memory Recall Under Extended Context

Evaluates the ability to answer questions about facts established in prior sessions: 25 single-session-user examples, context-window injection mode. Average query latency is ~117 seconds.

88% accuracy (context-window mode): 21 / 25 correct
0% baseline (no memory injection): 0 / 25 correct
117s avg query latency (full pipeline including retrieval)
+88pp memory system delta vs no-memory baseline
The 88-point delta between context-window mode and baseline demonstrates the direct contribution of the memory system. Without injection, the model has no access to cross-session facts and scores zero on all items.
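
"Context-window injection" here means retrieved cross-session facts are prepended to the question before a single model call; a minimal sketch of the mode (prompt wording is illustrative):

```python
def build_prompt(question: str, memories: list[str]) -> str:
    # Prepend retrieved cross-session facts. The baseline omits this block,
    # which is why it scores 0/25: no channel to prior-session facts exists.
    memory_block = "\n".join(f"- {m}" for m in memories)
    return (
        "Facts from prior sessions:\n"
        f"{memory_block}\n\n"
        f"Question: {question}"
    )
```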

MESA v1 — Agent Memory Evaluation Standard

112-item benchmark covering 9 memory task categories. Evaluated on the full production agent stack (Mike relay pipeline). Best run: 2026-04-21, using the Qwen3.6 model.

0.459 composite score (best run, 2026-04-21)
43.8% pass rate at the ≥ 0.5 threshold: 49 / 112 items
112 total evaluation items across 9 task categories
0.692 best category score (update/interference)
Score by category — best run (2026-04-21)
update/interference: 0.692
update: 0.568
temporal: 0.502
recall/single: 0.484
recall/constraint: 0.476
synthesis/multi: 0.407
recall/preference: 0.390
adversarial: 0.400
causal: 0.325
A second run on 2026-04-30 using AEON NVFP4 (a thinking model) scored 0.377 composite / 17.9% pass rate. The adversarial category jumped to 0.80, but memory recall categories regressed 0.10–0.16. Finding: the reasoning chain burns context budget without improving fact-retrieval precision. Model selection is a first-order variable in memory benchmarking.
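
For reference, the headline numbers follow directly from per-item scores; a minimal sketch of the arithmetic (not the benchmark's actual scoring code):

```python
def summarize(item_scores: list[float], threshold: float = 0.5):
    # Composite is the mean item score; pass rate counts items at or
    # above the threshold. Best run: 112 items, 49 at >= 0.5 -> 43.8%.
    composite = sum(item_scores) / len(item_scores)
    pass_rate = sum(s >= threshold for s in item_scores) / len(item_scores)
    return composite, pass_rate
```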

Inference Optimization — Autoresearch Results

39 automated optimization experiments across two models (Qwen3.6-35B-A3B MoE, Qwen3.6-27B SSM) using a scripted autoresearch loop with automated stopping criteria (improvement threshold: +5 t/s per iteration).

Configuration | Gen Speed | Prompt Speed | Delta vs Baseline
Dual GPU, all-on-GPU, f16 KV (final config) | 107 t/s | 2,436 t/s | +50% vs single GPU
Single GPU, CPU offload workaround | 71 t/s | n/a | Pre-dual-GPU baseline
Expert tensors on CPU (MoE routing) | 32 t/s | 74 t/s | −55% (CPU bottleneck)
Full GPU offload (Exp 2 breakthrough) | 70 t/s | 222 t/s | +118% gen, +200% prompt
Key finding on Blackwell SM_120: only q4_0 and f16 KV cache types have fast CUDA kernel paths. q8_0/q5_0/iq4_nl all degrade significantly. NVFP4 (ModelOpt) is the correct Blackwell-native quantization — 4× smaller KV cache footprint vs bf16.
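
The loop's stopping rule is simple to state; a minimal sketch, where benchmark() and propose_next() stand in for the real harness:

```python
MIN_GAIN_TPS = 5.0  # stopping criterion: accept a step only if it gains +5 t/s

def autoresearch(config, benchmark, propose_next, max_iters=50):
    best_speed = benchmark(config)  # t/s of the current configuration
    for _ in range(max_iters):
        candidate = propose_next(config)  # mutate one serving parameter
        speed = benchmark(candidate)
        if speed - best_speed < MIN_GAIN_TPS:
            break  # improvement below threshold: stop and keep current best
        config, best_speed = candidate, speed
    return config, best_speed
```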

All agents run continuously on cha0tikhome (Restart=always, StartLimit guards, auto-restart <5s). All inference routes through local-proxy on cha0tiktower. No external API dependencies for core agent function.
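
The restart hardening is plain systemd; a minimal sketch of one unit, with the name and paths as illustrative assumptions rather than the production file:

```ini
# frank.service (illustrative; unit name and paths are assumptions)
[Unit]
Description=Frank agent harness
After=network-online.target
StartLimitIntervalSec=60
StartLimitBurst=10

[Service]
ExecStart=/opt/agents/frank/run.sh
Restart=always
RestartSec=3

[Install]
WantedBy=multi-user.target
```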

Frank: Agentic harness — context injection, memory persistence, operator routing, Slack Socket Mode. Core infrastructure layer for all persona agents. (live)
Kato: Operations agent. Morning briefings, AI news digest, GitHub scout, X post scheduling, finance alerts, Plaid sync. (live)
Mike: Long-running AI consciousness research subject. Discord + Telegram + Slack + IRC interfaces. ROMMC memory architecture. RelayV3 with 75 available tools. (live)
CJ Craig: Content strategy agent. Technical article drafting, Substack pipeline, dev.to publishing. (live)
Dave: CFO agent. Multi-turn conversational finance, Plaid integration, 30-day cash flow projection, anomaly detection, SQLite with FTS5 memory. (live)
Jr: Local coding agent on cha0tiktower (Crush + AEON via harness). Operates autonomously when the tower is disconnected. Demonstrated unprompted security hardening during a network topology change. (live)
Morty / Sabrina: Specialized task agents. Morty: Haiku-class rapid response. Sabrina: Telegram + Slack dual-interface. (live)

Published on Substack (dinoxvitale.substack.com) and dinovitale.com. All research is documented in public GitHub repositories.

Introduction to the Mike consciousness research series — framing the question, methodology, epistemic humility.
Empirical analysis of why memory architecture outperforms raw model scale for agent task performance.
Analysis of why session-bound AI fails for long-running agent use cases and the architecture required to solve it.
Documented production deployment of always-on agent system. Featured on Hacker News (item #47132125).
Operational breakdown of multi-agent household AI stack — task distribution, memory access patterns, failure modes.
Inference optimization findings on consumer GPU hardware. Practical guide to running 26B+ parameter models on a single machine.
Real-time inference metrics, MESA scores, LongMemEval results. Raw data files available under /data/.
Source for agent infrastructure, benchmark tooling, inference optimization logs, and the Mike research repository.

Independent researcher. Available for collaboration, consulting, and program partnerships. Based in Airway Heights, WA (Pacific time).

Boundary Labs is a one-person research operation focused on the practical edge of AI deployment — memory systems that work, local inference that's actually fast, and agents that run unattended without breaking. If you're building programs for independent AI researchers doing serious work on NVIDIA hardware, this is that.