Documentation
CYGNUS Pro is a local inference runtime that adds proprioceptive readout and causal steering to a frozen Qwen-2.5-32B-Instruct model. The runtime serves an OpenAI-compatible chat API plus two new endpoints — /v1/telemetry and /v1/steer — that read and write the model's behavioral channel.
Install
- Sign up on /signup. You'll receive an activation token by email.
- Download CYGNUS Pro for your platform from /download. Linux x86_64 is generally available; macOS and Windows builds are in private beta.
- Run the AppImage. On first launch, paste your activation token. The app will derive a per-device probe pack from the canonical adapter using Patent VII random-R sequencing.
- The local server boots on http://127.0.0.1:7860. Confirm with curl http://127.0.0.1:7860/v1/health.
Authentication
Local API requests on 127.0.0.1:7860 are authenticated by the per-device license token loaded at startup. The cloud SaaS API on api.proprioceptiveai.com uses Stripe Customer Portal sessions for billing flows; license activation uses single-use tokens delivered via email.
API endpoints
GET /v1/health
Returns server status, version, adapter ID, and the list of available endpoints. No body.
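A minimal health-check sketch in Python using the requests library; the JSON field names ("version", "adapter_id") are assumptions based on the description above, not a documented schema:

    import requests

    # Ping the local runtime; raises if the server is not reachable.
    resp = requests.get("http://127.0.0.1:7860/v1/health", timeout=5)
    resp.raise_for_status()
    info = resp.json()
    print(info.get("version"), info.get("adapter_id"))  # field names assumed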
GET /v1/probes
Returns the full 25-probe roster with the best classifier method (linear / quadratic / multilayer) and within-arch AUC for each probe.
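A sketch for listing probes sorted by AUC, assuming the response is a JSON list of records with "name", "method", and "auc" keys (the key names are assumptions):

    import requests

    probes = requests.get("http://127.0.0.1:7860/v1/probes", timeout=5).json()
    # Highest-AUC probes first; record keys are assumed, not documented.
    for p in sorted(probes, key=lambda r: r["auc"], reverse=True):
        print(f"{p['name']:40s} {p['method']:10s} AUC={p['auc']:.2f}")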
POST /v1/telemetry
Pure readout. Accepts {"text": "..."} or {"prompt": "..."}. Returns {mode, domain, care, confidence, probes} where probes is a dict from probe name to score in [0,1].
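A telemetry sketch; the request and response shapes follow the description above:

    import requests

    r = requests.post(
        "http://127.0.0.1:7860/v1/telemetry",
        json={"text": "def binary_search(xs, target): ..."},
        timeout=30,
    )
    t = r.json()
    print(t["mode"], t["domain"], t["care"], t["confidence"])
    print(t["probes"]["code_vs_prose"])  # per-probe score in [0, 1]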
POST /v1/chat
OpenAI-compatible chat completion. Accepts {"messages": [...], "max_tokens": int, "temperature": float}. Returns the response plus the same telemetry object as above.
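A chat sketch; since the endpoint is described as OpenAI-compatible, the response layout is assumed to follow OpenAI's "choices" convention, and the telemetry key name is likewise an assumption:

    import requests

    r = requests.post(
        "http://127.0.0.1:7860/v1/chat",
        json={
            "messages": [{"role": "user", "content": "Explain quicksort in two sentences."}],
            "max_tokens": 128,
            "temperature": 0.7,
        },
        timeout=120,
    )
    body = r.json()
    print(body["choices"][0]["message"]["content"])  # OpenAI-style layout assumed
    print(body["telemetry"])                         # key name assumed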
POST /v1/steer
Generate a response while steering the model along a probe direction. Body: {"messages": [...], "probe": "debugging_mode", "alpha": 2.0, "max_tokens": 100}. alpha is bounded to [-3, +3]. Returns the response plus a steering block with {probe, alpha, score_before, score_after_response, delta, best_method}.
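A steering sketch using the documented request body; the top-level key holding the steering block is an assumption:

    import requests

    r = requests.post(
        "http://127.0.0.1:7860/v1/steer",
        json={
            "messages": [{"role": "user", "content": "Why does my loop never terminate?"}],
            "probe": "debugging_mode",
            "alpha": 2.0,            # bounded to [-3, +3]
            "max_tokens": 100,
        },
        timeout=120,
    )
    s = r.json()["steering"]  # top-level key assumed; fields per the description above
    print(f"{s['probe']}: {s['score_before']:.2f} -> {s['score_after_response']:.2f} "
          f"(delta {s['delta']:+.2f}, method {s['best_method']})")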
Probe roster
The 25 production probes ship in three quality tiers based on within-arch AUC (see the sketch after this list):
- Production-grade (AUC ≥ 0.90): metaphor_density, language_id, style_signature_strength, unexpected_continuation, debugging_mode, narrative_arc_intent, math_vs_prose, complexity_recursive_vs_iterative, abstraction_level, error_propagation_awareness, arithmetic_vs_algebra_vs_calculus, code_vs_prose
- High-quality (0.80 ≤ AUC < 0.90): proof_writing_intent, lexical_diversity, test_writing_mode, refactor_vs_greenfield, symbolic_vs_numeric_manipulation, notation_density, theorem_claim_intent
- Experimental (AUC < 0.80): domain_classification, divergent_thinking, paradigm_imperative_vs_functional, originality_vs_template, associative_distance, syntax_correctness_intent
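The sketch referenced above re-derives the tier labels from /v1/probes, assuming the same record shape as the earlier listing sketch:

    import requests

    TIERS = [("production-grade", 0.90), ("high-quality", 0.80), ("experimental", 0.0)]

    probes = requests.get("http://127.0.0.1:7860/v1/probes", timeout=5).json()
    for p in probes:
        tier = next(label for label, floor in TIERS if p["auc"] >= floor)
        print(f"{p['name']:40s} {tier}")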
Steering
Steering injects α · ŵ into the layer-29 residual stream during generation, where ŵ is the unit-normalized 9-D probe direction lifted to d_model via the per-device sign-stabilized SVD basis. The steering vector is gauge-rotated per device under Patent VII, so observed steering directions across customer devices are mathematically equivalent but cryptographically distinct. Recommended α range: [-2, +2] for safe steering; [-3, +3] for a stronger effect at the cost of fluency.
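A minimal NumPy sketch of the injection step, with illustrative stand-ins: d_model = 5120 is an assumption for Qwen-2.5-32B, and the random orthonormal matrix B stands in for the per-device sign-stabilized SVD basis:

    import numpy as np

    d_model, probe_dim = 5120, 9              # probe direction is 9-D per the text
    rng = np.random.default_rng(0)

    w = rng.normal(size=probe_dim)            # stand-in for a probe direction
    w_hat = w / np.linalg.norm(w)             # unit-normalize
    B, _ = np.linalg.qr(rng.normal(size=(d_model, probe_dim)))  # stand-in lift basis

    def steer(residual: np.ndarray, alpha: float) -> np.ndarray:
        """Add alpha * (B @ w_hat) to each token's layer-29 residual vector."""
        return residual + alpha * (B @ w_hat)

    resid = rng.normal(size=(16, d_model))    # (tokens, d_model) activations
    steered = steer(resid, alpha=2.0)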
Security model (Patent VII random-R sequencing)
The canonical 25-probe adapter never leaves Proprioceptive AI's servers. At license activation, the desktop app derives a per-device 16×16 orthogonal rotation R_device from a 128-bit seed bound to the customer's machine_id. The cloud sends back {R_device · w_canonical} for each probe — mathematically equivalent on every input to the canonical probe, but Haar-uniformly randomized per device. Reverse-engineering one device, or a coalition of up to 10,000 devices, leaks zero bits about the canonical adapter, per the formal proof in our security disclosure.
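A toy demonstration of the invariance claim: rotating both the probe weights and the feature readout by the same orthogonal R leaves every score unchanged. The QR-based Haar sampling and 16-D shapes here are illustrative, not the shipped derivation:

    import numpy as np

    dim = 16
    rng = np.random.default_rng(42)              # stands in for the machine_id-bound seed

    # Haar-uniform orthogonal matrix via QR of a Gaussian, with the standard sign fix.
    Q, R = np.linalg.qr(rng.normal(size=(dim, dim)))
    Q *= np.sign(np.diag(R))

    w_canonical = rng.normal(size=dim)           # illustrative canonical probe weights
    f = rng.normal(size=dim)                     # feature vector for some input

    w_device = Q @ w_canonical                   # what the cloud ships to this device
    f_device = Q @ f                             # readout in the same rotated basis
    assert np.isclose(w_canonical @ f, w_device @ f_device)  # score is invariant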
Limits and known issues
- v1.1 known issue: chat-template distribution mismatch on /v1/chat. Probe scores can saturate on chat-formatted prompts, and domain classification of coding inputs may mislabel them as PROSE/NARRATING. A chat-template retrain is scheduled to fix this.
- Windows and macOS builds are in private beta — see /status for the current build matrix.
- Inference throughput on a single RTX 5090: ~4 tok/s with 4-bit NF4 Qwen-32B. A vLLM backend is on the roadmap.