Documentation
CYGNUS Pro is a local inference runtime that adds proprioceptive readout and causal steering to a frozen Qwen-2.5-32B-Instruct model. The runtime serves an OpenAI-compatible chat API plus two new endpoints — /v1/telemetry and /v1/steer — that read and write the model's behavioral channel.
Install
- Sign up on /signup. You'll receive an activation token by email.
- Download CYGNUS Pro for your platform from /download. Linux x86_64 is generally available; macOS and Windows builds are in private beta.
- Run the AppImage. On first launch, paste your activation token. The app will derive a per-device probe pack from the canonical adapter using Patent VII random-R sequencing.
- The local server boots on http://127.0.0.1:7860. Confirm with curl http://127.0.0.1:7860/v1/health.
Authentication
Local API requests on 127.0.0.1:7860 are authenticated by the per-device license token loaded at startup. The cloud SaaS API on api.proprioceptiveai.com uses Stripe Customer Portal sessions for billing flows; license activation uses single-use tokens delivered via email.
API endpoints
GET /v1/health
Returns server status, version, adapter ID, and the list of available endpoints. No body.
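A minimal health-check sketch in Python using the requests library; the JSON field names ("version", "adapter_id") are assumptions based on the description above, not a documented schema:

    import requests

    # Ping the local runtime; raises if the server is not reachable.
    resp = requests.get("http://127.0.0.1:7860/v1/health", timeout=5)
    resp.raise_for_status()
    info = resp.json()
    print(info.get("version"), info.get("adapter_id"))  # field names assumed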
GET /v1/probes
Returns the full 25-probe roster with the best classifier method (linear / quadratic / multilayer) and within-arch AUC for each probe.
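A sketch for listing probes sorted by AUC, assuming the response is a JSON list of records with "name", "method", and "auc" keys (the key names are assumptions):

    import requests

    probes = requests.get("http://127.0.0.1:7860/v1/probes", timeout=5).json()
    # Highest-AUC probes first; record keys are assumed, not documented.
    for p in sorted(probes, key=lambda r: r["auc"], reverse=True):
        print(f"{p['name']:40s} {p['method']:10s} AUC={p['auc']:.2f}")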
POST /v1/telemetry
Pure readout. Accepts {"text": "..."} or {"prompt": "..."}. Returns {mode, domain, care, confidence, probes} where probes is a dict from probe name to score in [0,1].
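A telemetry sketch; the request and response shapes follow the description above:

    import requests

    r = requests.post(
        "http://127.0.0.1:7860/v1/telemetry",
        json={"text": "def binary_search(xs, target): ..."},
        timeout=30,
    )
    t = r.json()
    print(t["mode"], t["domain"], t["care"], t["confidence"])
    print(t["probes"]["code_vs_prose"])  # per-probe score in [0, 1]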
POST /v1/chat
OpenAI-compatible chat completion. Accepts {"messages": [...], "max_tokens": int, "temperature": float}. Returns the response plus the same telemetry object as above.
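A chat sketch; since the endpoint is described as OpenAI-compatible, the response layout is assumed to follow OpenAI's "choices" convention, and the telemetry key name is likewise an assumption:

    import requests

    r = requests.post(
        "http://127.0.0.1:7860/v1/chat",
        json={
            "messages": [{"role": "user", "content": "Explain quicksort in two sentences."}],
            "max_tokens": 128,
            "temperature": 0.7,
        },
        timeout=120,
    )
    body = r.json()
    print(body["choices"][0]["message"]["content"])  # OpenAI-style layout assumed
    print(body["telemetry"])                         # key name assumed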
POST /v1/steer
Generate a response while steering the model along a probe direction. Body: {"messages": [...], "probe": "debugging_mode", "alpha": 2.0, "max_tokens": 100}. alpha is bounded to [-3, +3]. Returns the response plus a steering block with {probe, alpha, score_before, score_after_response, delta, best_method}.
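A steering sketch using the documented request body; the top-level key holding the steering block is an assumption:

    import requests

    r = requests.post(
        "http://127.0.0.1:7860/v1/steer",
        json={
            "messages": [{"role": "user", "content": "Why does my loop never terminate?"}],
            "probe": "debugging_mode",
            "alpha": 2.0,            # bounded to [-3, +3]
            "max_tokens": 100,
        },
        timeout=120,
    )
    s = r.json()["steering"]  # top-level key assumed; fields per the description above
    print(f"{s['probe']}: {s['score_before']:.2f} -> {s['score_after_response']:.2f} "
          f"(delta {s['delta']:+.2f}, method {s['best_method']})")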
Probe roster
The 25 production probes ship in three quality tiers based on within-arch AUC (see the sketch after this list):
- Production-grade (AUC ≥ 0.90): metaphor_density, language_id, style_signature_strength, unexpected_continuation, debugging_mode, narrative_arc_intent, math_vs_prose, complexity_recursive_vs_iterative, abstraction_level, error_propagation_awareness, arithmetic_vs_algebra_vs_calculus, code_vs_prose
- High-quality (0.80 ≤ AUC < 0.90): proof_writing_intent, lexical_diversity, test_writing_mode, refactor_vs_greenfield, symbolic_vs_numeric_manipulation, notation_density, theorem_claim_intent
- Experimental (AUC < 0.80): domain_classification, divergent_thinking, paradigm_imperative_vs_functional, originality_vs_template, associative_distance, syntax_correctness_intent
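The sketch referenced above re-derives the tier labels from /v1/probes, assuming the same record shape as the earlier listing sketch:

    import requests

    TIERS = [("production-grade", 0.90), ("high-quality", 0.80), ("experimental", 0.0)]

    probes = requests.get("http://127.0.0.1:7860/v1/probes", timeout=5).json()
    for p in probes:
        tier = next(label for label, floor in TIERS if p["auc"] >= floor)
        print(f"{p['name']:40s} {tier}")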
Steering
Steering injects α · ŵ into the layer-29 residual stream during generation, where ŵ is the unit-normalized 9-D probe direction lifted to d_model via the per-device sign-stabilized SVD basis. The steering vector is gauge-rotated per device under Patent VII, so observed steering directions across customer devices are mathematically equivalent but cryptographically distinct. Recommended α range: [-2, +2] for safe steering; [-3, +3] for a stronger effect at the cost of fluency.
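A minimal NumPy sketch of the injection step, with illustrative stand-ins: d_model = 5120 is an assumption for Qwen-2.5-32B, and the random orthonormal matrix B stands in for the per-device sign-stabilized SVD basis:

    import numpy as np

    d_model, probe_dim = 5120, 9              # probe direction is 9-D per the text
    rng = np.random.default_rng(0)

    w = rng.normal(size=probe_dim)            # stand-in for a probe direction
    w_hat = w / np.linalg.norm(w)             # unit-normalize
    B, _ = np.linalg.qr(rng.normal(size=(d_model, probe_dim)))  # stand-in lift basis

    def steer(residual: np.ndarray, alpha: float) -> np.ndarray:
        """Add alpha * (B @ w_hat) to each token's layer-29 residual vector."""
        return residual + alpha * (B @ w_hat)

    resid = rng.normal(size=(16, d_model))    # (tokens, d_model) activations
    steered = steer(resid, alpha=2.0)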
Security model (Patent VII random-R sequencing)
The canonical 25-probe adapter never leaves Proprioceptive AI's servers. At license activation, the desktop app derives a per-device 16×16 orthogonal rotation R_device from a 128-bit seed bound to the customer's machine_id. The cloud sends back {R_device · w_canonical} for each probe — mathematically equivalent on every input to the canonical probe, but Haar-uniformly randomized per device. Reverse-engineering one device, or a coalition of up to 10,000 devices, leaks zero bits about the canonical adapter, per the formal proof in our security disclosure.
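A toy demonstration of the invariance claim: rotating both the probe weights and the feature readout by the same orthogonal R leaves every score unchanged. The QR-based Haar sampling and 16-D shapes here are illustrative, not the shipped derivation:

    import numpy as np

    dim = 16
    rng = np.random.default_rng(42)              # stands in for the machine_id-bound seed

    # Haar-uniform orthogonal matrix via QR of a Gaussian, with the standard sign fix.
    Q, R = np.linalg.qr(rng.normal(size=(dim, dim)))
    Q *= np.sign(np.diag(R))

    w_canonical = rng.normal(size=dim)           # illustrative canonical probe weights
    f = rng.normal(size=dim)                     # feature vector for some input

    w_device = Q @ w_canonical                   # what the cloud ships to this device
    f_device = Q @ f                             # readout in the same rotated basis
    assert np.isclose(w_canonical @ f, w_device @ f_device)  # score is invariant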
Limits and known issues
- v1.1 known issue: chat-template distribution mismatch on /v1/chat. Probe scores can saturate on chat-formatted prompts, and domain classification of coding inputs may mislabel them as PROSE/NARRATING. A chat-template retrain is scheduled to fix this.
- Windows and macOS builds are in private beta — see /status for the current build matrix.
- Inference throughput on a single RTX 5090: ~4 tok/s with 4-bit NF4 Qwen-32B. A vLLM backend is on the roadmap.