Three reinforcement-learning models, a meta-selector, and clear limitations. A full picture of what runs under the hood — and what does not.
Our execution engine consists of three reinforcement-learning models with different specializations. Each is trained with Proximal Policy Optimization (PPO) in an actor-critic setup. Models are not continuously retrained: we update weights quarterly, after rigorous out-of-sample validation.
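For readers unfamiliar with PPO, the following is a minimal sketch of the clipped surrogate loss that the algorithm is generally built around. It is not our production code. The function name, the pure-Python batch format, and the clipping value of 0.2 (a commonly used default) are assumptions made for illustration. In an actor-critic setup, the advantage estimates would come from the critic.

```python
import math


def ppo_clipped_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate loss over a batch (a value to minimize).

    new_log_probs: log-probabilities of the taken actions under the current policy
    old_log_probs: log-probabilities of the same actions under the policy
                   that collected the data
    advantages:    advantage estimates, e.g. supplied by the critic
    clip_eps:      clipping range (0.2 is an illustrative assumption)
    """
    if not advantages:
        raise ValueError("batch must not be empty")

    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the current and the data-collecting policy.
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio inside [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the more pessimistic of the unclipped and clipped objectives.
        total += min(ratio * adv, clipped * adv)

    # The objective is maximized, so the loss is its negative mean.
    return -total / len(advantages)
```

The clipping limits how much credit a single update can take for moving the policy away from the one that generated the data, which is the main reason PPO updates tend to be stable.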
Our models process three categories of data:
We do not claim our models handle all market regimes equally well. We do not claim they anticipate black swan events. We do not claim past performance is indicative of future results. What we do claim: that disciplined algorithmic execution with clear risk limits produces better long-term outcomes than emotionally driven discretionary decisions.
Algorithms are not oracles. In extreme, never-before-seen market conditions, our models will produce losses. That is the nature of the system — no model can be prepared for what it has never seen.
We use rigorous walk-forward validation. Models are trained on 80% of historical data and validated on the remaining 20%. We never use future data for training (no look-ahead bias). We account for full transaction costs including spreads, commissions, and realistic slippage. Models that fail to perform out-of-sample do not reach production.
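The splitting logic described above can be sketched roughly as follows. This is an illustration under stated assumptions, not our validation pipeline. The function names, the rolling-window parameters (`window`, `step`), and the simple additive cost model are all hypothetical. Only the 80/20 ratio and the rule that validation data must come after training data are taken from the text.

```python
def walk_forward_windows(n_samples, window, step, train_frac=0.8):
    """Yield (train_indices, validation_indices) for rolling windows.

    Samples are assumed to be in chronological order. Within each window
    the first `train_frac` share is used for training and the rest for
    validation, so validation data is always later than training data.
    """
    if not 0.0 < train_frac < 1.0:
        raise ValueError("train_frac must be between 0 and 1")
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")

    start = 0
    while start + window <= n_samples:
        cut = start + int(window * train_frac)
        yield range(start, cut), range(cut, start + window)
        start += step  # roll the whole window forward in time


def net_return(gross_return, spread, commission, slippage):
    """Return after costs, using an assumed additive cost model."""
    return gross_return - (spread + commission + slippage)


if __name__ == "__main__":
    # 20 chronological samples, windows of 10 samples, rolled forward by 5.
    for train, valid in walk_forward_windows(20, window=10, step=5):
        print(f"train {train.start}-{train.stop - 1}, "
              f"validate {valid.start}-{valid.stop - 1}")
```

Because every validation range starts where its training range ends, no window can leak future observations into training, which is what the no-look-ahead rule requires.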
Our research team is organizationally separated from business and marketing. The Head of Research reports directly to the Board, not to the CEO. When a model performs poorly, we say so openly in the monthly memo — we do not trim results for optics. This separation is enshrined in our internal regulations and reviewed annually by the Board.
We publish quarterly technical notes in our editorial section, with details on recent model performance, methodological improvements, and lessons learned from recent market conditions. The notes are written without marketing language and are accessible to technically interested subscribers.