ARR-TC-2026-006 · Technical Commentary · 2026-01-12

Hedging under Frictions as Convex Risk Minimization over Neural Policies

deep hedging · transaction costs · convex risk measures · reinforcement learning
§ Reviewed Work
Deep Hedging
H. Buehler, L. Gonon, J. Teichmann, B. Wood
arXiv:1802.03042 · Quantitative Finance 19(8), 2019
§01

Abstract

The paper frames hedging of a derivatives book under frictions (transaction costs, liquidity limits, trading constraints) as minimization of a convex risk measure over policies parameterized by neural networks, with an approximation result showing that the network strategy class can come within ε of the optimal risk. We read it as the methodological template for optimizing decisions under realistic constraints without a closed-form model, and note the governance burden that template carries.

§02

Notation / Conceptual Frame

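Minimize ρ( −Z + Σ_k δ_k · ΔS_k − cost ) over network policies δ_k = δ_k(state), where Z is the liability payoff, ΔS_k = S_{k+1} − S_k is the price increment of the hedging instruments over step k, cost is the cumulative transaction cost of the trades, and ρ is a convex risk measure (e.g. entropic, CVaR). Writing π(X) for the minimized risk of a position X, the price is the indifference value under ρ: p(Z) = π(−Z) − π(0), the cash amount that makes taking on the liability and hedging it optimally as acceptable as not taking it on.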
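
As a minimal sketch of how the objective above might be evaluated on simulated paths, the Python below computes the hedged terminal P&L and two empirical risk estimates. The function names, the single hedging instrument, and the proportional cost model (cost rate times traded notional, including the opening trade and a final unwind) are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def hedged_pnl(paths, deltas, payoff, cost_rate=0.0):
    """Terminal P&L  -Z + sum_k delta_k * dS_k - cost  per simulated path.

    paths:  (n_paths, n_steps + 1) prices S_0 .. S_n
    deltas: (n_paths, n_steps) holdings delta_0 .. delta_{n-1}
    payoff: (n_paths,) liability Z
    """
    dS = np.diff(paths, axis=1)
    gains = np.sum(deltas * dS, axis=1)
    # Assumed cost model: proportional to traded notional, counting the
    # opening trade from a zero position and the final unwind back to zero.
    padded = np.pad(deltas, ((0, 0), (1, 1)))
    trades = np.abs(np.diff(padded, axis=1))  # (n_paths, n_steps + 1)
    cost = cost_rate * np.sum(trades * paths, axis=1)
    return -payoff + gains - cost

def entropic_risk(pnl, lam=1.0):
    """Empirical entropic risk (1/lam) * log E[exp(-lam * pnl)]."""
    x = -lam * pnl
    m = np.max(x)  # shift before exponentiating for numerical stability
    return (m + np.log(np.mean(np.exp(x - m)))) / lam

def cvar(pnl, alpha=0.95):
    """Empirical CVaR: mean loss over the worst (1 - alpha) share of paths."""
    losses = np.sort(-pnl)
    k = max(1, int(np.ceil(round((1.0 - alpha) * len(losses), 9))))
    return losses[-k:].mean()
```

Either risk function can be applied to the output of `hedged_pnl`; a training loop would then adjust the policy that produces `deltas` so as to lower that number.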

§03

Commentary

The substantive shift is from compute-the-greeks-then-hedge to optimize-the-terminal-P&L-distribution directly under the measure you actually care about. For a research desk the caution is the mirror image of that flexibility: because nothing is derived from a closed-form model, the method is only as disciplined as the risk measure, the cost model, and the training distribution chosen for it.
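
To make the "optimize the P&L distribution directly" idea concrete, here is a deliberately tiny stand-in written in Python. It replaces the neural network with a hypothetical two-parameter logistic policy and replaces gradient training with a grid search, so it illustrates the shape of the optimization rather than the paper's method. The driftless GBM simulator, the short call liability, the proportional cost model, and all function names are assumptions.

```python
import numpy as np

def simulate_paths(n_paths, n_steps, s0=100.0, vol=0.2, horizon=0.25, seed=0):
    """Driftless geometric Brownian motion; an assumed toy simulator."""
    rng = np.random.default_rng(seed)
    dt = horizon / n_steps
    z = rng.standard_normal((n_paths, n_steps))
    log_incr = -0.5 * vol**2 * dt + vol * np.sqrt(dt) * z
    log_paths = np.concatenate(
        [np.zeros((n_paths, 1)), np.cumsum(log_incr, axis=1)], axis=1
    )
    return s0 * np.exp(log_paths)

def policy(paths, theta, strike):
    """Hypothetical policy: holding in (0, 1), logistic in log-moneyness."""
    slope, shift = theta
    moneyness = np.log(paths[:, :-1] / strike)
    return 1.0 / (1.0 + np.exp(-(slope * moneyness + shift)))

def risk_of_policy(theta, paths, strike, cost_rate, alpha=0.95):
    """Empirical CVaR of the hedged P&L of a short call under the policy."""
    deltas = policy(paths, theta, strike)
    payoff = np.maximum(paths[:, -1] - strike, 0.0)
    gains = np.sum(deltas * np.diff(paths, axis=1), axis=1)
    # Proportional costs on every position change, opening and unwind included.
    trades = np.abs(np.diff(np.pad(deltas, ((0, 0), (1, 1))), axis=1))
    cost = cost_rate * np.sum(trades * paths, axis=1)
    losses = np.sort(-(-payoff + gains - cost))
    k = max(1, int(np.ceil(round((1.0 - alpha) * len(losses), 9))))
    return losses[-k:].mean()

def fit(paths, strike, cost_rate, slopes, shifts):
    """Grid search over policy parameters; returns (best_theta, best_risk)."""
    grid = [(a, b) for a in slopes for b in shifts]
    risks = [risk_of_policy(t, paths, strike, cost_rate) for t in grid]
    best = int(np.argmin(risks))
    return grid[best], risks[best]
```

Calling `fit(simulate_paths(2000, 10), 100.0, 0.001, [0.0, 5.0, 10.0, 20.0], [-0.5, 0.0, 0.5])` returns the grid point with the lowest tail risk. Changing `cost_rate` or `alpha` changes which policy wins, which is the dependence on the chosen cost model and risk measure that the commentary flags.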

§04

Implications for Research Methodology

The paper reinforces the desk preference for stating the objective and constraints explicitly before optimizing, and for treating any learned policy as conditional on its training regime, i.e. subject to invalidation when the regime shifts.

§05

Limitations

Performance is contingent on the simulator and training distribution; out-of-distribution regimes are exactly where the policy is least reliable and hardest to audit.
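
One way to put a number on this dependence is to hold a policy fixed and re-evaluate its risk under simulator settings it was not tuned on. The Python sketch below does that across volatility levels. The constant half-unit holding is a stand-in for a trained policy, and the GBM simulator, cost rate, and function names are assumptions made for illustration.

```python
import numpy as np

def simulate_paths(vol, n_paths=20000, n_steps=20, s0=100.0, horizon=0.25, seed=0):
    """Driftless geometric Brownian motion at a given volatility (toy simulator)."""
    rng = np.random.default_rng(seed)
    dt = horizon / n_steps
    z = rng.standard_normal((n_paths, n_steps))
    log_incr = -0.5 * vol**2 * dt + vol * np.sqrt(dt) * z
    log_paths = np.concatenate(
        [np.zeros((n_paths, 1)), np.cumsum(log_incr, axis=1)], axis=1
    )
    return s0 * np.exp(log_paths)

def cvar_of_policy(policy_fn, paths, strike, cost_rate=0.001, alpha=0.95):
    """Empirical CVaR of a short call hedged with a frozen policy."""
    deltas = policy_fn(paths)
    payoff = np.maximum(paths[:, -1] - strike, 0.0)
    gains = np.sum(deltas * np.diff(paths, axis=1), axis=1)
    trades = np.abs(np.diff(np.pad(deltas, ((0, 0), (1, 1))), axis=1))
    cost = cost_rate * np.sum(trades * paths, axis=1)
    losses = np.sort(-(-payoff + gains - cost))
    k = max(1, int(np.ceil(round((1.0 - alpha) * len(losses), 9))))
    return losses[-k:].mean()

def risk_by_regime(policy_fn, vols, strike=100.0):
    """Re-evaluate the same frozen policy under each volatility regime."""
    return {vol: cvar_of_policy(policy_fn, simulate_paths(vol), strike) for vol in vols}

def frozen_policy(paths):
    """Stand-in for a trained policy: hold half a unit at every step."""
    return np.full((paths.shape[0], paths.shape[1] - 1), 0.5)

report = risk_by_regime(frozen_policy, vols=[0.2, 0.3, 0.4])
```

A report like this only covers the regimes someone thought to simulate, so it bounds the problem rather than solving it: the regimes absent from the grid are still unaudited.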

This note is informational and interpretive. It does not constitute personalized investment advice. Market activity involves risk. Historical analysis and model outputs do not guarantee future results.