Semi-knockoffs: a model-agnostic conditional independence testing method with finite-sample guarantees
Angel Reyero-Lobo, Bertrand Thirion, Pierre Neuvial
arXiv:2601.23124 · math.ST · Published 2026-01-30
Conditional independence testing (CIT) is essential for reliable scientific discovery. It prevents spurious findings and enables controlled feature selection. Recent CIT methods have used machine learning (ML) models as surrogates of the underlying distribution. However, model-agnostic approaches require a train-test split, which reduces statistical power. We introduce Semi-knockoffs, a CIT method that can accommodate any pre-trained model, avoids this split, and provides valid p-values and false discovery rate (FDR) control for high-dimensional settings. Unlike methods that rely on the model-$X$ assumption (known input distribution), Semi-knockoffs only require conditional expectations for continuous variables. This makes the procedure less restrictive and more practical for machine learning integration. To ensure validity when estimating these expectations, we present two new theoretical results of independent interest: (i) stability for regularized models trained with a null feature and (ii) the double-robustness property.
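To make the step from per-feature p-values to FDR-controlled feature selection concrete, here is a minimal sketch of the standard Benjamini-Hochberg step-up rule. This is an assumption for illustration only: the abstract does not say which multiple-testing procedure Semi-knockoffs use, and the function name `benjamini_hochberg`, the level `alpha`, and the example p-values are all hypothetical. Benjamini-Hochberg is guaranteed to control the FDR at level `alpha` when the p-values are independent or positively dependent.

```python
def benjamini_hochberg(p_values, alpha=0.1):
    """Return one boolean per hypothesis: True if rejected by the BH step-up rule.

    Generic textbook procedure, not the method from the paper (assumption).
    """
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= alpha * k / m.
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            k = rank
    # Reject the k hypotheses with the smallest p-values.
    rejected = [False] * m
    for idx in order[:k]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values, one per candidate feature.
p_vals = [0.001, 0.8, 0.012, 0.04, 0.6]
selected = benjamini_hochberg(p_vals, alpha=0.1)
print(selected)
```

With these illustrative inputs, the features with p-values 0.001, 0.012, and 0.04 are selected. Any CIT method that returns valid p-values could feed such a step.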
Topics: Generative Models & Discovery
Tags: scientific-discovery