Simpson’s Paradox: Aggregation Effects in Statistical and Machine Learning Models


Michael Brimacombe

Abstract

Data aggregation effects are examined in relation to both statistical and machine learning approaches to data modeling. It is shown that heavily data-centric methods, such as artificial neural networks and random forests, are subject to aggregation effects similar to those affecting statistical methods. Several basic examples are discussed.



This work is licensed under a Creative Commons Attribution 4.0 License.