Estimating non-overfitted convex production technologies: a stochastic machine learning approach

Maria D. Guillen, Vincent Charles*, Juan Aparicio

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Overfitting is a classical statistical issue that occurs when a model fits a particular observed data sample too closely, potentially limiting its generalizability. While Data Envelopment Analysis (DEA) is a powerful non-parametric method for assessing the relative efficiency of decision-making units (DMUs), its reliance on the minimal extrapolation principle can lead to concerns about overfitting, particularly when the goal extends beyond evaluating the specific DMUs in the sample to making broader inferences. In this paper, we propose an adaptation of Stochastic Gradient Boosting to estimate production possibility sets that mitigate overfitting while satisfying shape constraints such as convexity and free disposability. Our approach is not intended to replace DEA but to complement it, offering an additional tool for scenarios where generalization is important. Through simulation experiments, we demonstrate that the proposed method performs well compared to DEA, especially in high-dimensional settings. Furthermore, the new machine learning-based technique is compared to the Corrected Concave Non-parametric Least Squares (C2NLS), showing competitive performance. We also illustrate how the usual efficiency measures in DEA can be implemented under our approach. Finally, we provide an empirical example based on data from the Program for International Student Assessment (PISA) to demonstrate the applicability of the new method.
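
For readers unfamiliar with the DEA baseline mentioned in the abstract, the following is a minimal sketch of a standard output-oriented DEA score under convexity and free disposability (variable returns to scale), restricted to one input and one output so that it needs no linear-programming solver. It is not the authors' boosting method. The function name `dea_output_efficiency` and the sample data are hypothetical and serve only as illustration. In this two-dimensional case the optimal benchmark is a convex combination of at most two observed units, so the sketch simply enumerates pairs.

```python
from itertools import combinations_with_replacement


def dea_output_efficiency(xs, ys, x0, y0, tol=1e-12):
    """Output-oriented DEA score for a unit (x0, y0), one input and one output.

    Returns phi = (best attainable output with input at most x0) / y0, where
    "attainable" means inside the convex, free-disposal hull of the observed
    points (xs[i], ys[i]). For units in the sample phi >= 1, and phi == 1
    means the unit lies on the estimated frontier.
    """
    best = None
    points = list(zip(xs, ys))
    for (xj, yj), (xk, yk) in combinations_with_replacement(points, 2):
        # Benchmark candidate: t*(xj, yj) + (1 - t)*(xk, yk) with t in [0, 1].
        # The output is linear in t, so the optimum for this pair sits at an
        # endpoint or where the input constraint binds exactly.
        candidates = [0.0, 1.0]
        if xj != xk:
            candidates.append((x0 - xk) / (xj - xk))
        for t in candidates:
            if not 0.0 <= t <= 1.0:
                continue
            x = t * xj + (1.0 - t) * xk
            if x <= x0 + tol:  # free disposability of inputs
                y = t * yj + (1.0 - t) * yk
                best = y if best is None else max(best, y)
    if best is None:
        raise ValueError("no observed unit uses input <= x0")
    return best / y0


# Hypothetical sample: four units, each with one input and one output.
inputs = [1.0, 2.0, 4.0, 3.0]
outputs = [1.0, 3.0, 4.0, 2.0]
scores = [dea_output_efficiency(inputs, outputs, x, y)
          for x, y in zip(inputs, outputs)]
```

Because the frontier is the tightest convex, monotone envelope of the sample (the minimal extrapolation principle), every score depends only on the observed units, which is the source of the overfitting concern the abstract raises.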

Original language: English
Journal: European Journal of Operational Research
Early online date: 28 Nov 2024
Publication status: Early online date - 28 Nov 2024

Keywords

  • Artificial Intelligence
  • Stochastic Gradient Boosting
  • Stochastic
  • Machine Learning

ASJC Scopus subject areas

  • Artificial Intelligence
  • Management Science and Operations Research
