
2018-11-20 Tuesday - Machine Learning Technical Debt and Anti-Patterns

This post is a placeholder for links to interesting articles and papers that touch on technical debt and anti-patterns in machine learning algorithms, models, and solutions.

Hidden Technical Debt in Machine Learning Systems
https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf
"Machine learning offers a fantastically powerful toolkit for building useful complex prediction systems quickly. This paper argues it is dangerous to think of these quick wins as coming for free. Using the software engineering framework of technical debt, we find it is common to incur massive ongoing maintenance costs in real-world ML systems. We explore several ML-specific risk factors to account for in system design. These include boundary erosion, entanglement, hidden feedback loops, undeclared consumers, data dependencies, configuration issues, changes in the external world, and a variety of system-level anti-patterns."

A Convex Framework for Fair Regression

"The widespread use of machine learning to make consequential decisions about individual citizens (including in domains such as...criminal sentencing has been accompanied by increased reports of instances in which the algorithms and models employed can be unfair or discriminatory in a variety of ways ...we introduce a rich family of fairness metrics for regression models that take the form of a fairness regularizer and apply them to the standard loss functions for linear and logistic regression." 
"Since these loss functions and our fairness regularizer are convex, the combined objective functions obtained from our framework are also convex, and thus permit efficient optimization. Furthermore, our family of fairness metrics covers the spectrum from the type of group fairness that is common in classification formulations (where e.g. false arrests in one racial group can be “compensated” for by false arrests in another racial group) to much stronger notions of individual fairness (where such cancellations are forbidden, and every injustice is charged to the model). Intermediate fairness notions are also covered. Our framework also permits one to either forbid the use of a "protected” variable (such as race), by demanding that a single model be learned across all groups, or to build different group-dependent models." 
"Most importantly, by varying the weight on the fairness regularizer, our framework permits us to compute the entire “Pareto curve” or efficient frontier of the trade-off between predictive accuracy and fairness.  Such curves are especially important to examine and understand in a domain-specific manner: since demanding fairness of models will always come at a cost of reduced predictive accuracy , it behooves practitioners working with fairness-sensitive data sets to understand just how mild or severe this trade-off is in their particular arena, permitting them to make informed modeling and policy decisions""...in this work we have studied a variety of fairness regularizers for regression problems, and applied them to data sets in which fairness is not subservient to generalization, but is instead a first-order consideration. Our empirical study has demonstrated that the choice of fairness regularizer (group, individual, hybrid, or other) and the particular data set can have qualitative effects on the trade-off between accuracy and fairness...." 
"The Communities and Crime dataset, from the UCI repository is a dataset which includes many features deemed relevant to violent crime rates (such as the percentage of the community’s population in an urban area, the community’s racial makeup, law enforcement involvement and racial makeup of that law enforcement in a community, amount a community’s law enforcement allocated to drug units) for different communities. This data is provided to train regression models based on this data to predict the amount of violent crime (murder, rape, robbery, and assault) in a given community...." 
"The COMPAS dataset The COMPAS dataset contains data from Broward County, Florida originally compiled by ProPublica in which the goal is to predict whether a convicted individual would commit a violent crime in the following two years or not. ..." 

