Statistics and Probability

Mathematical frameworks for data analysis and uncertainty, including statistical theory, probability theory, stochastic processes, statistical inference, and applications to data science

9 papers

Papers

Statistical exponential families: A digest with flash cards

This document is a concise digest of exponential family distributions, covering their definitions, fundamental properties, and duality with Bregman divergences. It also reviews the Riemannian and information geometries associated with these statistical manifolds. The paper provides "flash cards" summarizing the properties and parameterizations of common exponential families such as the Gaussian, Poisson, and binomial distributions.

Statistics and Probability Oct 08, 06:26 AM

On a generalization of the Jensen-Shannon divergence and the JS-symmetrization of distances relying on abstract means

This paper introduces a generalization of the Jensen-Shannon divergence (JSD), a symmetric measure of the difference between probability distributions. It explores using different types of "means" (such as arithmetic, geometric, and harmonic means) to create new JSD variations and provides closed-form formulas for these variations in specific cases, such as mixtures of Gaussian or Cauchy distributions.
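
For orientation, the ordinary JSD is the case in which the two distributions are combined with the arithmetic mean. The following is a minimal sketch of that baseline for discrete distributions only, not the paper's generalized construction; the function names `kl_divergence` and `jensen_shannon` are illustrative choices, not taken from the paper.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions (natural log)."""
    # Terms with p_i == 0 contribute nothing, by the convention 0 * log(0) = 0.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jensen_shannon(p, q):
    """Ordinary JSD: average KL divergence of p and q to their arithmetic-mean mixture."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # arithmetic mean of the two distributions
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Identical distributions are at distance 0; disjoint ones reach the upper bound log(2).
same = jensen_shannon([0.5, 0.5], [0.5, 0.5])
disjoint = jensen_shannon([1.0, 0.0], [0.0, 1.0])
```

Replacing the arithmetic mean in `m` with another mean generally requires renormalizing the result so that it is again a probability distribution, which this sketch does not attempt.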

Statistics and Probability Sep 17, 06:26 PM

Modern aspects of Markov chains: entropy, curvature and the cutoff phenomenon

These lecture notes explore the "cutoff phenomenon" in Markov chains, where some processes transition abruptly to equilibrium instead of converging gradually. They introduce key concepts such as mixing times, curvature, and varentropy, and apply them to models including card shuffling and random walks. The author presents a novel criterion based on varentropy for predicting cutoff behavior.

Statistics and Probability Aug 29, 02:52 PM

Addressing Outliers in Mixed-Effects Logistic Regression: A More Robust Modeling Approach

This study proposes the 'binomial-logit-t' model to improve the analysis of bounded count data (data with a maximum value), particularly in settings prone to outliers, such as medication adherence studies. It handles outliers and accounts for overdispersion more effectively than existing methods, providing more accurate parameter estimates. The model is demonstrated on a medication adherence dataset and supported by simulations.

Statistics and Probability Aug 26, 11:50 AM

A Framework for Thinking About Informal Statistical Inference

The study proposes a framework for informal statistical inference with three key principles: generalizations beyond the data, data as evidence, and probabilistic language. Classroom episodes illustrate how teachers can foster inferential reasoning by emphasizing purposeful investigations, connecting conclusions to evidence, and promoting probabilistic language to express uncertainty.

Statistics and Probability Jul 14, 11:11 AM

The Exponentiated Generalized Class of Distributions

The paper proposes the "exponentiated generalized" (EG) class of distributions, a new method of adding two shape parameters to existing continuous distributions using a double Lehmann alternative construction. This extends the flexibility of the baseline distributions, particularly in the tails, enabling improved modeling in various fields. The study explores mathematical properties, including moments, generating functions, and order statistics, and demonstrates applications to real datasets from areas such as agriculture and materials science, reporting better fits than existing models.
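
As a rough illustration of how two shape parameters can be layered onto a baseline distribution, the sketch below assumes the EG cumulative distribution function has the form F(x) = [1 - (1 - G(x))^a]^b, where G is the baseline CDF and a, b > 0. The helper names and the choice of an exponential baseline are illustrative assumptions, not taken from the summary above.

```python
import math

def exponentiated_generalized_cdf(baseline_cdf, a, b):
    """Return the CDF x -> [1 - (1 - G(x))**a]**b for a baseline CDF G (assumed EG form)."""
    def cdf(x):
        g = baseline_cdf(x)
        # Inner exponent a acts on the survival function, outer exponent b on the result.
        return (1.0 - (1.0 - g) ** a) ** b
    return cdf

def exponential_cdf(x, rate=1.0):
    """Baseline used only for illustration: CDF of an exponential distribution."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0

# With a = b = 1 the construction returns the baseline unchanged.
identity_case = exponentiated_generalized_cdf(exponential_cdf, 1.0, 1.0)
# Other values of a and b reshape the baseline.
reshaped = exponentiated_generalized_cdf(exponential_cdf, 2.0, 3.0)
```

Under this assumed form, setting b = 1 or a = 1 leaves a single exponent on the survival function or on the CDF respectively, which is why the construction can be read as two exponentiation steps applied in sequence.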

Statistics and Probability Jul 14, 11:11 AM

The coefficient of determination R-squared is more informative than SMAPE, MAE, MAPE, MSE and RMSE in regression analysis evaluation

This study argues that R-squared is a more informative and robust metric for evaluating regression models than SMAPE, MAE, MAPE, MSE, and RMSE. Through several synthetic use cases and an analysis of two real medical datasets, the authors demonstrate that R-squared provides a more accurate assessment of model performance, particularly when dealing with skewed data or outliers. They propose using R-squared as the standard metric for regression analysis evaluation.

Statistics and Probability Jul 14, 11:11 AM