Advanced Statistics for Physics

Advanced Statistics for Physics - Course program

Preliminary version of the course program. As the course progresses, the program becomes final and the font color changes from gray to black.

lesson	date	lesson topics	total time
Part 1: Probability theory and probability models
1	25/9/2023	Introduction to the course. Basic combinatorics and probability. Set theory representation of probability space. Addition law for probabilities of incompatible events. Addition law for probabilities of non-mutually exclusive events: the inclusion-exclusion principle.	2
2	29/9/2023	Application of the inclusion-exclusion principle. Conditional probabilities. Independent events. Statistical independence and dimensional reduction. Bayes' theorem and basic introduction to Bayesian inference. Elementary example of Bayesian inference.	4
3	2/10/2023	Conditional probabilities in stochastic modeling: gambler's ruin and probabilistic diffusion models. Discrete and continuous forms of gambler's ruin. Counting problems in particle physics. Introduction to random variables, basic definitions for discrete and continuous random variables. Uniform distribution.	6
4	6/10/2023	Buffon's needle. Expectation, dispersion and their properties. Higher moments. The concept of convergence in probability theory. Chebyshev's inequality. From Chebyshev's inequality to the weak law of large numbers.	8
5	9/10/2023	The generalized Chebyshev inequality. Strong law of large numbers. Introduction to generating functions.	10
6	13/10/2023	Probability generating functions. Generating functions of some common distributions. Poisson distribution as limiting case of a binomial distribution. PGF of the Galton-Watson branching process. Photomultiplier noise. The Luria-Delbrück experiment (slides).	12
7	16/10/2023	The Luria-Delbrück experiment (ctd.). Characteristic functions. Moments of a distribution from the point of view of generating and characteristic functions. Skewness and kurtosis. Mode and median. Properties of characteristic functions. Characteristic functions of some common distributions. Characteristic function of the Gaussian distribution. Series expansion of a generic characteristic function. The Central Limit Theorem (CLT). Further comments on the CLT. The Berry-Esseen theorem.	14
8	20/10/2023	Multiplicative processes and how they differ from additive processes. Power laws. Brief review of common probability models: 1. the uniform distribution; 2. the Bernoulli distribution; 3. the binomial distribution; 4. the geometric distribution; 5. the negative binomial distribution.	16
9	23/10/2023	Brief review of common probability models (ctd.): 6. the hypergeometric distribution; application of the hypergeometric distribution to opinion polling; 7. the multinomial distribution; 8. the Poisson distribution. The Poisson process as a memoryless random point process. Examples: cell plating statistics in biology; Poisson survival probability in radiobiology; simple model of onset of cancer vs. age (slides).	18
10	27/10/2023	Brief review of common probability models (ctd.): 9. the exponential distribution. Example: paralyzable and non-paralyzable detectors. Example: coincidence counting. 10. Properties of the normal distribution. Error function and the cumulative distribution function of the normal distribution. High-rate limit of the Poisson distribution. Transformations of random variables. Generic transformations of multivariate distributions. Approximate transformation of random variables: general formulas for error propagation. Linear orthogonal transformations of random variables.	20
11	30/10/2023	Brief review of common probability models (ctd.): The multivariate normal distribution. The log-normal distribution. The gamma distribution. The Cauchy-Lorentz distribution. The Landau distribution. The Rayleigh distribution. Other important probability distributions. Example of a complex model used to set up a null hypothesis: the distribution of nearest-neighbor distance.	22
Part 2: Statistical inference
12	10/11/2023	Introduction to descriptive and exploratory statistics (slides). Visualization in statistics (see this beautiful example due to Hans Rosling). Quick review of sample mean, sample variance, estimate of covariance and correlation coefficient. Linear fits and the correlation coefficient. Schwarz's inequality and Pearson's correlation coefficient. Broad discussion on Exploratory Data Analysis (EDA, link to the dedicated NIST website). PDF of sample mean for exponentially distributed data.	24
13	13/11/2023	Order statistics. Box plots. Outliers. Violin plots. Rug plots. Kernel density plots. Discussion on the inadequacy of standard deviation in the presence of extreme outliers, and a proposed solution drawn from information-theoretic concepts. Correlated noise and precise timekeeping; the Allan variance.	26
14	17/11/2023	Introduction to the Monte Carlo method. Early history of the Monte Carlo method. Pseudorandom numbers. Uniformly distributed pseudorandom numbers. Transformation method. Example: generation of exponentially distributed pseudorandom numbers.	28
15	20/11/2023	The Box-Muller method for the generation of pairs of Gaussian variates. Transformation method and the transformation of differential cross sections. Acceptance-rejection method. Monte Carlo method examples: 1. generation of angles in e+e- -> mu+mu- scattering; 2. generation of angles in Bhabha scattering; 3. the structure of a complete MC program to simulate low-energy electron transport.	30
16	22/11/2023	Statistical bootstrap. Review of the Bayesian approach to statistical inference. Maximum likelihood method 1. Connection with Bayes' theorem. The Maximum Likelihood principle. Example with exponentially distributed data.	32
17	24/11/2023	Maximum likelihood method 2. Point estimators. Properties of estimators. Transformations of estimators. Consistency of maximum likelihood estimators. Asymptotic optimality of ML estimators. Bartlett's identities. Cramér-Rao-Fisher bound. Variance of ML estimators.	34
18	27/11/2023	Maximum likelihood method 3. Introduction to Shannon's entropy. Information measures based on Shannon's entropy: Kullback-Leibler divergence, Jeffreys distance. Fisher information and its applications. Efficiency and Gaussianity of ML estimators.	36
19	1/12/2023	Introduction to confidence intervals. Credible intervals in the Bayesian perspective. Confidence intervals for the sample mean of exponentially distributed samples. Confidence intervals and confidence level. Detailed analysis of the Neyman construction of confidence intervals (link to the Neyman paper). Confidence intervals for the correlation coefficient of a bivariate Gaussian distribution from MC simulation. Graphical method for the variance of ML estimators. Extended maximum likelihood. Introduction to ML with binned data. Example with two counting channels.	38
20	4/12/2023	Chi-square and its relation to ML. The chi-square distribution using the formalism of characteristic functions. The chi-square distribution in the framework of multidimensional geometry. The least-squares method. Parametric fits. General least squares. Nonlinear least squares. Chi-square minimization in the context of optimization theory: short list of function minimization methods. Linear regression and linear prediction with autoregressive moving-average (ARMA) models. An anecdote about Fermi and multiparametric fits (check Dyson's paper).	40
21	11/12/2023	More on fits. Hypothesis tests, significance level. Examples. Critical region and acceptance region. Errors of the first and of the second kind. The Neyman-Pearson lemma. Wilks' theorem (link to Wilks' paper).	42
22	13/12/2023	Fisher's p-value and rejection of the null hypothesis. Counting problems in physics: the Li-Ma algorithm in gamma-ray astronomy (and the likelihood-ratio method) (slides).	44
23	15/12/2023	Machine Learning and Statistics (1). Naive Bayesian classification. The Expectation-Maximization (EM) algorithm (slides and link to "A gentle tutorial of the EM algorithm" by J. A. Bilmes).	46
24	18/12/2023	Machine Learning and Statistics (2). K-means. The Principal Components method (slides).	48

Edoardo Milotti - Dec. 2023