CAS MAS-II (Modern Actuarial Statistics II) Glossary

26 essential terms and definitions for CAS MAS-II (Modern Actuarial Statistics II). Each definition is written for exam preparation, covering the concepts as they are tested on the 2026 syllabus.

A

AUROC: AUROC (area under the ROC curve) is the probability that a randomly chosen positive observation receives a higher model score than a randomly chosen negative one. The range is [0, 1]: 0.5 corresponds to random ranking, 1 to perfect discrimination, and values below 0.5 to a ranking worse than random. Values of 0.7-0.8 are typical for insurance models.
ARIMA Model: An ARIMA(p, d, q) model combines p autoregressive terms, d orders of differencing, and q moving-average terms to fit a non-stationary time series. After differencing d times, the series is modeled as a stationary ARMA(p, q) process.
ACF and PACF: The autocorrelation function (ACF) measures correlation between observations at lag k; the partial autocorrelation function (PACF) measures the direct correlation after removing the influence of intermediate lags. AR(p) shows PACF cutoff at lag p; MA(q) shows ACF cutoff at lag q.
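As a rough illustration of the ACF entry above, the sketch below computes sample autocorrelations for a short made-up series and the approximate 95% band used to judge whether a lag is significant. The function name `sample_acf` and the data are assumptions for illustration only, and the band $\pm 1.96 / \sqrt{n}$ is the usual large-sample approximation under white noise.

```python
def sample_acf(x, max_lag):
    """Sample autocorrelations r_0, ..., r_max_lag of the sequence x."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)  # lag-0 sum of squared deviations
    # r_k = sum_t dev[t] * dev[t + k] / c0, using the same divisor at every lag
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(max_lag + 1)]

# Hypothetical series, chosen only to have visible structure.
series = [2.0, 4.0, 6.0, 8.0, 6.0, 4.0, 2.0, 4.0, 6.0, 8.0]
acf = sample_acf(series, 3)
band = 1.96 / len(series) ** 0.5  # approximate 95% band under white noise
significant_lags = [k for k in range(1, len(acf)) if abs(acf[k]) > band]
```

A lag whose autocorrelation falls outside the band is a candidate for an MA term; the same reading applied to the PACF suggests AR terms.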

B

Buhlmann Credibility: Buhlmann credibility weights the observed sample mean against the manual rate using the ratio of within-risk to between-risk variability. The credibility factor Z rises with experience years and with greater between-risk heterogeneity. $Z = n / (n + K)$, $K = \text{EPV} / \text{VHM}$
Buhlmann-Straub Credibility: Buhlmann-Straub extends Buhlmann credibility to risks with varying exposure across observation periods. Exposure-weighted means replace simple averages, and total exposure m replaces the year count n in the credibility factor, giving Z = m / (m + K).
Bayesian Credibility: Bayesian credibility derives the posterior mean of the unknown risk parameter given the observed experience and a chosen prior. With conjugate-prior families (gamma-Poisson, beta-binomial, normal-normal) the posterior mean takes a credibility-weighted form.
BLUP (Best Linear Unbiased Predictor): The BLUP of a random effect is its conditional expectation given the data under the LMM with normally distributed effects and errors. It shrinks each group's estimated deviation from the overall fit toward zero, and the shrinkage is stronger the larger the noise-to-signal ratio (residual variance relative to random-effect variance).
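The Buhlmann formula above can be sketched numerically. The example below assumes the standard nonparametric (empirical Bayes) estimators for equal-length experience: EPV is the average of the within-risk sample variances, and VHM is the sample variance of the risk means less EPV / n. The function name `buhlmann_estimates` and the data are hypothetical.

```python
from statistics import mean, variance

def buhlmann_estimates(experience):
    """Return (Z, credibility estimates) for risks observed over n equal years."""
    n = len(experience[0])
    risk_means = [mean(row) for row in experience]
    epv = mean([variance(row) for row in experience])  # expected process variance
    vhm = variance(risk_means) - epv / n               # variance of hypothetical means
    overall = mean(risk_means)
    # A non-positive VHM estimate is conventionally treated as zero credibility.
    z = 0.0 if vhm <= 0 else n / (n + epv / vhm)
    return z, [z * m + (1 - z) * overall for m in risk_means]

# Hypothetical claim counts: three risks, three years each.
data = [[3.0, 5.0, 4.0], [8.0, 10.0, 9.0], [5.0, 7.0, 6.0]]
z, estimates = buhlmann_estimates(data)
```

With this data EPV = 1 and VHM = 6, so K = 1/6 and Z = 18/19: the risks differ far more from each other than each varies year to year, so nearly full weight goes to each risk's own mean.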

C

Classical (Limited Fluctuation) Credibility: The classical credibility framework certifies an estimate as fully credible when the observed claim count is within a given relative tolerance of the expected count with a chosen probability. Below the full-credibility standard, partial credibility scales by the square root of the ratio of observed to required exposure. $n_F = (y_p / k)^2$, where $y_p$ is the standard normal quantile for the chosen two-sided probability $p$ and $k$ is the relative tolerance (claim counts assumed Poisson).
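A minimal sketch of the full-credibility standard above, assuming Poisson claim counts and the normal approximation, with $y_p$ taken as the standard normal quantile at $(1 + p) / 2$. The function names are illustrative, not from any particular library.

```python
from statistics import NormalDist

def full_credibility_claims(p, k):
    """Expected claims needed to be within +/- k of the mean with probability p."""
    y_p = NormalDist().inv_cdf((1 + p) / 2)  # two-sided standard normal quantile
    return (y_p / k) ** 2

def partial_credibility(n, n_full):
    """Square-root rule, capped at full credibility."""
    return min(1.0, (n / n_full) ** 0.5)

n_full = full_credibility_claims(0.90, 0.05)  # roughly 1,082 claims
z = partial_credibility(400, n_full)          # roughly 0.61
```

The choice p = 0.90, k = 0.05 reproduces the familiar standard of about 1,082 claims; 400 observed claims then earn a little over 60% credibility.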

D

Decision Tree Pruning: Cost-complexity pruning trims a fully grown decision tree by minimizing training error plus a complexity penalty alpha times the number of terminal nodes. Alpha is selected by cross-validation; alpha = 0 recovers the full unpruned tree. $R_\alpha(T) = R(T) + \alpha |T|$
Double Lift Chart: A double lift chart compares two competing models by plotting the actual loss ratio against the ratio of their predicted pure premiums, binned into equal-population segments. A model that segments better shows steeper slope; a flat line means the models rank identically.
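To make the pruning criterion $R_\alpha(T) = R(T) + \alpha |T|$ concrete, the sketch below scores a handful of candidate subtrees at several values of alpha and picks the minimizer. The subtree sizes and training errors are made up, and `best_subtree` is a hypothetical helper; in practice the candidates come from the weakest-link pruning sequence.

```python
def best_subtree(subtrees, alpha):
    """Return the (terminal_nodes, training_error) pair minimizing R(T) + alpha * |T|."""
    return min(subtrees, key=lambda t: t[1] + alpha * t[0])

# Hypothetical nested subtrees: (number of terminal nodes, training error R(T)).
candidates = [(1, 100.0), (2, 70.0), (4, 50.0), (8, 44.0)]

chosen = {alpha: best_subtree(candidates, alpha) for alpha in (0, 2, 12, 40)}
```

As alpha grows, the selected tree shrinks from the full 8-leaf tree (alpha = 0) to the root alone (alpha = 40), which is the trade-off that cross-validation tunes.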

E

Expected Process Variance (EPV): EPV is the expected value of the within-risk variance across the population of risks. EPV captures the inherent noise in any single risk's experience and increases the Buhlmann credibility constant K, lowering Z.

G

Gini Coefficient: The Gini coefficient summarizes a model's ranking power as twice the area between the Lorenz (gains) curve and the equality line. For binary classifiers, Gini equals 2 * AUROC - 1, so 0 corresponds to random ranking and 1 to perfect ranking; a negative value means the model ranks worse than random. $\text{Gini} = 2 \cdot \text{AUROC} - 1$
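The relationship between Gini and AUROC can be checked directly from the pairwise definition of AUROC: the share of (positive, negative) pairs in which the positive scores higher, with ties counted as half. The scores and labels below are invented for illustration, and `auroc` is a hypothetical helper rather than a library function.

```python
def auroc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical model scores and binary outcomes.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]

auc = auroc(scores, labels)  # 8 of 9 pairs are ordered correctly
gini = 2 * auc - 1
```

Here one positive (score 0.6) is outranked by one negative (score 0.7), so AUROC = 8/9 and Gini = 7/9.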

H

Hierarchical Clustering: Hierarchical clustering builds a dendrogram bottom-up (agglomerative) by repeatedly merging the closest pair of clusters. Linkage choice (single, complete, average, Ward) controls how between-cluster distance is measured and produces different dendrogram shapes.

I

Intraclass Correlation (ICC): ICC measures the share of the variance not explained by the fixed effects that lies between groups rather than within them. ICC near zero implies OLS suffices; ICC near one means LMM is essential because observations within a group are highly correlated. $\rho = \sigma_u^{2} / (\sigma_u^{2} + \sigma_\varepsilon^{2})$
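A small sketch of the ICC formula above, together with the shrinkage weight it implies. The weight shown assumes a random-intercept model with known variance components and n observations in the group; both function names are illustrative.

```python
def icc(var_u, var_e):
    """Share of (random-effect + residual) variance that lies between groups."""
    return var_u / (var_u + var_e)

def shrinkage_weight(var_u, var_e, n):
    """Weight on a group's own mean deviation in a random-intercept model
    (assumes known variance components and n observations in the group)."""
    return var_u / (var_u + var_e / n)

rho = icc(1.0, 3.0)                       # 0.25
w_small = shrinkage_weight(1.0, 3.0, 3)   # 0.5
w_large = shrinkage_weight(1.0, 3.0, 12)  # 0.8
```

Algebraically the weight equals n / (n + var_e / var_u), the same form as the Buhlmann factor Z = n / (n + K), so larger groups and larger between-group variance both mean less shrinkage.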

K

K-Nearest Neighbors (KNN): KNN is a non-parametric prediction method: for a new observation, find the K training points closest in feature space, then predict the mean response (regression) or majority class (classification). Choice of K trades bias against variance. $\hat{y} = (1/K) \sum_{i \in N_K(x)} y_i$
Kaiser's Rule: Kaiser's rule retains principal components whose eigenvalue exceeds 1, calibrated to a correlation matrix (where each standardized variable contributes variance 1). The rule does not apply to covariance-matrix PCA because eigenvalues scale with variable units.
K-Means Clustering: K-means partitions n observations into K clusters by iteratively assigning each point to its nearest centroid and re-computing centroids as cluster means. The algorithm converges to a local minimum of within-cluster sum of squares; results depend on initialization.
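The KNN regression formula $\hat{y} = (1/K) \sum_{i \in N_K(x)} y_i$ from this section can be sketched in a few lines. The example assumes Euclidean distance and unscaled features; the training points and the name `knn_predict` are hypothetical.

```python
import math

def knn_predict(train_x, train_y, x_new, k):
    """Average the responses of the k training points nearest to x_new."""
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], x_new))  # Euclidean distance
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / k

# Hypothetical training data: two loose groups in a 2-D feature space.
train_x = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
train_y = [10.0, 12.0, 14.0, 40.0, 44.0]

pred_k1 = knn_predict(train_x, train_y, (0.2, 0.1), 1)  # follows the single nearest point
pred_k3 = knn_predict(train_x, train_y, (0.2, 0.1), 3)  # averages the local group
pred_k5 = knn_predict(train_x, train_y, (0.2, 0.1), 5)  # collapses to the global mean
```

Small K tracks the nearest point closely (low bias, high variance); K equal to the training size ignores location entirely (high bias, low variance).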

L

Linear Mixed Model (LMM): An LMM extends ordinary linear regression with both fixed effects and random effects. Random effects allow regression coefficients to vary across grouping levels (e.g., territories, doctors, accounts), inducing correlation among observations within the same group.
Lift: Lift at a population depth d is the response rate in the top d of model-scored records divided by the overall response rate. Lift > 1 means the model concentrates positives in the top scores; lift = 1 is random. $\text{Lift}(d) = (\text{response rate in top } d) / (\text{overall response rate})$
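The lift formula above can be illustrated as follows. The sketch ranks records by score, takes the top fraction d, and divides that group's response rate by the overall rate; the data and the name `lift` are assumptions for illustration.

```python
def lift(scores, responses, depth):
    """Response rate in the top `depth` fraction of scores over the overall rate."""
    n = len(scores)
    top_n = max(1, round(depth * n))  # number of records in the top segment
    ranked = sorted(zip(scores, responses), key=lambda pair: pair[0], reverse=True)
    top_rate = sum(r for _, r in ranked[:top_n]) / top_n
    overall_rate = sum(responses) / n
    return top_rate / overall_rate

# Hypothetical scored records with binary responses (4 positives in 10).
scores = [0.95, 0.90, 0.80, 0.75, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10]
responses = [1, 1, 0, 1, 0, 0, 0, 0, 1, 0]

lift_top20 = lift(scores, responses, 0.2)  # both top records respond: 1.0 / 0.4
lift_all = lift(scores, responses, 1.0)    # the whole population always gives 1
```

Lift at full depth is always 1, which is why lift is reported at shallow depths such as the top decile or quintile.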

P

Principal Components Analysis (PCA): PCA finds orthogonal linear combinations of standardized predictors that capture maximum variance in the data. The first principal component is the direction of largest variance; loadings give the per-variable weights, and scores project each observation onto the components.

R

Random Effect: A random effect is a regression coefficient drawn from a normal distribution with mean zero and unknown variance, capturing unmeasured grouping-level variation. Fixed effects are estimated once across the data; random effects produce a distribution of intercepts or slopes across groups.
REML (Restricted Maximum Likelihood): REML estimates variance components in an LMM by maximizing the likelihood of residual contrasts, linear combinations of the data that do not depend on the fixed effects. Because it accounts for the degrees of freedom used to estimate the fixed effects, REML produces less biased variance estimates than full ML, especially with small samples.
Random Forest: A random forest averages predictions from B decision trees fit on bootstrap samples, each split restricted to a random subset of m < p predictors. The split restriction decorrelates the trees and shrinks the variance floor of the averaged prediction.
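The "variance floor" in the random forest entry can be made explicit. For B identically distributed tree predictions, each with variance sigma^2 and pairwise correlation rho, the variance of their average is rho * sigma^2 + (1 - rho) * sigma^2 / B. The sketch below evaluates that expression; the function name and the numbers are illustrative assumptions.

```python
def averaged_variance(sigma2, rho, n_trees):
    """Variance of the mean of n_trees equally correlated predictions."""
    return rho * sigma2 + (1 - rho) * sigma2 / n_trees

single_tree = averaged_variance(4.0, 0.5, 1)       # no averaging: sigma^2 itself
bagged = averaged_variance(4.0, 0.5, 100)          # close to the floor rho * sigma^2 = 2.0
decorrelated = averaged_variance(4.0, 0.2, 100)    # lower rho lowers the floor to 0.8
```

Adding trees only removes the second term; reducing the correlation between trees, which is what restricting each split to m < p predictors aims at, lowers the floor itself.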

S

Stationarity: A time series is (weakly) stationary if its mean, variance, and autocovariances do not depend on time. AR(p) stationarity requires all roots of the AR polynomial to lie outside the unit circle; a root on the unit circle is a unit root, and the series must be differenced to remove it.
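The root condition above can be checked numerically for an AR(2) process $x_t = \phi_1 x_{t-1} + \phi_2 x_{t-2} + \varepsilon_t$, whose AR polynomial is $1 - \phi_1 z - \phi_2 z^2$. The sketch below solves the quadratic directly and assumes $\phi_2 \neq 0$; the function names are hypothetical.

```python
import cmath

def ar2_roots(phi1, phi2):
    """Roots of 1 - phi1*z - phi2*z**2 = 0 (requires phi2 != 0)."""
    disc = cmath.sqrt(phi1 ** 2 + 4 * phi2)  # complex-safe square root
    return ((-phi1 + disc) / (2 * phi2), (-phi1 - disc) / (2 * phi2))

def is_stationary_ar2(phi1, phi2):
    """True when both roots lie strictly outside the unit circle."""
    return all(abs(z) > 1 for z in ar2_roots(phi1, phi2))

stationary_real = is_stationary_ar2(0.5, 0.3)      # two real roots outside the circle
unit_root = is_stationary_ar2(0.5, 0.5)            # one root exactly at z = 1
stationary_complex = is_stationary_ar2(1.2, -0.5)  # complex pair with modulus sqrt(2)
```

The second case has phi1 + phi2 = 1, which places a root at z = 1; differencing that series once leaves a stationary AR(1).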

V

Variance of Hypothetical Means (VHM): VHM is the between-risk variance of the unknown risk-mean parameter across the population. Larger VHM means risks are more heterogeneous, decreasing K and raising the Buhlmann credibility factor Z.

W

White Noise: White noise is a sequence of uncorrelated random variables with constant mean (typically zero) and constant variance. Residuals from a correctly specified time-series model should resemble white noise; their ACF should fall inside the significance band at nearly all nonzero lags.
