SOA Exam SRM (Statistics for Risk Modeling) Glossary

30 essential terms and definitions for SOA Exam SRM (Statistics for Risk Modeling). Each definition is written for exam preparation, covering the concepts as they are tested on the 2026 syllabus.

A

AIC (Akaike Information Criterion)
Akaike Information Criterion is a model selection metric that balances goodness of fit against model complexity by penalizing the number of estimated parameters, with lower values indicating a better balance of fit and complexity (a computational sketch under the BIC entry below covers both criteria).
\text{AIC} = -2\ln(L) + 2k
ARIMA
ARIMA (AutoRegressive Integrated Moving Average) is a time series forecasting model that combines autoregressive terms, differencing for stationarity, and moving average terms, specified by orders (p, d, q).
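As a rough illustration of the (p, d, q) specification, the sketch below fits an ARIMA(1, 1, 1) to a simulated random walk, assuming the statsmodels library is available; the series and order are made up for illustration.
```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(size=200))    # random walk: one difference (d=1) makes it stationary

res = ARIMA(y, order=(1, 1, 1)).fit()  # p=1 AR term, d=1 difference, q=1 MA term
print(res.forecast(steps=5))           # five-step-ahead forecasts
```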
Autocorrelation
Autocorrelation is the correlation of a time series with a lagged version of itself, measured by the autocorrelation function (ACF), used to identify temporal patterns and determine the order of time series models.
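The sample ACF can be computed directly; below is a minimal NumPy sketch on a simulated AR(1)-style series, with all data made up for illustration.
```python
import numpy as np

def acf(x, max_lag=10):
    """Sample autocorrelation of x at lags 0..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.sum(x ** 2)
    return np.array([np.sum(x[k:] * x[:len(x) - k]) / denom
                     for k in range(max_lag + 1)])

rng = np.random.default_rng(0)
x = np.zeros(200)
for t in range(1, 200):                  # AR(1)-style series with coefficient 0.7
    x[t] = 0.7 * x[t - 1] + rng.normal()
print(acf(x, max_lag=5))                 # decays roughly like 0.7**k
```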

B

Bagging
Bagging (bootstrap aggregating) is an ensemble method that trains multiple models on random bootstrap samples of the training data and averages their predictions, reducing variance and improving stability compared to a single model.
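A minimal bagging sketch in NumPy, using polynomial least squares as a stand-in base learner; the data, number of bootstrap samples, and degree are all made up for illustration.
```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 100)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=100)

B = 50                                          # number of bootstrap samples
preds = np.zeros((B, len(x)))
for b in range(B):
    idx = rng.integers(0, len(x), size=len(x))  # resample with replacement
    coefs = np.polyfit(x[idx], y[idx], deg=5)   # fit base model on the bootstrap sample
    preds[b] = np.polyval(coefs, x)

bagged = preds.mean(axis=0)                     # average predictions across the ensemble
print(bagged[:5])
```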
Bias-Variance Tradeoff
Bias-variance tradeoff is the fundamental tension in predictive modeling: increasing model complexity reduces bias (systematic error) but increases variance (sensitivity to training data), with the optimal model minimizing total expected prediction error.
BIC (Bayesian Information Criterion)
Bayesian Information Criterion is a model selection criterion similar to AIC but with a stronger penalty for model complexity, tending to select simpler models, especially as sample size increases.
\text{BIC} = -2\ln(L) + k\ln(n)
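Both AIC and BIC follow directly from the maximized log-likelihood; the NumPy sketch below computes them for a Gaussian linear model fit by least squares, with the data made up for illustration.
```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
rss = np.sum((y - X @ beta) ** 2)

loglik = -0.5 * n * (np.log(2 * np.pi) + np.log(rss / n) + 1)  # maximized Gaussian log-likelihood
k = X.shape[1] + 1                 # coefficients plus the error variance
aic = -2 * loglik + 2 * k
bic = -2 * loglik + k * np.log(n)
print(aic, bic)                    # BIC penalizes harder whenever ln(n) > 2
```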

C

Classification Tree
Classification tree is a decision tree used for categorical response variables, recursively splitting the feature space into regions and assigning each region to the most frequent class, with splits chosen to maximize purity (minimize Gini impurity or entropy).
Clustering
Clustering is an unsupervised learning technique that groups observations into clusters such that observations within a cluster are more similar to each other than to those in other clusters, with common methods including k-means and hierarchical clustering.
Confusion Matrix
Confusion matrix is a table that summarizes the performance of a classification model by displaying the counts of true positives, true negatives, false positives, and false negatives for each class.
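A minimal sketch of the four counts for a binary classifier, with made-up labels and predictions:
```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

tp = np.sum((y_pred == 1) & (y_true == 1))  # true positives
tn = np.sum((y_pred == 0) & (y_true == 0))  # true negatives
fp = np.sum((y_pred == 1) & (y_true == 0))  # false positives
fn = np.sum((y_pred == 0) & (y_true == 1))  # false negatives
print(np.array([[tn, fp],
                [fn, tp]]))                 # rows: actual class, columns: predicted class
```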
Cross-Validation
Cross-validation is a resampling technique that partitions data into complementary training and validation sets across multiple iterations (such as k-fold) to estimate out-of-sample prediction error and guard against overfitting.
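A minimal k-fold sketch in NumPy, using polynomial degree as a stand-in tuning parameter; the data and candidate degrees are made up for illustration.
```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0, 1, 60)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=60)

k = 5
folds = np.array_split(rng.permutation(len(x)), k)  # random partition into k folds
for deg in (1, 3, 6):
    errs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        coefs = np.polyfit(x[train], y[train], deg)
        errs.append(np.mean((y[test] - np.polyval(coefs, x[test])) ** 2))
    print(deg, np.mean(errs))  # CV error is typically smallest at a moderate degree
```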

D

Decision Tree
Decision tree is a nonparametric supervised learning method that recursively partitions the feature space using binary splits based on predictor variables, producing an interpretable tree structure for classification or regression.

E

Elastic Net
Elastic net is a regularization method that combines the L1 penalty of LASSO and the L2 penalty of ridge regression, controlled by a mixing parameter, useful when predictors are correlated and variable selection is desired.
\min \sum (y_i - \hat{y}_i)^2 + \lambda_1 \sum |\beta_j| + \lambda_2 \sum \beta_j^2

G

Generalized Linear Model
Generalized linear model (GLM) extends ordinary linear regression by allowing the response variable to follow any distribution in the exponential family and linking the mean to the linear predictor through a link function, accommodating count, binary, and continuous positive outcomes.
Gini Impurity
Gini impurity is a measure of node purity in classification trees, calculated as the probability of incorrectly classifying a randomly chosen element if it were labeled according to the distribution of classes in the node.
G = 1 - \sum_{k=1}^{K} p_k^2
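A minimal sketch of the formula above, with made-up class counts for a two-class node:
```python
import numpy as np

def gini(counts):
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()                # class proportions in the node
    return 1.0 - np.sum(p ** 2)

print(gini([50, 50]))   # 0.5  -- maximally impure two-class node
print(gini([90, 10]))   # 0.18 -- nearly pure node
print(gini([100, 0]))   # 0.0  -- perfectly pure node
```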

K

K-Means Clustering
K-means clustering is a partitional clustering algorithm that assigns each observation to the cluster with the nearest centroid, iteratively updating centroids and reassigning observations to minimize total within-cluster sum of squares.
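A minimal sketch of the assign-then-update loop (Lloyd's algorithm) in NumPy; the two simulated clusters are made up for illustration.
```python
import numpy as np

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0, 1, (50, 2)),   # two well-separated clusters
               rng.normal(5, 1, (50, 2))])
k = 2

centroids = X[rng.choice(len(X), k, replace=False)]  # random initial centroids
for _ in range(20):
    # assignment step: nearest centroid for each observation
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    # update step: move each centroid to the mean of its assigned points
    new = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    if np.allclose(new, centroids):
        break
    centroids = new
print(centroids)   # close to the true cluster centers (0, 0) and (5, 5)
```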

L

LASSO Regression
LASSO (Least Absolute Shrinkage and Selection Operator) regression adds an L1 penalty to the ordinary least squares objective, shrinking some coefficients exactly to zero and thus performing variable selection.
\min \sum (y_i - \hat{y}_i)^2 + \lambda \sum |\beta_j|
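A minimal sketch of LASSO's variable selection, assuming scikit-learn is available; the data are made up, with only 2 of 10 predictors truly relevant.
```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(5)
X = rng.normal(size=(200, 10))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=200)  # only the first two predictors matter

fit = Lasso(alpha=0.1).fit(X, y)
print(fit.coef_)   # most of the 8 irrelevant coefficients are shrunk exactly to zero
```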
Linear Regression
Linear regression models the relationship between a continuous response variable and one or more predictor variables by fitting a linear equation to the observed data, estimating coefficients that minimize the sum of squared residuals.
y = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p + \varepsilon
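A minimal least-squares sketch in NumPy; the true coefficients are made up so the recovered estimates can be checked by eye.
```python
import numpy as np

rng = np.random.default_rng(6)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # intercept plus two predictors
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # minimizes the sum of squared residuals
print(beta)                                   # close to [1.0, 2.0, -0.5]
```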
Logistic Regression
Logistic regression models the probability of a binary outcome as a function of predictor variables using the logit link function, estimating coefficients via maximum likelihood.
\ln\!\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p

M

Mean Squared Error
Mean squared error is the average of the squared differences between predicted and observed values, serving as a standard loss function for regression models that penalizes larger errors more heavily.
\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
Multicollinearity
Multicollinearity occurs when two or more predictor variables in a regression model are highly correlated, inflating the variance of coefficient estimates and making individual predictor effects difficult to interpret. It is typically diagnosed with the variance inflation factor (VIF).

O

Overfitting
Overfitting occurs when a model learns the noise in the training data rather than the underlying pattern, resulting in excellent training performance but poor generalization to new data.

P

Principal Component Analysis
Principal component analysis (PCA) is a dimensionality reduction technique that transforms correlated variables into a smaller set of uncorrelated principal components, ordered by the amount of variance they explain.
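A minimal PCA sketch via eigendecomposition of the sample covariance matrix; the correlated two-dimensional data are made up for illustration.
```python
import numpy as np

rng = np.random.default_rng(7)
z = rng.normal(size=(200, 2))
X = z @ np.array([[2.0, 0.0],
                  [1.5, 0.5]])              # induce correlation between the two columns

Xc = X - X.mean(axis=0)                     # center before extracting components
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = eigvals.argsort()[::-1]             # sort components by variance explained
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

scores = Xc @ eigvecs                       # uncorrelated principal component scores
print(eigvals / eigvals.sum())              # proportion of variance explained by each
```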

R

R-Squared
R-squared (coefficient of determination) measures the proportion of variance in the response variable explained by the model, ranging from 0 to 1, with higher values indicating better fit.
R^2 = 1 - \frac{\text{SS}_{\text{res}}}{\text{SS}_{\text{tot}}}
Random Forest
Random forest is an ensemble method that builds many decision trees on bootstrap samples of the data, using a random subset of features at each split, and averages (regression) or votes (classification) across trees to improve prediction accuracy and reduce overfitting.
Regularization
Regularization is a technique that adds a penalty term to the model's loss function to constrain the size of the coefficients, reducing overfitting by trading a small increase in bias for a larger decrease in variance.
Ridge Regression
Ridge regression adds an L2 penalty (proportional to the sum of squared coefficients) to the ordinary least squares objective, shrinking coefficients toward zero without eliminating them, effective when predictors are multicollinear.
\min \sum (y_i - \hat{y}_i)^2 + \lambda \sum \beta_j^2
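A minimal ridge sketch using the closed-form solution beta = (X'X + lambda I)^{-1} X'y on centered data; the design, near-collinear column, and lambda grid are made up for illustration.
```python
import numpy as np

rng = np.random.default_rng(8)
X = rng.normal(size=(100, 3))
X[:, 2] = X[:, 0] + 0.01 * rng.normal(size=100)   # nearly collinear with column 0
y = X[:, 0] + rng.normal(size=100)

Xc, yc = X - X.mean(axis=0), y - y.mean()         # center so no intercept is penalized
for lam in (0.0, 1.0, 10.0):
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(3), Xc.T @ yc)
    print(lam, beta)   # larger lambda shrinks (but never zeroes) the coefficients
```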
ROC Curve
ROC (Receiver Operating Characteristic) curve is a plot of the true positive rate against the false positive rate at various classification thresholds, with the area under the curve (AUC) summarizing the model's ability to discriminate between classes.
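A minimal ROC sketch: sweep a threshold over predicted scores and trace the true positive rate against the false positive rate; the scores and labels are made up for illustration.
```python
import numpy as np

rng = np.random.default_rng(9)
y = np.concatenate([np.ones(100), np.zeros(100)])
scores = np.concatenate([rng.normal(1.0, 1, 100),   # positives score higher on average
                         rng.normal(0.0, 1, 100)])

thresholds = np.concatenate([[scores.max() + 1], np.sort(scores)[::-1]])
tpr = np.array([np.mean(scores[y == 1] >= t) for t in thresholds])
fpr = np.array([np.mean(scores[y == 0] >= t) for t in thresholds])

auc = np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2)  # trapezoidal area under the curve
print(round(auc, 3))                                   # roughly 0.76 for these score distributions
```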

S

Stationarity
Stationarity is the property of a time series whose statistical characteristics (mean, variance, autocorrelation) do not change over time. Many time series models require stationarity, which can often be achieved through differencing or transformation.

T

Time Series
Time series is a sequence of data points collected at successive, equally spaced points in time, analyzed to identify trends, seasonal patterns, and autocorrelation structure for forecasting.

V

Variance Inflation Factor
Variance inflation factor (VIF) quantifies the degree of multicollinearity for each predictor in a regression model, calculated as the reciprocal of one minus the R-squared from regressing that predictor on all other predictors.
\text{VIF}_j = \frac{1}{1 - R_j^2}
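A minimal VIF sketch in NumPy: regress each predictor on the others and apply 1 / (1 - R^2); the design matrix, with one near-duplicate column, is made up for illustration.
```python
import numpy as np

rng = np.random.default_rng(10)
X = rng.normal(size=(200, 3))
X[:, 2] = X[:, 0] + 0.1 * rng.normal(size=200)        # collinear with column 0

for j in range(X.shape[1]):
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(X)), others])    # regress predictor j on the rest
    coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
    resid = X[:, j] - A @ coef
    r2 = 1 - resid @ resid / np.sum((X[:, j] - X[:, j].mean()) ** 2)
    print(j, 1 / (1 - r2))   # columns 0 and 2 show sharply inflated VIFs
```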
