Free SOA Exam SRM (Statistics for Risk Modeling) Formula Sheet (2026)

Every Exam SRM formula you need on the test, grouped by topic, rendered with full math notation. 94 formulas across 5 topics, calibrated to the 2026 syllabus. Free forever, no signup required.


All Exam SRM Formulas

Basics of Statistical Learning 10 items
Bias-variance tradeoff
$E[(y-\hat{f}(x))^2]=\text{Bias}^2(\hat{f})+\text{Var}(\hat{f})+\sigma^2_\varepsilon$
Irreducible error $\sigma^2_\varepsilon$ cannot be reduced
k-fold cross-validation error
$CV_{(k)}=\dfrac{1}{k}\sum_{j=1}^k \text{MSE}_j$
Each fold serves as validation once
Expected test MSE decomposition
$E[(y_0 - \hat{f}(x_0))^2] = \text{Var}(\varepsilon) + [\text{Bias}(\hat{f}(x_0))]^2 + \text{Var}(\hat{f}(x_0))$
$\text{Var}(\varepsilon)$ = irreducible noise, $\text{Bias}^2$ = squared model bias, $\text{Var}(\hat{f})$ = model variance
Mallows Cp statistic
$C_p = \tfrac{1}{n}(\text{RSS} + 2d\hat{\sigma}^2)$
n = sample size, RSS = residual sum of squares, d = number of predictors, $\hat{\sigma}^2$ = residual variance estimate
Mean squared error (regression)
$\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{f}(x_i))^2$
n = sample size, $y_i$ = actual, $\hat{f}(x_i)$ = predicted response
Misclassification rate
$\text{Err} = \frac{1}{n}\sum_{i=1}^{n} I(y_i \ne \hat{y}_i)$
n = sample size, I = indicator function, $y_i$ = actual class, $\hat{y}_i$ = predicted class
LOOCV closed-form estimate for least-squares linear regression
$\text{CV}_{(n)} = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{y_i - \hat{y}_i}{1 - h_i} \right)^{2}$
n = sample size, $y_i$ = observed, $\hat{y}_i$ = fitted, $h_i$ = leverage of obs i
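The LOOCV shortcut above can be checked numerically. The following is a minimal sketch, assuming simple linear regression with one predictor, where the leverage is $h_i = 1/n + (x_i-\bar{x})^2/S_{xx}$; the data and the function names are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for a single predictor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - slope * x_bar, slope


def loocv_shortcut(xs, ys):
    """CV_(n) from one fit, using the leverage of each observation."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b0, b1 = fit_line(xs, ys)
    total = 0.0
    for x, y in zip(xs, ys):
        h = 1 / n + (x - x_bar) ** 2 / sxx  # leverage h_i
        total += ((y - (b0 + b1 * x)) / (1 - h)) ** 2
    return total / n


def loocv_brute_force(xs, ys):
    """CV_(n) by refitting n times, leaving one observation out each time."""
    n = len(xs)
    total = 0.0
    for i in range(n):
        b0, b1 = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        total += (ys[i] - (b0 + b1 * xs[i])) ** 2
    return total / n


# Hypothetical data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 6.3]
print(loocv_shortcut(xs, ys), loocv_brute_force(xs, ys))
```

The two printed values should agree up to floating-point rounding, which is why the closed form saves n - 1 refits for least-squares models.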
Logistic regression logit model
$\log[p/(1-p)] = X\beta$
p = P(Y=1|X), X = predictor matrix, β = coefficient vector, log-odds linear in X
Linear regression model
$Y = X\beta + \varepsilon$
Y = numeric response, X = predictor matrix, β = coefficient vector, ε = error term with mean 0
Expected prediction error (squared loss)
$E[(Y - \hat{f}(X))^2]$
Y = true response, $\hat{f}(X)$ = predicted response, E = expectation over (X, Y)
Linear Models 37 items
R-squared
$R^2 = 1 - \dfrac{SS_{\text{res}}}{SS_{\text{tot}}} = 1 - \dfrac{\sum(y_i-\hat{y}_i)^2}{\sum(y_i-\bar{y})^2}$
Adjusted R-squared
$\bar{R}^2 = 1 - \dfrac{(1-R^2)(n-1)}{n-p-1}$
$p$ = number of predictors (excludes intercept)
F-statistic (regression)
$F=\dfrac{(SS_{\text{tot}}-SS_{\text{res}})/p}{SS_{\text{res}}/(n-p-1)}$
Tests $H_0$: all slope coefficients are zero
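As a quick arithmetic check of the three fit statistics above, here is a minimal sketch. The sums of squares, n, and p are hypothetical inputs, and `fit_summary` is an illustrative name rather than a library function.

```python
def fit_summary(ss_tot, ss_res, n, p):
    """Return (R^2, adjusted R^2, overall F) from sums of squares.

    n = sample size, p = number of predictors (intercept excluded).
    """
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    f_stat = ((ss_tot - ss_res) / p) / (ss_res / (n - p - 1))
    return r2, adj_r2, f_stat


# Hypothetical values: SS_tot = 200, SS_res = 50, n = 25, p = 4.
r2, adj_r2, f_stat = fit_summary(ss_tot=200.0, ss_res=50.0, n=25, p=4)
print(r2, adj_r2, f_stat)  # approximately 0.75, 0.70, 15.0
```

Adjusted R-squared is lower than R-squared here because the (n-1)/(n-p-1) factor charges for the four predictors.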
Variance inflation factor (VIF)
$\text{VIF}_j = \dfrac{1}{1-R_j^2}$
$R_j^2$ = $R^2$ from regressing $X_j$ on all other predictors
VIF>5–10 indicates multicollinearity
AIC
$AIC = 2p - 2\ln\hat{L}$
$p$ = number of parameters; lower is better
BIC
$BIC = p\ln n - 2\ln\hat{L}$
Penalizes complexity more than AIC for $n>7$
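The two criteria can disagree when the sample is large enough for the BIC penalty to dominate. A minimal sketch with two hypothetical models (the log-likelihoods, parameter counts, and n = 100 are invented for illustration):

```python
import math


def aic(log_lik, p):
    """AIC = 2p - 2 ln(L-hat); lower is better."""
    return 2 * p - 2 * log_lik


def bic(log_lik, p, n):
    """BIC = p ln(n) - 2 ln(L-hat); lower is better."""
    return p * math.log(n) - 2 * log_lik


n = 100
small = {"log_lik": -120.0, "p": 3}  # hypothetical smaller model
large = {"log_lik": -117.5, "p": 5}  # hypothetical larger model

print(aic(**small), aic(**large))            # 246.0 vs 245.0: AIC prefers the larger model
print(bic(n=n, **small), bic(n=n, **large))  # BIC prefers the smaller model
```

With n = 100 each extra parameter costs ln(100), about 4.6, under BIC versus 2 under AIC, so the same likelihood gain is judged differently.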
Ridge regression penalty
Minimize: $\sum(y_i-\hat{y}_i)^2+\lambda\sum_{j=1}^p\beta_j^2$
$L_2$ penalty; shrinks but does not zero out coefficients
LASSO penalty
Minimize: $\sum(y_i-\hat{y}_i)^2+\lambda\sum_{j=1}^p|\beta_j|$
$L_1$ penalty; produces sparse solutions (exact zeros)
Elastic net penalty
Minimize: $\sum(y_i-\hat{y}_i)^2+\lambda_1\sum|\beta_j|+\lambda_2\sum\beta_j^2$
Combines LASSO ($L_1$) and Ridge ($L_2$)
Studentized residual
$r_i = \frac{e_i}{\hat{\sigma}\sqrt{1 - h_{ii}}}$
e = raw residual, σ̂ = residual SE, h = leverage; |r| > 2 flags outlier
Durbin-Watson statistic
$DW = \frac{\sum_{i=2}^{n}(e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2} \approx 2(1 - \hat{\rho})$
e = residual, n = sample size, ρ̂ = lag-1 residual autocorrelation
Cook's distance
$D_i = \frac{r_i^2}{p+1} \cdot \frac{h_{ii}}{1 - h_{ii}}$
r = studentized residual, h = leverage, p = number of predictors; D > 1 is influential
High-leverage cutoff
$h_{ii} > \frac{2(p+1)}{n}$
h = leverage (hat matrix diagonal), p = number of predictors, n = sample size
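The leverage, studentized residual, and Cook's distance formulas above fit together as follows. This is a minimal sketch, assuming simple linear regression (p = 1), where the hat-matrix diagonal reduces to $h_{ii} = 1/n + (x_i-\bar{x})^2/S_{xx}$; the data set and the name `diagnostics` are hypothetical.

```python
import math


def diagnostics(xs, ys):
    """Return a list of (leverage, studentized residual, Cook's D) per observation."""
    n, p = len(xs), 1
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    resid = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    s = math.sqrt(sum(e * e for e in resid) / (n - p - 1))  # residual SE
    rows = []
    for x, e in zip(xs, resid):
        h = 1 / n + (x - x_bar) ** 2 / sxx        # leverage
        r = e / (s * math.sqrt(1 - h))            # studentized residual
        d = r ** 2 / (p + 1) * h / (1 - h)        # Cook's distance
        rows.append((h, r, d))
    return rows


# Hypothetical data with one x value far from the rest.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 12.0]
ys = [1.1, 2.0, 2.9, 4.2, 5.0, 8.0]
rows = diagnostics(xs, ys)
cutoff = 2 * (1 + 1) / len(xs)  # high-leverage cutoff 2(p+1)/n
for h, r, d in rows:
    print(round(h, 3), round(r, 3), round(d, 3), h > cutoff)
```

The leverages always sum to p + 1, so the cutoff 2(p+1)/n is simply twice the average leverage; only the last observation exceeds it in this example.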
Tweedie variance function
$\text{Var}(Y) = \phi\, V(\mu)$ with $V(\mu) = \mu^p$
φ = dispersion, μ = mean, p = power parameter (p=0 normal, p=1 Poisson, 1<p<2 compound Poisson-gamma, p=2 gamma)
GLM link function and linear predictor
$g(\mu) = \eta = X\beta$
g = link function, μ = mean response, η = linear predictor, X = design matrix, β = coefficient vector
GLM mean and variance from b(theta)
$E[Y] = b'(\theta) = \mu,\ \mathrm{Var}(Y) = b''(\theta)\, a(\phi)$
θ = natural parameter, φ = dispersion, μ = mean, $b''(\theta)\,a(\phi) = V(\mu)\,a(\phi)$
Exponential family density form
$f(y;\theta,\phi) = \exp\{[y\theta - b(\theta)]/a(\phi) + c(y,\phi)\}$
y = data, θ = natural parameter, φ = dispersion, a/b/c = family-specific functions
Exponential family density (GLM random component)
$f(y; \theta, \phi) = \exp\{[y\theta - b(\theta)]/a(\phi) + c(y, \phi)\}$
θ = canonical parameter, φ = dispersion, b(θ) sets mean and variance
OLS coefficient estimator (closed form)
$\hat{\beta} = (X^{T}X)^{-1} X^{T} Y$
X = design matrix of predictors, Y = response vector, $\hat{\beta}$ = least-squares coefficient vector
GLM variance structure
$\text{Var}(Y) = \phi\, V(\mu)$
φ = dispersion parameter, V(μ) = variance function (μ for Poisson, μ(1−μ) for binomial, μ² for gamma)
GLM deviance (goodness-of-fit)
$D = 2[\ell(\text{saturated}) - \ell(\hat{\mu})]$
ℓ = log-likelihood, saturated = perfect-fit model, $\hat{\mu}$ = fitted means; analog of RSS
Box-Cox transformation
$Y^{(\lambda)} = (Y^\lambda - 1)/\lambda$
Y = positive response, λ = power parameter; the limit λ → 0 gives the log transform, λ=1 leaves the shape unchanged (a shift by 1 only)
Marginal effect of a predictor with an interaction term
$\partial \eta / \partial X_1 = \beta_1 + \beta_{12} X_2$
β_1 = main effect of X_1, β_12 = interaction coefficient, X_2 = moderator value
t-statistic for a regression coefficient
$t = \hat{\beta}_j / SE(\hat{\beta}_j)$
β̂_j = estimated coefficient, SE = standard error, compared to t-distribution with n-p-1 degrees of freedom
Likelihood ratio test statistic
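$\Lambda = 2(\log L_F - \log L_R) \sim \chi^2_{p-q}$
L_F = maximized likelihood of the full model (p parameters), L_R = maximized likelihood of the reduced model (q parameters), p-q = number of parameters dropped; the chi-square distribution holds approximately in large samples under $H_0$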
K-nearest neighbors classification probability
$\Pr(Y = j \mid X = x_0) = \frac{1}{K}\sum_{x_i \in N_K(x_0)} I(y_i = j)$
j = class, K = neighbors, I(·) = indicator function, N_K(x₀) = K nearest training points
Ridge regression effective degrees of freedom
$df(\lambda) = \text{tr}[X(X^TX + \lambda I)^{-1}X^T]$
X = design matrix, λ = ridge tuning parameter, I = identity matrix, tr = matrix trace
K-nearest neighbors regression prediction
$\hat{f}(x_0) = \frac{1}{K}\sum_{x_i \in N_K(x_0)} y_i$
x₀ = query point, K = neighborhood size, N_K(x₀) = K nearest training points, y_i = neighbor responses
Ridge regression closed-form estimator
$\hat{\beta}^{ridge} = (X^TX + \lambda I)^{-1} X^T y$
X = design matrix, y = response vector, λ = ridge tuning parameter, I = identity matrix
Small log-coefficient percent-change approximation
$e^{\hat{\beta}_j} - 1 \approx \hat{\beta}_j$ for $|\hat{\beta}_j| < 0.2$
β_j = coefficient of X_j in a model for log(Y); approximate percent change in Y per unit increase in X_j is 100β_j%
Effective slope with interaction term
$\text{slope}_{X_1} = \hat{\beta}_1 + \hat{\beta}_3 X_2$
β_1 = main effect of X_1, β_3 = coefficient on the X_1 X_2 interaction, X_2 = value of interacting variable
Log-log elasticity interpretation
$\%\Delta Y \approx \hat{\beta}_j \cdot \%\Delta X_j$
β_j = coefficient when both X_j and Y are log-transformed; β_j is the elasticity of Y with respect to X_j
Log-response multiplicative effect of a predictor
$E[Y \mid X_j + 1] = E[Y \mid X_j] \cdot e^{\hat{\beta}_j}$
β_j = coefficient in log(Y) model, X_j = predictor, e^{β_j} = multiplicative factor on Y
Confidence interval for the mean response in simple linear regression
$\hat{y}_0 \pm t_{n-2,\alpha/2} \cdot s\sqrt{\frac{1}{n} + \frac{(x_0-\bar{x})^2}{S_{xx}}}$
s = residual SE, n = sample size, $S_{xx} = \sum(x_i-\bar{x})^2$
Prediction standard error from confidence standard error
$SE_{pred} = \sqrt{SE(\hat{y}_0)^2 + s^2}$
SE(ŷ₀) = CI standard error of the fit, s = residual standard error
Prediction interval for a new observation in simple linear regression
$\hat{y}_0 \pm t_{n-2,\alpha/2} \cdot s\sqrt{1 + \frac{1}{n} + \frac{(x_0-\bar{x})^2}{S_{xx}}}$
The extra +1 inside the radical captures the irreducible noise σ²
Variance of the fitted mean in multiple regression
$\text{Var}(\hat{y}_0) = \sigma^2 \mathbf{x}_0^\top (X^\top X)^{-1} \mathbf{x}_0$
x_0 = predictor vector at new point, X = design matrix, σ² = error variance
Time Series Models 17 items
AR(1) model
$X_t=\phi X_{t-1}+\varepsilon_t,\quad\varepsilon_t\sim WN(0,\sigma^2)$
Stationary iff $|\phi|<1$
MA(1) model
$X_t=\varepsilon_t+\theta\varepsilon_{t-1},\quad\varepsilon_t\sim WN(0,\sigma^2)$
Always stationary
ARMA(1,1) model
$X_t=\phi X_{t-1}+\varepsilon_t+\theta\varepsilon_{t-1}$
Stationary iff $|\phi|<1$
ACF of AR(1)
$\rho(h)=\phi^h,\quad h=0,1,2,\ldots$
Decays geometrically; PACF cuts off after lag 1
Ljung-Box test statistic
$Q=n(n+2)\sum_{k=1}^m\dfrac{\hat{\rho}_k^2}{n-k}$
Tests $H_0$: first $m$ autocorrelations are zero
Distributed $\chi^2(m)$ under $H_0$
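To see the Ljung-Box statistic in action, here is a minimal sketch that computes sample autocorrelations (sum of lagged cross-products over the total sum of squares) and then Q. The alternating series is a made-up example chosen so the arithmetic is easy to follow; the function names are illustrative.

```python
def sample_acf(ys, k):
    """Sample autocorrelation at lag k."""
    t = len(ys)
    y_bar = sum(ys) / t
    denom = sum((y - y_bar) ** 2 for y in ys)
    num = sum((ys[i] - y_bar) * (ys[i - k] - y_bar) for i in range(k, t))
    return num / denom


def ljung_box(ys, m):
    """Q = n(n+2) * sum_{k=1..m} rho_hat_k^2 / (n - k)."""
    n = len(ys)
    return n * (n + 2) * sum(sample_acf(ys, k) ** 2 / (n - k) for k in range(1, m + 1))


# Hypothetical, strongly autocorrelated series: +1, -1, +1, -1, ...
ys = [1.0, -1.0] * 10
q = ljung_box(ys, m=2)
print(sample_acf(ys, 1), sample_acf(ys, 2), q)  # about -0.95, 0.90, 40.7

# 5.991 is the 95th percentile of chi-square with 2 degrees of freedom.
print("reject H0 at 5%:", q > 5.991)
```

Q is far above the critical value, so the hypothesis that the first two autocorrelations are zero is rejected for this series.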
Random walk with drift h-step point forecast
$\hat{Y}_{T+h} = Y_T + h\delta$
$Y_T$ = last observed value, $\delta$ = drift per period, h = forecast horizon
h-step prediction interval for a time series forecast
$\hat{Y}_{T+h} \pm z_{\alpha/2} \sqrt{\text{Var}(e_{T+h})}$
$\hat{Y}_{T+h}$ = point forecast, $z_{\alpha/2}$ = normal quantile (1.96 for 95%), $\text{Var}(e_{T+h})$ = h-step forecast error variance
AR(1) h-step forecast error variance
$\text{Var}(e_{T+h}) = \sigma^2 \cdot \frac{1 - \phi^{2h}}{1 - \phi^2}$
$\sigma^2$ = white noise variance, $\phi$ = AR(1) coefficient ($|\phi|<1$), h = forecast horizon
AR(1) long-run (unconditional) forecast variance ceiling
$\text{Var}_\infty = \frac{\sigma^2}{1 - \phi^2}$
$\sigma^2$ = white noise variance, $\phi$ = AR(1) coefficient with $|\phi|<1$
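The h-step variance is the partial geometric sum $\sigma^2(1+\phi^2+\dots+\phi^{2(h-1)})$, which is why it rises toward the long-run ceiling. A minimal sketch with hypothetical parameters σ² = 4 and φ = 0.5 (function names are illustrative):

```python
def ar1_forecast_var(sigma2, phi, h):
    """Closed form: sigma^2 * (1 - phi^(2h)) / (1 - phi^2)."""
    return sigma2 * (1 - phi ** (2 * h)) / (1 - phi ** 2)


def ar1_forecast_var_sum(sigma2, phi, h):
    """Same quantity as an explicit sum of h geometric terms."""
    return sigma2 * sum(phi ** (2 * j) for j in range(h))


sigma2, phi = 4.0, 0.5          # hypothetical values
ceiling = sigma2 / (1 - phi ** 2)
for h in (1, 2, 5, 20):
    print(h, ar1_forecast_var(sigma2, phi, h), ceiling)
```

At h = 1 the variance is just σ² = 4, at h = 2 it is 5, and by h = 20 it is indistinguishable from the ceiling of about 5.33, so prediction intervals widen quickly and then level off.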
Simple exponential smoothing one-step forecast
$\hat{Y}_{t+1} = \alpha Y_t + (1-\alpha)\hat{Y}_t$
α ∈ (0,1) = smoothing constant, Y_t = current observation, Ŷ_t = previous smoothed forecast
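The smoothing recursion above can be run by hand or in a few lines. This is a minimal sketch; the starting forecast is an assumption (setting it to the first observation is one common choice), and the series and α = 0.5 are hypothetical.

```python
def ses_forecasts(ys, alpha, initial):
    """Return one-step forecasts; element t is the forecast made for period t+1.

    `initial` is the assumed forecast for the first period.
    """
    forecasts = [initial]
    for y in ys:
        # New forecast = alpha * latest observation + (1 - alpha) * previous forecast.
        forecasts.append(alpha * y + (1 - alpha) * forecasts[-1])
    return forecasts


ys = [10.0, 12.0, 11.0, 13.0]  # hypothetical observations
forecasts = ses_forecasts(ys, alpha=0.5, initial=ys[0])
print(forecasts)  # [10.0, 10.0, 11.0, 11.0, 12.0]
```

The last element, 12.0, is the forecast for the period after the final observation. A larger α would pull it closer to the most recent value of 13.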
ARCH(q) conditional variance
$\sigma_t^2 = \omega + \alpha_1 \varepsilon_{t-1}^2 + \cdots + \alpha_q \varepsilon_{t-q}^2$
ω > 0 = constant, α_i ≥ 0 = ARCH weights, ε_{t-i} = past residuals
Long-run mean of a stationary AR(1) process
$\mu = \dfrac{c}{1 - \phi_1}$
c = intercept, φ₁ = AR(1) coefficient with |φ₁| < 1 for stationarity
GARCH(1,1) conditional variance
$\sigma_t^2 = \omega + \alpha_1 \varepsilon_{t-1}^2 + \beta_1 \sigma_{t-1}^2$
ω = constant, α₁ = ARCH term, β₁ = GARCH lagged-variance term, ε_{t-1} = prior residual
Random walk variance
$\text{Var}(Y_t) = t\sigma^2$ where $Y_t = Y_0 + \sum_{i=1}^{t}\varepsilon_i$
Y_0 = starting value, σ² = white noise variance, t = time index; variance grows linearly so series is non-stationary
White noise 95 percent significance band for sample ACF
$\hat{\rho}_k \in \pm \dfrac{1.96}{\sqrt{T}}$
T = sample size; sample autocorrelations inside this band are consistent with zero at the 5% level
Autocorrelation function at lag k
$\rho_k = \dfrac{\gamma_k}{\gamma_0} = \dfrac{\text{Cov}(Y_t, Y_{t-k})}{\text{Var}(Y_t)}$
γ_k = autocovariance at lag k, γ_0 = variance, ρ_0 = 1, |ρ_k| ≤ 1
Sample autocorrelation at lag k
$\hat{\rho}_k = \dfrac{\sum_{t=k+1}^{T}(Y_t - \bar{Y})(Y_{t-k} - \bar{Y})}{\sum_{t=1}^{T}(Y_t - \bar{Y})^2}$
Ȳ = sample mean, T = sample size, k = lag
Decision Trees 14 items
Gini impurity
$G = \sum_{k=1}^{K} \hat{p}_k(1-\hat{p}_k) = 1 - \sum_{k=1}^{K}\hat{p}_k^2$
$\hat{p}_k$ = fraction of class $k$ in node
Entropy (node impurity)
$H = -\sum_{k=1}^{K}\hat{p}_k\log\hat{p}_k$
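Both impurity measures are zero for a pure node and largest when classes are evenly mixed. A minimal sketch follows; the class proportions are hypothetical, and the natural logarithm is assumed since the sheet does not state a base.

```python
import math


def gini(props):
    """Gini impurity: 1 - sum of squared class proportions."""
    return 1 - sum(p * p for p in props)


def entropy(props):
    """Entropy with natural log; terms with p = 0 contribute 0 by convention."""
    return -sum(p * math.log(p) for p in props if p > 0)


for props in ([0.5, 0.5], [0.9, 0.1], [1.0, 0.0]):
    print(props, round(gini(props), 4), round(entropy(props), 4))
```

For the 50/50 node the Gini index is 0.5 and the entropy is ln 2, about 0.693; for the 90/10 node the Gini index drops to 0.18; the pure node scores 0 on both.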
Bagging (bootstrap aggregation)
Train $B$ trees on bootstrap samples; aggregate predictions
$\hat{f}(x)=\dfrac{1}{B}\sum_{b=1}^B\hat{f}_b(x)$
Reduces variance without increasing bias
Out-of-bag mean squared error
$\text{OOB MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i^{\text{OOB}})^2$
$y_i$ = actual response, $\hat{y}_i^{\text{OOB}}$ = average of trees not trained on i
Out-of-bag observation fraction
$(1 - 1/n)^n \to e^{-1} \approx 0.368$
n = training rows, fraction of rows omitted from any given bootstrap sample
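The out-of-bag limit is easy to verify numerically: a given row is missed by all n draws with probability (1 - 1/n)^n. A minimal sketch (the function name is illustrative):

```python
import math


def oob_fraction(n):
    """Probability that a given row is absent from a bootstrap sample of size n."""
    return (1 - 1 / n) ** n


for n in (10, 100, 1000, 10 ** 6):
    print(n, round(oob_fraction(n), 5))
print("limit:", round(math.exp(-1), 5))
```

Even at n = 10 the fraction is already about 0.349, so roughly one third of the rows are available as a built-in validation set for each tree.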
Boosting ensemble update rule
$\hat{f}(x) \leftarrow \hat{f}(x) + \lambda \hat{f}^{b}(x)$
$\hat{f}$ = current ensemble, $\hat{f}^b$ = new tree fit to residuals, $\lambda$ = shrinkage/learning rate
Random forest default predictors per split
$m = \lfloor \sqrt{p} \rfloor$ (classification); $m = \lfloor p/3 \rfloor$ (regression)
m = predictors sampled at each split, p = total predictors
Piecewise-constant decision tree prediction function
$\hat{Y} = \sum_{m=1}^{M} c_m \cdot \mathbf{1}\{X \in R_m\}$
M = number of leaves, R_m = m-th rectangular region, c_m = constant prediction in leaf m, 1{·} = indicator function
Number of unique binary splits for an unordered categorical predictor
$N = 2^{q-1} - 1$
q = number of unordered levels of the categorical predictor
Cost-complexity criterion for a classification tree
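$C_\alpha(T) = \sum_{m=1}^{|T|} \sum_{i \in R_m} L(y_i, \hat{y}_{R_m}) + \alpha |T|$
L = per-observation classification loss, typically misclassification error when pruning (Gini or entropy are the usual impurity measures when growing the tree), |T| = number of terminal nodes, α ≥ 0 = complexity penalty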
Recursive binary splitting objective for a regression tree
$\min_{j,s} \big[\sum_{i \in R_1(j,s)} (y_i - \bar{y}_{R_1})^2 + \sum_{i \in R_2(j,s)} (y_i - \bar{y}_{R_2})^2\big]$
j = predictor, s = cutpoint, R_1, R_2 = half-planes
Cost-complexity pruning criterion for a regression tree
$C_\alpha(T) = \sum_{m=1}^{|T|} \sum_{i: x_i \in R_m} (y_i - \hat{y}_{R_m})^2 + \alpha |T|$
|T| = number of leaves, α = complexity penalty, R_m = leaf region
Classification tree leaf prediction
$\hat{C}_{R_j} = \arg\max_k \hat{p}_{R_j, k}$
R_j = leaf region, k = class index, p̂_{R_j,k} = training proportion of class k in leaf
Regression tree leaf prediction
$\hat{y}_{R_j} = \frac{1}{|R_j|} \sum_{i: x_i \in R_j} y_i$
R_j = leaf region, |R_j| = number of training observations in leaf, y_i = response
Unsupervised Learning Techniques 16 items
PCA — proportion of variance explained
$\text{PVE}_k=\dfrac{\lambda_k}{\sum_{j=1}^p\lambda_j}$
$\lambda_k$ = $k$th eigenvalue of covariance (or correlation) matrix
K-means objective function
$\min_{C_1,\ldots,C_K}\sum_{k=1}^K\sum_{i\in C_k}\|x_i-\bar{x}_k\|^2$
$\bar{x}_k$ = centroid of cluster $k$
Average linkage distance between clusters
$d(A,B) = \frac{1}{|A||B|} \sum_{a \in A} \sum_{b \in B} \|a - b\|$
A, B = clusters, |A|, |B| = cluster sizes, ||·|| = Euclidean distance
Complete linkage distance between clusters
$d(A,B) = \max_{a \in A,\, b \in B} \|a - b\|$
A, B = clusters, a, b = observations in each cluster, ||·|| = Euclidean distance
Single linkage distance between clusters
$d(A,B) = \min_{a \in A,\, b \in B} \|a - b\|$
A, B = clusters, a, b = observations in each cluster, ||·|| = Euclidean distance
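The three linkage rules differ only in how they summarize the same set of pairwise distances. A minimal sketch on two hypothetical clusters of 2-D points (uses `math.dist`, available in Python 3.8+; the function name is illustrative):

```python
import math


def linkage_distances(a, b):
    """Single, complete, and average linkage between clusters a and b."""
    pairwise = [math.dist(p, q) for p in a for q in b]
    return {
        "single": min(pairwise),                     # closest pair
        "complete": max(pairwise),                   # farthest pair
        "average": sum(pairwise) / len(pairwise),    # mean over all |A||B| pairs
    }


cluster_a = [(0.0, 0.0), (0.0, 1.0)]  # hypothetical points
cluster_b = [(3.0, 0.0), (4.0, 0.0)]
d = linkage_distances(cluster_a, cluster_b)
print(d)
```

Single linkage always gives the smallest value and complete linkage the largest, with average linkage in between, which is why the choice of linkage changes the shape of the dendrogram.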
K-means centroid update
$\mu_k = \frac{1}{|C_k|} \sum_{i \in C_k} x_i$
μ_k = centroid of cluster k, C_k = set of points in cluster k, |C_k| = cluster size, x_i = observation
First principal component as a linear combination
$Z_1 = \phi_{11} X_1 + \phi_{21} X_2 + \dots + \phi_{p1} X_p,\ \sum_j \phi_{j1}^2 = 1$
Z₁ = PC1 score, φⱼ₁ = loading of Xⱼ on PC1, Xⱼ = centered predictor
Cumulative proportion of variance explained through component M
$\text{Cum PVE}_M = \sum_{m=1}^{M}\lambda_m \big/ \sum_{j=1}^{p}\lambda_j$
λₘ = m-th eigenvalue, M = components retained, p = total predictors
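Given the eigenvalues, PVE and cumulative PVE are simple ratios. A minimal sketch with hypothetical eigenvalues that sum to 4, as they would for a correlation matrix of four standardized variables (function names are illustrative):

```python
def pve(eigenvalues):
    """Proportion of variance explained by each component."""
    total = sum(eigenvalues)
    return [lam / total for lam in eigenvalues]


def cumulative_pve(eigenvalues):
    """Running total of PVE through each component."""
    running, out = 0.0, []
    for share in pve(eigenvalues):
        running += share
        out.append(running)
    return out


eigenvalues = [2.0, 1.0, 0.6, 0.4]  # hypothetical, sorted largest first
print(pve(eigenvalues))             # about [0.5, 0.25, 0.15, 0.1]
print(cumulative_pve(eigenvalues))  # about [0.5, 0.75, 0.9, 1.0]
```

Here the first two components capture 75% of the variance and the first three capture 90%, the kind of figure read off a scree plot when deciding how many components to keep.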
PCA loading vector as eigenvector of the covariance matrix
$\mathbf{S}\,\phi_m = \lambda_m\,\phi_m$
S = p×p sample covariance matrix, φₘ = unit eigenvector (loading vector for PC m), λₘ = eigenvalue = Var(Zₘ)
Principal component score for observation i on component m
$z_{im} = \phi_{1m} x_{i1} + \phi_{2m} x_{i2} + \dots + \phi_{pm} x_{ip}$
zᵢₘ = score, φⱼₘ = loading of Xⱼ on PC m, xᵢⱼ = centered value of Xⱼ for obs i
Total within-cluster sum of squares (pairwise form)
$W(k) = \sum_{c=1}^{k} \frac{1}{2|C_c|} \sum_{i,j \in C_c} \|x_i - x_j\|^2$
k = number of clusters, C_c = cluster c, |C_c| = its size, x_i = observation vectors
Silhouette coefficient for an observation
$s(i) = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}}$
a(i) = mean within-cluster distance, b(i) = mean distance to nearest other cluster; s(i) in [-1, 1]
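The silhouette coefficient above can be computed directly from pairwise distances. This is a minimal sketch, assuming Euclidean distance and that the observation's own cluster contains at least two points (otherwise a(i) is undefined); the points, labels, and function name are hypothetical.

```python
import math


def silhouette(i, points, labels):
    """Silhouette coefficient s(i) for observation i."""
    def mean_dist(cluster, skip_self):
        dists = [
            math.dist(points[i], points[j])
            for j in range(len(points))
            if labels[j] == cluster and not (skip_self and j == i)
        ]
        return sum(dists) / len(dists)

    own = labels[i]
    a = mean_dist(own, skip_self=True)                      # mean distance within own cluster
    b = min(mean_dist(c, skip_self=False)                   # nearest other cluster
            for c in set(labels) if c != own)
    return (b - a) / max(a, b)


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]  # hypothetical data
labels = [0, 0, 1, 1]
s0 = silhouette(0, points, labels)
print(round(s0, 3))  # about 0.802
```

A value near 1 means the point sits much closer to its own cluster than to the next one; values near 0 or below suggest it lies on a boundary or is misassigned.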
One-standard-error rule for the gap statistic
$\text{Gap}(k) \geq \text{Gap}(k+1) - s_{k+1}$
Choose the smallest k satisfying this; s_{k+1} = standard error of Gap(k+1) across B reference samples
Gap statistic for choosing number of clusters
$\text{Gap}(k) = E^{*}[\log W^{*}(k)] - \log W(k)$
W(k) = observed within-cluster SS, W*(k) = SS on uniform reference data, E* = expectation over B reference samples
Principal component score
$Z_k = \phi_{1k} X_1 + \phi_{2k} X_2 + \cdots + \phi_{pk} X_p$
Z_k = kth component, φ_jk = loading of variable j on PC k, X_j = standardized predictor
Loading vector unit-length constraint
$\sum_{j=1}^{p} \phi_{jk}^2 = 1$
φ_jk = loading of variable j on PC k, p = number of predictors

Frequently Asked Questions

Is the Exam SRM formula sheet free?
Yes. The full Exam SRM formula sheet is free, with no signup, no email, and no credit card required. 94 formulas across 5 topics, all rendered with the same KaTeX math notation used in the FreeFellow study app.
Will there be a printable PDF version?
A printable PDF is rolling out shortly. In the meantime, the inline page below is print-friendly: most browsers print clean copies via the Print menu (the navigation, footer, and download CTA are hidden in print).
What's covered on the Exam SRM formula sheet?
Every formula is grouped by official syllabus topic, with the formula in math notation plus a one-line note on when to use it (or a watch-out from CAIA, CFA, or other prep-provider commentary). Coverage is calibrated to the 2026 syllabus and refreshed when the corpus changes.
What is FreeFellow's relationship with SOA?
None. FreeFellow is not affiliated with the SOA or any examination body. This is an independent study aid covering the published syllabus.
What else is free at FreeFellow for Exam SRM candidates?
The full question bank with detailed solutions, mixed practice, readiness tracking, lessons (where available), and the formula sheet are all free forever. Fellow ($59/quarter or $149/year per track) unlocks timed mock exams, spaced-repetition flashcards, performance analytics, AI essay grading, and a personalized study plan.

About FreeFellow

FreeFellow is a free exam prep library for actuarial (SOA & CAS), CFA, CFP, CPA, CAIA, GARP FRM, IRS Enrolled Agent, IMA CMA, and FINRA / NASAA securities licensing candidates. The entire question bank, written solutions, and lessons are free for every candidate, with no trial period and no credit card. Lessons include narrated audio, and every constructed-response item has a copy-to-AI prompt builder so candidates can paste their answer into their own ChatGPT or Claude for self-graded feedback; Fellow members get instant AI grading on essays against the official rubric (currently CFA Level III, expanding to other essay-bearing sections).

The 70% you need to pass (question bank, written solutions, lessons, formula sheet, mixed practice, readiness tracking) is free forever, with no trial period and no credit card. Become a Fellow ($59/quarter or $149/year per track) to unlock mock exams, flashcards with spaced repetition, performance analytics, AI essay grading, and a personalized study plan.