Free SOA Exam SRM (Statistics for Risk Modeling) Basics of Statistical Learning Practice Questions
Build your foundation in statistical learning concepts for SOA Exam SRM. Questions cover the bias-variance tradeoff, model complexity, cross-validation, and the distinction between supervised and unsupervised methods.
Sample Questions
Question 1
Easy
In the bias-variance tradeoff, bias refers to:
Solution
Bias measures the systematic error introduced by the modeling assumptions. It is the difference between the expected prediction (averaged over many training sets) and the true function value $f(x_0)$. A high-bias model makes strong assumptions that may not match the true relationship (underfitting).
Why each other option is incorrect:
- (E) This describes variance, not bias.
- (C) This describes the irreducible error $\text{Var}(\varepsilon)$.
- (D) The total prediction error includes bias, variance, and irreducible error — not just bias.
- (B) Correlation between predictions and true values is related to model accuracy but is not the definition of bias.
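The three components named in these answer choices fit together in the usual decomposition of expected test error. The notation below ($x_0$, $y_0$, $\hat{f}$, $\varepsilon$) is assumed from the standard statistical learning literature, since the question itself does not define symbols:

$$
E\left[\left(y_0 - \hat{f}(x_0)\right)^2\right] = \text{Var}\left(\hat{f}(x_0)\right) + \left[\text{Bias}\left(\hat{f}(x_0)\right)\right]^2 + \text{Var}(\varepsilon),
\qquad
\text{Bias}\left(\hat{f}(x_0)\right) = E\left[\hat{f}(x_0)\right] - f(x_0)
$$

Here the expectation is taken over training sets (and the noise in $y_0$). Only the first two terms depend on the chosen model; $\text{Var}(\varepsilon)$ is a floor that no model can go below.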
Question 2
Medium
Which of the following is TRUE about the Bayes classifier?
Solution
The Bayes classifier assigns each observation to the most probable class given the observed features: $\hat{y} = \arg\max_j \Pr(Y = j \mid X = x_0)$. This is the theoretically optimal classifier that minimizes the overall misclassification rate.
Why each other option is incorrect:
- (E) The Bayes classifier requires knowledge of the true conditional class probabilities, which are generally unknown. Even with large samples, we can only estimate these probabilities, not compute them exactly.
- (B) The Bayes classifier achieves the lowest possible error rate (Bayes error rate), but this is generally nonzero because of overlapping class distributions.
- (C) The regression function is relevant for regression, not classification. The Bayes classifier uses conditional class probabilities.
- (D) KNN with $K = 1$ is an approximation, not equal to the Bayes classifier. As $n \to \infty$ and $K$ grows appropriately, KNN can approach the Bayes classifier, but $K = 1$ specifically has high variance.
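To see why the Bayes error rate is generally nonzero, it may help to work a case where the true conditional class probabilities are known. The sketch below assumes a hypothetical one-dimensional setup that is not part of the question: two classes with equal priors, where $X \mid Y = 0 \sim N(0, 1)$ and $X \mid Y = 1 \sim N(2, 1)$. All names in the code are illustrative.

```python
import math

# Hypothetical setup (an assumption for illustration, not from the question):
# equal priors, X | Y=0 ~ N(0, 1) and X | Y=1 ~ N(2, 1).
PRIORS = {0: 0.5, 1: 0.5}
MEANS = {0: 0.0, 1: 2.0}
SIGMA = 1.0


def normal_pdf(x, mean, sigma):
    """Density of N(mean, sigma^2) at x."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mean, sigma):
    """Cumulative distribution function of N(mean, sigma^2) at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sigma * math.sqrt(2.0))))


def posterior(x):
    """True conditional class probabilities Pr(Y = j | X = x) via Bayes' theorem."""
    joint = {j: PRIORS[j] * normal_pdf(x, MEANS[j], SIGMA) for j in PRIORS}
    total = sum(joint.values())
    return {j: joint[j] / total for j in joint}


def bayes_classify(x):
    """Assign x to the class with the largest true posterior probability."""
    post = posterior(x)
    return max(post, key=post.get)


# With equal priors and equal variances, the decision boundary is the
# midpoint between the two class means.
boundary = (MEANS[0] + MEANS[1]) / 2.0

# Bayes error: the probability mass of each class that falls on the
# wrong side of the boundary, weighted by the class priors.
bayes_error = (
    PRIORS[0] * (1.0 - normal_cdf(boundary, MEANS[0], SIGMA))
    + PRIORS[1] * normal_cdf(boundary, MEANS[1], SIGMA)
)

print(bayes_classify(0.5), bayes_classify(1.5))
print(round(bayes_error, 4))  # about 0.1587
```

Even with the true distributions in hand, roughly 16% of observations are misclassified in this setup because the two class densities overlap. In practice the posterior probabilities are unknown, so the classifier above cannot be computed and must be approximated.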
Question 3
Hard
An actuary considers three approaches for estimating test error on a dataset of 500 observations:
I. Validation set approach (50/50 split)
II. 10-fold cross-validation
III. LOOCV
Rank these approaches from HIGHEST to LOWEST variance of the test error estimate.
Solution
The ranking from highest to lowest variance is: I (validation set) > III (LOOCV) > II (10-fold CV).
- **Validation set approach (I)**: Uses only 50% of data for training, and the estimate depends entirely on one random split. This produces the highest variance because a single split can be very unrepresentative.
- **LOOCV (III)**: Uses $n - 1 = 499$ observations for training in each fold. While it averages over $n = 500$ folds, the training sets are nearly identical (differ by only 1 observation), producing highly correlated fold estimates. Averaging correlated estimates does not reduce variance as effectively, so LOOCV has moderate-to-high variance.
- **10-fold CV (II)**: Uses 90% of data for training and averages over 10 less-correlated fold estimates. The lower correlation between folds means averaging is more effective at reducing variance.
Why each other option is incorrect:
- (A) This places LOOCV as having the lowest variance, which contradicts the fact that its fold estimates are highly correlated.
- (B) This places the validation set approach as having the lowest variance, but it actually has the highest due to complete dependence on one split.
- (C) This places 10-fold CV as having the highest variance, which is incorrect.
- (E) This places 10-fold CV as having higher variance than the validation set approach, which is incorrect.
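The mechanics of the three approaches can be sketched in a few lines. The example below is a minimal illustration under stated assumptions: the data are simulated (500 points from a hypothetical linear model with unit noise variance), the fitted model is simple linear regression, and every function name is made up for this sketch. It shows how each estimate is computed; it does not by itself demonstrate the variance ranking, which would require repeating the procedure over many datasets.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mse(model, xs, ys):
    """Mean squared prediction error of a fitted line on (xs, ys)."""
    a, b = model
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def validation_set_error(xs, ys, rng, train_frac=0.5):
    """Approach I: one random split, train on one part, test on the rest."""
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    cut = int(len(idx) * train_frac)
    train, test = idx[:cut], idx[cut:]
    model = fit_line([xs[i] for i in train], [ys[i] for i in train])
    return mse(model, [xs[i] for i in test], [ys[i] for i in test])


def k_fold_error(xs, ys, k, rng):
    """Approaches II and III: average held-out MSE over k folds.

    Setting k equal to the number of observations gives LOOCV.
    """
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal folds
    fold_errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        fold_errors.append(
            mse(model, [xs[i] for i in held_out], [ys[i] for i in held_out])
        )
    return sum(fold_errors) / k


# Simulated data (an assumption for illustration): y = 1 + 2x + N(0, 1) noise.
rng = random.Random(0)
n = 500
xs = [rng.uniform(0, 10) for _ in range(n)]
ys = [1.0 + 2.0 * x + rng.gauss(0, 1) for x in xs]

est_validation = validation_set_error(xs, ys, rng)   # I: 250 train / 250 test
est_10_fold = k_fold_error(xs, ys, k=10, rng=rng)    # II: 450 train per fold
est_loocv = k_fold_error(xs, ys, k=n, rng=rng)       # III: 499 train per fold

print(est_validation, est_10_fold, est_loocv)
```

All three numbers estimate the same quantity, the test MSE, which is close to the noise variance of 1 in this simulated setup. The approaches differ in how much data each fit sees and in how many held-out estimates are averaged.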
About FreeFellow
FreeFellow is a free exam prep platform for actuarial (SOA & CAS), CFA, CFP, CPA, CAIA, and securities licensing candidates. Every question includes a detailed solution. Full lessons, flashcards with spaced repetition, timed mock exams, performance analytics, and a personalized study plan are all included — no paywalls, no ads.