4.3 Keeping an Eye on Healthcare Costs: The D2Hawkeye Story | 4 Trees | The Analytics Edge | Sloan School of Management

Quick Question

Suppose that instead of the baseline method discussed in the previous video, we used the baseline method of predicting the most frequent outcome for all observations. This new baseline method would predict cost bucket 1 for everyone.

What would the accuracy of this baseline method be on the test set?

What would the penalty error of this baseline method be on the test set?

Explanation

To compute the accuracy, you can create a table of the variable ClaimsTest$bucket2009:

table(ClaimsTest$bucket2009)

According to the table output, this baseline method would get 122978 observations correct, and all other observations wrong. So the accuracy of this baseline method is 122978/nrow(ClaimsTest) = 0.67127.
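The accuracy arithmetic can be reproduced without the data file. The following is a minimal sketch that hard-codes the per-bucket test-set counts quoted in this explanation; the names `bucket_counts`, `n_test`, and `baseline_accuracy` are illustrative and not part of the course code.

```r
# Test-set counts per 2009 cost bucket, copied from the figures in this explanation.
# With the real data loaded, table(ClaimsTest$bucket2009) would supply these counts.
bucket_counts <- c("1" = 122978, "2" = 34840, "3" = 16390, "4" = 7937, "5" = 1057)

# Total number of test observations (stands in for nrow(ClaimsTest)).
n_test <- sum(bucket_counts)

# Predicting bucket 1 for everyone is correct only for patients actually in bucket 1.
baseline_accuracy <- bucket_counts[["1"]] / n_test
print(round(baseline_accuracy, 5))
```

If `ClaimsTest` is available in the session, `mean(ClaimsTest$bucket2009 == 1)` should give the same proportion directly.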

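For the penalty error, this baseline method predicts 1 for all observations, so each observation is charged the penalty for predicting bucket 1 given its actual bucket: 0, 2, 4, 6, and 8 for actual buckets 1 through 5. Weighting these penalties by the bucket counts from the table and dividing by the number of test observations gives a penalty error of: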

(0*122978 + 2*34840 + 4*16390 + 6*7937 + 8*1057)/nrow(ClaimsTest) = 1.044301
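The same calculation can be written as a weighted average. This is a sketch under the assumption that the penalties for predicting bucket 1 are 0, 2, 4, 6, and 8 for actual buckets 1 through 5, as the formula above implies; the variable names are illustrative, and the counts are hard-coded from this explanation rather than read from `ClaimsTest`.

```r
# Test-set counts per actual 2009 cost bucket, copied from the formula above.
bucket_counts <- c("1" = 122978, "2" = 34840, "3" = 16390, "4" = 7937, "5" = 1057)

# Assumed penalties for predicting bucket 1 when the actual bucket is 1..5
# (the multipliers that appear in the formula above).
penalty_predict_1 <- c(0, 2, 4, 6, 8)

# Penalty error = total penalty incurred divided by the number of test observations.
baseline_penalty_error <- sum(penalty_predict_1 * bucket_counts) / sum(bucket_counts)
print(round(baseline_penalty_error, 6))
```

Writing the penalties as a vector makes it easy to swap in a different constant prediction: only the penalty values would change, while the counts stay the same.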