In this project, I build a frequency-severity model for automobile insurance claims using the French Motor Third-Party Liability dataset (freMTPL2freq and freMTPL2sev). The goal is to predict expected pure premiums for policyholders by separately modeling claim frequency and severity, then combining them into a pricing model.
I begin by loading the required R packages for data manipulation, visualization, and modeling.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(broom)
library(ggplot2)
library(MASS)
##
## Attaching package: 'MASS'
##
## The following object is masked from 'package:dplyr':
##
## select
library(dplyr)
library(patchwork)
## Warning: package 'patchwork' was built under R version 4.5.1
##
## Attaching package: 'patchwork'
##
## The following object is masked from 'package:MASS':
##
## area
library(pscl)
## Warning: package 'pscl' was built under R version 4.5.1
## Classes and Methods for R originally developed in the
## Political Science Computational Laboratory
## Department of Political Science
## Stanford University (2002-2015),
## by and under the direction of Simon Jackman.
## hurdle and zeroinfl functions by Achim Zeileis.
The dataset consists of two linked files: one with claim frequency information and one with claim severity. I import both and join them on the policy identifier. Missing claim amounts are set to zero to represent policies with no claims.
freq <- read.csv("freMTPL2freq.csv")
sev <- read.csv("freMTPL2sev.csv")
head(freq)
## IDpol ClaimNb Exposure VehPower VehAge DrivAge BonusMalus VehBrand VehGas
## 1 1 1 0.10 5 0 55 50 B12 Regular
## 2 3 1 0.77 5 0 55 50 B12 Regular
## 3 5 1 0.75 6 2 52 50 B12 Diesel
## 4 10 1 0.09 7 0 46 50 B12 Diesel
## 5 11 1 0.84 7 0 46 50 B12 Diesel
## 6 13 1 0.52 6 2 38 50 B12 Regular
## Area Density Region
## 1 D 1217 Rhone-Alpes
## 2 D 1217 Rhone-Alpes
## 3 B 54 Picardie
## 4 B 76 Aquitaine
## 5 B 76 Aquitaine
## 6 E 3003 Nord-Pas-de-Calais
str(freq)
## 'data.frame': 678013 obs. of 12 variables:
## $ IDpol : num 1 3 5 10 11 13 15 17 18 21 ...
## $ ClaimNb : int 1 1 1 1 1 1 1 1 1 1 ...
## $ Exposure : num 0.1 0.77 0.75 0.09 0.84 0.52 0.45 0.27 0.71 0.15 ...
## $ VehPower : int 5 5 6 7 7 6 6 7 7 7 ...
## $ VehAge : int 0 0 2 0 0 2 2 0 0 0 ...
## $ DrivAge : int 55 55 52 46 46 38 38 33 33 41 ...
## $ BonusMalus: int 50 50 50 50 50 50 50 68 68 50 ...
## $ VehBrand : chr "B12" "B12" "B12" "B12" ...
## $ VehGas : chr "Regular" "Regular" "Diesel" "Diesel" ...
## $ Area : chr "D" "D" "B" "B" ...
## $ Density : int 1217 1217 54 76 76 3003 3003 137 137 60 ...
## $ Region : chr "Rhone-Alpes" "Rhone-Alpes" "Picardie" "Aquitaine" ...
head(sev)
## IDpol ClaimAmount
## 1 1552 995.20
## 2 1010996 1128.12
## 3 4024277 1851.11
## 4 4007252 1204.00
## 5 4046424 1204.00
## 6 4073956 1204.00
str(sev)
## 'data.frame': 26639 obs. of 2 variables:
## $ IDpol : int 1552 1010996 4024277 4007252 4046424 4073956 4012173 4020812 4020812 4074074 ...
## $ ClaimAmount: num 995 1128 1851 1204 1204 ...
df <- full_join(freq, sev, by = "IDpol")
df <- df|>
mutate(ClaimAmount = replace_na(ClaimAmount, 0))
str(df)
## 'data.frame': 679708 obs. of 13 variables:
## $ IDpol : num 1 3 5 10 11 13 15 17 18 21 ...
## $ ClaimNb : int 1 1 1 1 1 1 1 1 1 1 ...
## $ Exposure : num 0.1 0.77 0.75 0.09 0.84 0.52 0.45 0.27 0.71 0.15 ...
## $ VehPower : int 5 5 6 7 7 6 6 7 7 7 ...
## $ VehAge : int 0 0 2 0 0 2 2 0 0 0 ...
## $ DrivAge : int 55 55 52 46 46 38 38 33 33 41 ...
## $ BonusMalus : int 50 50 50 50 50 50 50 68 68 50 ...
## $ VehBrand : chr "B12" "B12" "B12" "B12" ...
## $ VehGas : chr "Regular" "Regular" "Diesel" "Diesel" ...
## $ Area : chr "D" "D" "B" "B" ...
## $ Density : int 1217 1217 54 76 76 3003 3003 137 137 60 ...
## $ Region : chr "Rhone-Alpes" "Rhone-Alpes" "Picardie" "Aquitaine" ...
## $ ClaimAmount: num 0 0 0 0 0 0 0 0 0 0 ...
To prepare the data, I filter out extreme values such as very old drivers, unusually high claim counts, and vehicles with unrealistic ages. I also convert categorical variables such as vehicle brand, fuel type, and region into factors for modeling.
df <- df |>
filter(ClaimNb <= 4) |>
filter(Exposure <= 1) |>
filter(DrivAge <= 90) |>
filter(VehAge <= 40) |>
mutate(across(c(VehBrand, VehGas, Area, Region), as.factor))
str(df)
## 'data.frame': 677581 obs. of 13 variables:
## $ IDpol : num 1 3 5 10 11 13 15 17 18 21 ...
## $ ClaimNb : int 1 1 1 1 1 1 1 1 1 1 ...
## $ Exposure : num 0.1 0.77 0.75 0.09 0.84 0.52 0.45 0.27 0.71 0.15 ...
## $ VehPower : int 5 5 6 7 7 6 6 7 7 7 ...
## $ VehAge : int 0 0 2 0 0 2 2 0 0 0 ...
## $ DrivAge : int 55 55 52 46 46 38 38 33 33 41 ...
## $ BonusMalus : int 50 50 50 50 50 50 50 68 68 50 ...
## $ VehBrand : Factor w/ 11 levels "B1","B10","B11",..: 4 4 4 4 4 4 4 4 4 4 ...
## $ VehGas : Factor w/ 2 levels "Diesel","Regular": 2 2 1 1 1 2 2 1 1 1 ...
## $ Area : Factor w/ 6 levels "A","B","C","D",..: 4 4 2 2 2 5 5 3 3 2 ...
## $ Density : int 1217 1217 54 76 76 3003 3003 137 137 60 ...
## $ Region : Factor w/ 21 levels "Alsace","Aquitaine",..: 21 21 18 2 2 16 16 13 13 17 ...
## $ ClaimAmount: num 0 0 0 0 0 0 0 0 0 0 ...
Before modeling, I explore the distribution of claim counts. Most policies have zero claims, with the frequency dropping sharply as the number of claims increases. This suggests strong overdispersion, motivating the use of a negative binomial or zero-inflated model for frequency.
df |>
count(ClaimNb) |>
ggplot(aes(x = factor(ClaimNb), y = n)) +
geom_col(fill = "steelblue", alpha = 0.8) +
geom_text(aes(label = scales::comma(n)),
vjust = -0.5, size = 3.5, fontface = "bold") +
labs(title = "Distribution of Claim Counts",
x = "Number of Claims", y = "Count") +
theme_minimal() +
scale_y_continuous(labels = scales::comma)
I fit a zero-inflated negative binomial (ZINB) model to account for excess zeros and overdispersion in claim counts. The model uses driver and vehicle characteristics as predictors, with exposure included as an offset.
zinb_model <- zeroinfl(
ClaimNb ~ Area + VehPower + VehAge + DrivAge + BonusMalus +
VehBrand + VehGas + Density + Region + offset(log(Exposure)) | 1,
data = df,
dist = "negbin"
)
summary(zinb_model)
## Warning in sqrt(diag(object$vcov)): NaNs produced
##
## Call:
## zeroinfl(formula = ClaimNb ~ Area + VehPower + VehAge + DrivAge + BonusMalus +
## VehBrand + VehGas + Density + Region + offset(log(Exposure)) | 1,
## data = df, dist = "negbin")
##
## Pearson residuals:
## Min 1Q Median 3Q Max
## -0.6705 -0.2652 -0.2088 -0.1176 118.1343
##
## Count model coefficients (negbin with log link):
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.991e+00 1.008e-01 -39.582 < 2e-16 ***
## AreaB 5.794e-02 2.272e-02 2.550 0.010787 *
## AreaC 1.030e-01 1.884e-02 5.468 4.55e-08 ***
## AreaD 1.968e-01 1.991e-02 9.884 < 2e-16 ***
## AreaE 2.618e-01 2.125e-02 12.320 < 2e-16 ***
## AreaF 2.795e-01 4.124e-02 6.777 1.23e-11 ***
## VehPower 1.384e-02 2.882e-03 4.802 1.57e-06 ***
## VehAge -3.958e-02 1.218e-03 -32.481 < 2e-16 ***
## DrivAge 6.825e-03 4.222e-04 16.164 < 2e-16 ***
## BonusMalus 2.424e-02 3.421e-04 70.832 < 2e-16 ***
## VehBrandB10 -3.545e-03 3.821e-02 -0.093 0.926081
## VehBrandB11 9.828e-02 4.092e-02 2.402 0.016322 *
## VehBrandB12 1.869e-01 1.834e-02 10.193 < 2e-16 ***
## VehBrandB13 2.974e-02 4.267e-02 0.697 0.485812
## VehBrandB14 -1.765e-01 8.130e-02 -2.171 0.029908 *
## VehBrandB2 -1.319e-02 1.597e-02 -0.826 0.408756
## VehBrandB3 3.084e-03 2.276e-02 0.135 0.892234
## VehBrandB4 -3.599e-02 3.103e-02 -1.160 0.246002
## VehBrandB5 7.348e-02 2.597e-02 2.830 0.004656 **
## VehBrandB6 -3.833e-02 2.949e-02 -1.300 0.193575
## VehGasRegular 3.790e-02 1.141e-02 3.322 0.000894 ***
## Density -1.737e-06 NaN NaN NaN
## RegionAquitaine -8.410e-02 9.420e-02 -0.893 0.371986
## RegionAuvergne -3.263e-01 1.171e-01 -2.786 0.005337 **
## RegionBasse-Normandie -2.155e-02 9.927e-02 -0.217 0.828099
## RegionBourgogne -4.333e-02 1.015e-01 -0.427 0.669467
## RegionBretagne 5.029e-02 9.243e-02 0.544 0.586367
## RegionCentre 2.700e-02 9.102e-02 0.297 0.766735
## RegionChampagne-Ardenne 1.403e-01 1.223e-01 1.147 0.251302
## RegionCorse 2.133e-01 1.106e-01 1.929 0.053673 .
## RegionFranche-Comte -9.819e-02 1.645e-01 -0.597 0.550685
## RegionHaute-Normandie -7.706e-02 1.078e-01 -0.715 0.474644
## RegionIle-de-France -4.296e-03 9.214e-02 -0.047 0.962815
## RegionLanguedoc-Roussillon -3.644e-02 9.395e-02 -0.388 0.698140
## RegionLimousin 2.062e-01 1.109e-01 1.859 0.062960 .
## RegionMidi-Pyrenees -1.573e-01 9.846e-02 -1.597 0.110186
## RegionNord-Pas-de-Calais -1.502e-01 9.324e-02 -1.611 0.107183
## RegionPays-de-la-Loire -2.761e-02 9.301e-02 -0.297 0.766552
## RegionPicardie 2.225e-02 1.033e-01 0.215 0.829426
## RegionPoitou-Charentes -5.886e-02 9.617e-02 -0.612 0.540551
## RegionProvence-Alpes-Cotes-D'Azur 2.036e-02 9.142e-02 0.223 0.823743
## RegionRhone-Alpes 7.351e-02 9.117e-02 0.806 0.420052
## Log(theta) -6.293e-01 2.870e-02 -21.926 < 2e-16 ***
##
## Zero-inflation model coefficients (binomial with logit link):
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -5.9519 0.2368 -25.14 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Theta = 0.533
## Number of iterations in BFGS optimization: 156
## Log-likelihood: -1.499e+05 on 44 Df
The summary confirms that several covariates significantly influence claim frequency.
For claim severity, I restrict the dataset to positive claims only. The claim amount distribution is highly skewed, which is typical for insurance losses. I fit a Gamma GLM with a log link to model the conditional claim amount.
df_sev <- df |>
filter(ClaimAmount > 0)
str(df_sev)
## 'data.frame': 26300 obs. of 13 variables:
## $ IDpol : num 139 190 414 424 424 463 606 622 811 830 ...
## $ ClaimNb : int 1 1 1 2 2 1 1 1 1 1 ...
## $ Exposure : num 0.75 0.14 0.14 0.62 0.62 0.31 0.84 0.75 0.76 0.68 ...
## $ VehPower : int 7 12 4 10 10 5 10 5 5 4 ...
## $ VehAge : int 1 5 0 0 0 0 6 0 0 10 ...
## $ DrivAge : int 61 50 36 51 51 45 54 34 44 24 ...
## $ BonusMalus : int 50 60 85 100 100 50 50 64 50 105 ...
## $ VehBrand : Factor w/ 11 levels "B1","B10","B11",..: 4 4 4 4 4 4 4 4 4 1 ...
## $ VehGas : Factor w/ 2 levels "Diesel","Regular": 2 1 2 2 2 2 1 2 2 2 ...
## $ Area : Factor w/ 6 levels "A","B","C","D",..: 6 2 5 6 6 1 4 4 5 5 ...
## $ Density : int 27000 56 4792 27000 27000 12 583 1565 3317 3064 ...
## $ Region : Factor w/ 21 levels "Alsace","Aquitaine",..: 12 4 12 12 12 15 20 16 20 12 ...
## $ ClaimAmount: num 303 1982 1457 990 9844 ...
p1 <- ggplot(df, aes(x = ClaimAmount)) +
geom_histogram(bins = 15, fill = "blue", color = "black") +
labs(title = "Claim Amount including zero values",
x = "Claim Amount")
p2 <- ggplot(df_sev, aes(x = log(ClaimAmount))) +
geom_histogram(bins = 15, fill = "red", color = "black") +
labs(title = "Claim Amount excluding zero values",
x = "Log(Claim Amount)")
combined_plot <- p1 + p2
combined_plot
median(df_sev$ClaimAmount)
## [1] 1172
mean(df_sev$ClaimAmount)
## [1] 2267.601
var(df_sev$ClaimAmount)
## [1] 867343759
sd(df_sev$ClaimAmount)
## [1] 29450.7
sev_model <- glm(ClaimAmount ~ Area + VehPower + VehAge + DrivAge +
BonusMalus + VehBrand + VehGas + Density + Region,
data = df_sev,
family = Gamma(link = "log"))
summary(sev_model)
##
## Call:
## glm(formula = ClaimAmount ~ Area + VehPower + VehAge + DrivAge +
## BonusMalus + VehBrand + VehGas + Density + Region, family = Gamma(link = "log"),
## data = df_sev)
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.538e+00 8.054e-01 8.118 4.94e-16 ***
## AreaB 1.553e-01 1.841e-01 0.844 0.39889
## AreaC -7.563e-02 1.520e-01 -0.498 0.61873
## AreaD -1.317e-02 1.611e-01 -0.082 0.93486
## AreaE -1.210e-01 2.098e-01 -0.576 0.56434
## AreaF -5.362e-01 7.292e-01 -0.735 0.46218
## VehPower 2.446e-02 2.289e-02 1.068 0.28534
## VehAge 1.885e-03 9.328e-03 0.202 0.83987
## DrivAge -3.241e-03 3.368e-03 -0.962 0.33583
## BonusMalus 7.401e-03 2.465e-03 3.003 0.00268 **
## VehBrandB10 1.469e-01 2.753e-01 0.534 0.59352
## VehBrandB11 4.382e-01 2.905e-01 1.509 0.13144
## VehBrandB12 1.177e-01 1.534e-01 0.767 0.44286
## VehBrandB13 7.219e-02 3.159e-01 0.229 0.81922
## VehBrandB14 -1.288e-01 6.189e-01 -0.208 0.83513
## VehBrandB2 1.675e-01 1.201e-01 1.394 0.16325
## VehBrandB3 2.369e-02 1.667e-01 0.142 0.88704
## VehBrandB4 1.549e-01 2.280e-01 0.679 0.49702
## VehBrandB5 -1.446e-01 1.915e-01 -0.755 0.45032
## VehBrandB6 -2.199e-01 2.155e-01 -1.020 0.30752
## VehGasRegular 1.168e-01 8.904e-02 1.312 0.18942
## Density 1.358e-05 3.000e-05 0.453 0.65084
## RegionAquitaine 3.836e-01 7.596e-01 0.505 0.61353
## RegionAuvergne 4.414e-03 9.375e-01 0.005 0.99624
## RegionBasse-Normandie 5.283e-01 8.000e-01 0.660 0.50900
## RegionBourgogne 3.337e-01 8.218e-01 0.406 0.68469
## RegionBretagne 5.122e-01 7.472e-01 0.685 0.49306
## RegionCentre 7.004e-01 7.358e-01 0.952 0.34118
## RegionChampagne-Ardenne 1.636e+00 1.081e+00 1.514 0.13001
## RegionCorse 8.357e-01 9.513e-01 0.879 0.37967
## RegionFranche-Comte 4.733e-01 1.347e+00 0.351 0.72542
## RegionHaute-Normandie 2.126e-01 8.679e-01 0.245 0.80650
## RegionIle-de-France 3.466e-01 7.471e-01 0.464 0.64270
## RegionLanguedoc-Roussillon 4.099e-01 7.624e-01 0.538 0.59088
## RegionLimousin 1.632e-01 8.851e-01 0.184 0.85372
## RegionMidi-Pyrenees 1.806e-01 8.159e-01 0.221 0.82481
## RegionNord-Pas-de-Calais 4.355e-01 7.523e-01 0.579 0.56266
## RegionPays-de-la-Loire 2.627e-01 7.498e-01 0.350 0.72608
## RegionPicardie 5.205e-01 8.283e-01 0.628 0.52972
## RegionPoitou-Charentes 2.370e-01 7.708e-01 0.307 0.75850
## RegionProvence-Alpes-Cotes-D'Azur 6.395e-01 7.392e-01 0.865 0.38702
## RegionRhone-Alpes 5.654e-01 7.367e-01 0.768 0.44277
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Gamma family taken to be 48.71545)
##
## Null deviance: 46301 on 26299 degrees of freedom
## Residual deviance: 43061 on 26258 degrees of freedom
## AIC: 454550
##
## Number of Fisher Scoring iterations: 14
The results show that driver and vehicle characteristics also explain some variation in claim severity.
df$pred_freq <- predict(zinb_model, type = "response")
df <- df |>
mutate(pred_decile = ntile(pred_freq, 10))
calibration <- df |>
group_by(pred_decile) |>
summarise(
avg_pred = mean(pred_freq),
avg_actual = mean(ClaimNb),
.groups = "drop"
)
ggplot(calibration, aes(x = avg_pred, y = avg_actual)) +
geom_point(size = 3, color = "blue") +
geom_abline(intercept = 0, slope = 1, linetype = "dashed", color = "red") +
labs(title = "Calibration of Frequency Model (Deciles)",
x = "Average Predicted Frequency",
y = "Average Actual Frequency") +
theme_minimal()
To evaluate the models, I compare predicted vs. observed outcomes. For frequency, I assess calibration across deciles of predicted risk. For severity, I compare predicted amounts to observed claim amounts. While the models are not perfectly accurate, they capture general trends in the data.
df_sev$pred_sev <- predict(sev_model, type = "response")
ggplot(df_sev, aes(x = pred_sev, y = ClaimAmount)) +
geom_point(alpha = 0.3, color = "darkred") +
xlim(0, 12500)+
ylim(0, 12500)+
geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = "black") +
labs(title = "Observed vs Predicted Severity (Gamma GLM)",
x = "Predicted Claim Amount", y = "Observed Claim Amount") +
theme_minimal()
## Warning: Removed 363 rows containing missing values or values outside the scale range
## (`geom_point()`).
df$pred_freq <- predict(zinb_model, newdata = df, type = "response")
df_sev$pred_sev <- predict(sev_model, newdata = df_sev, type = "response")
df_sev_unique <- df_sev |>
group_by(IDpol) |>
summarise(pred_sev = mean(pred_sev, na.rm = TRUE))
df <- left_join(df, df_sev_unique, by = "IDpol")
mean_sev <- mean(df$pred_sev, na.rm = TRUE)
df$pred_sev <- ifelse(is.na(df$pred_sev), mean_sev, df$pred_sev)
df$pure_premium <- df$pred_freq * df$pred_sev
loading <- 0.25
df$premium <- df$pure_premium * (1 + loading)
head(df[, c("IDpol", "ClaimNb", "ClaimAmount", "pred_freq", "pred_sev", "pure_premium", "premium")])
## IDpol ClaimNb ClaimAmount pred_freq pred_sev pure_premium premium
## 1 1 1 0 0.015820810 2232.942 35.32695 44.15869
## 2 3 1 0 0.121820235 2232.942 272.01754 340.02193
## 3 5 1 0 0.086869602 2232.942 193.97480 242.46850
## 4 10 1 0 0.009872851 2232.942 22.04551 27.55688
## 5 11 1 0 0.092146613 2232.942 205.75806 257.19757
## 6 13 1 0 0.058369939 2232.942 130.33670 162.92087
With frequency and severity models built, I combine them to compute the pure premium for each policyholder. The pure premium is the expected claim cost, equal to predicted frequency multiplied by predicted severity. A 25% loading factor is then added to obtain the final premium.
ggplot(df, aes(x = pure_premium)) +
geom_histogram(bins = 50, fill = "steelblue", color = "black") +
labs(title = "Distribution of Predicted Pure Premiums",
x = "Pure Premium", y = "Number of Policyholders") +
xlim(0, 500)
## Warning: Removed 8174 rows containing non-finite outside the scale range
## (`stat_bin()`).
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_bar()`).
theme_minimal()
## List of 136
## $ line :List of 6
## ..$ colour : chr "black"
## ..$ linewidth : num 0.5
## ..$ linetype : num 1
## ..$ lineend : chr "butt"
## ..$ arrow : logi FALSE
## ..$ inherit.blank: logi TRUE
## ..- attr(*, "class")= chr [1:2] "element_line" "element"
## $ rect :List of 5
## ..$ fill : chr "white"
## ..$ colour : chr "black"
## ..$ linewidth : num 0.5
## ..$ linetype : num 1
## ..$ inherit.blank: logi TRUE
## ..- attr(*, "class")= chr [1:2] "element_rect" "element"
## $ text :List of 11
## ..$ family : chr ""
## ..$ face : chr "plain"
## ..$ colour : chr "black"
## ..$ size : num 11
## ..$ hjust : num 0.5
## ..$ vjust : num 0.5
## ..$ angle : num 0
## ..$ lineheight : num 0.9
## ..$ margin : 'margin' num [1:4] 0points 0points 0points 0points
## .. ..- attr(*, "unit")= int 8
## ..$ debug : logi FALSE
## ..$ inherit.blank: logi TRUE
## ..- attr(*, "class")= chr [1:2] "element_text" "element"
## $ title : NULL
## $ aspect.ratio : NULL
## $ axis.title : NULL
## $ axis.title.x :List of 11
## ..$ family : NULL
## ..$ face : NULL
## ..$ colour : NULL
## ..$ size : NULL
## ..$ hjust : NULL
## ..$ vjust : num 1
## ..$ angle : NULL
## ..$ lineheight : NULL
## ..$ margin : 'margin' num [1:4] 2.75points 0points 0points 0points
## .. ..- attr(*, "unit")= int 8
## ..$ debug : NULL
## ..$ inherit.blank: logi TRUE
## ..- attr(*, "class")= chr [1:2] "element_text" "element"
## $ axis.title.x.top :List of 11
## ..$ family : NULL
## ..$ face : NULL
## ..$ colour : NULL
## ..$ size : NULL
## ..$ hjust : NULL
## ..$ vjust : num 0
## ..$ angle : NULL
## ..$ lineheight : NULL
## ..$ margin : 'margin' num [1:4] 0points 0points 2.75points 0points
## .. ..- attr(*, "unit")= int 8
## ..$ debug : NULL
## ..$ inherit.blank: logi TRUE
## ..- attr(*, "class")= chr [1:2] "element_text" "element"
## $ axis.title.x.bottom : NULL
## $ axis.title.y :List of 11
## ..$ family : NULL
## ..$ face : NULL
## ..$ colour : NULL
## ..$ size : NULL
## ..$ hjust : NULL
## ..$ vjust : num 1
## ..$ angle : num 90
## ..$ lineheight : NULL
## ..$ margin : 'margin' num [1:4] 0points 2.75points 0points 0points
## .. ..- attr(*, "unit")= int 8
## ..$ debug : NULL
## ..$ inherit.blank: logi TRUE
## ..- attr(*, "class")= chr [1:2] "element_text" "element"
## $ axis.title.y.left : NULL
## $ axis.title.y.right :List of 11
## ..$ family : NULL
## ..$ face : NULL
## ..$ colour : NULL
## ..$ size : NULL
## ..$ hjust : NULL
## ..$ vjust : num 1
## ..$ angle : num -90
## ..$ lineheight : NULL
## ..$ margin : 'margin' num [1:4] 0points 0points 0points 2.75points
## .. ..- attr(*, "unit")= int 8
## ..$ debug : NULL
## ..$ inherit.blank: logi TRUE
## ..- attr(*, "class")= chr [1:2] "element_text" "element"
## $ axis.text :List of 11
## ..$ family : NULL
## ..$ face : NULL
## ..$ colour : chr "grey30"
## ..$ size : 'rel' num 0.8
## ..$ hjust : NULL
## ..$ vjust : NULL
## ..$ angle : NULL
## ..$ lineheight : NULL
## ..$ margin : NULL
## ..$ debug : NULL
## ..$ inherit.blank: logi TRUE
## ..- attr(*, "class")= chr [1:2] "element_text" "element"
## $ axis.text.x :List of 11
## ..$ family : NULL
## ..$ face : NULL
## ..$ colour : NULL
## ..$ size : NULL
## ..$ hjust : NULL
## ..$ vjust : num 1
## ..$ angle : NULL
## ..$ lineheight : NULL
## ..$ margin : 'margin' num [1:4] 2.2points 0points 0points 0points
## .. ..- attr(*, "unit")= int 8
## ..$ debug : NULL
## ..$ inherit.blank: logi TRUE
## ..- attr(*, "class")= chr [1:2] "element_text" "element"
## $ axis.text.x.top :List of 11
## ..$ family : NULL
## ..$ face : NULL
## ..$ colour : NULL
## ..$ size : NULL
## ..$ hjust : NULL
## ..$ vjust : num 0
## ..$ angle : NULL
## ..$ lineheight : NULL
## ..$ margin : 'margin' num [1:4] 0points 0points 2.2points 0points
## .. ..- attr(*, "unit")= int 8
## ..$ debug : NULL
## ..$ inherit.blank: logi TRUE
## ..- attr(*, "class")= chr [1:2] "element_text" "element"
## $ axis.text.x.bottom : NULL
## $ axis.text.y :List of 11
## ..$ family : NULL
## ..$ face : NULL
## ..$ colour : NULL
## ..$ size : NULL
## ..$ hjust : num 1
## ..$ vjust : NULL
## ..$ angle : NULL
## ..$ lineheight : NULL
## ..$ margin : 'margin' num [1:4] 0points 2.2points 0points 0points
## .. ..- attr(*, "unit")= int 8
## ..$ debug : NULL
## ..$ inherit.blank: logi TRUE
## ..- attr(*, "class")= chr [1:2] "element_text" "element"
## $ axis.text.y.left : NULL
## $ axis.text.y.right :List of 11
## ..$ family : NULL
## ..$ face : NULL
## ..$ colour : NULL
## ..$ size : NULL
## ..$ hjust : num 0
## ..$ vjust : NULL
## ..$ angle : NULL
## ..$ lineheight : NULL
## ..$ margin : 'margin' num [1:4] 0points 0points 0points 2.2points
## .. ..- attr(*, "unit")= int 8
## ..$ debug : NULL
## ..$ inherit.blank: logi TRUE
## ..- attr(*, "class")= chr [1:2] "element_text" "element"
## $ axis.text.theta : NULL
## $ axis.text.r :List of 11
## ..$ family : NULL
## ..$ face : NULL
## ..$ colour : NULL
## ..$ size : NULL
## ..$ hjust : num 0.5
## ..$ vjust : NULL
## ..$ angle : NULL
## ..$ lineheight : NULL
## ..$ margin : 'margin' num [1:4] 0points 2.2points 0points 2.2points
## .. ..- attr(*, "unit")= int 8
## ..$ debug : NULL
## ..$ inherit.blank: logi TRUE
## ..- attr(*, "class")= chr [1:2] "element_text" "element"
## $ axis.ticks : list()
## ..- attr(*, "class")= chr [1:2] "element_blank" "element"
## $ axis.ticks.x : NULL
## $ axis.ticks.x.top : NULL
## $ axis.ticks.x.bottom : NULL
## $ axis.ticks.y : NULL
## $ axis.ticks.y.left : NULL
## $ axis.ticks.y.right : NULL
## $ axis.ticks.theta : NULL
## $ axis.ticks.r : NULL
## $ axis.minor.ticks.x.top : NULL
## $ axis.minor.ticks.x.bottom : NULL
## $ axis.minor.ticks.y.left : NULL
## $ axis.minor.ticks.y.right : NULL
## $ axis.minor.ticks.theta : NULL
## $ axis.minor.ticks.r : NULL
## $ axis.ticks.length : 'simpleUnit' num 2.75points
## ..- attr(*, "unit")= int 8
## $ axis.ticks.length.x : NULL
## $ axis.ticks.length.x.top : NULL
## $ axis.ticks.length.x.bottom : NULL
## $ axis.ticks.length.y : NULL
## $ axis.ticks.length.y.left : NULL
## $ axis.ticks.length.y.right : NULL
## $ axis.ticks.length.theta : NULL
## $ axis.ticks.length.r : NULL
## $ axis.minor.ticks.length : 'rel' num 0.75
## $ axis.minor.ticks.length.x : NULL
## $ axis.minor.ticks.length.x.top : NULL
## $ axis.minor.ticks.length.x.bottom: NULL
## $ axis.minor.ticks.length.y : NULL
## $ axis.minor.ticks.length.y.left : NULL
## $ axis.minor.ticks.length.y.right : NULL
## $ axis.minor.ticks.length.theta : NULL
## $ axis.minor.ticks.length.r : NULL
## $ axis.line : list()
## ..- attr(*, "class")= chr [1:2] "element_blank" "element"
## $ axis.line.x : NULL
## $ axis.line.x.top : NULL
## $ axis.line.x.bottom : NULL
## $ axis.line.y : NULL
## $ axis.line.y.left : NULL
## $ axis.line.y.right : NULL
## $ axis.line.theta : NULL
## $ axis.line.r : NULL
## $ legend.background : list()
## ..- attr(*, "class")= chr [1:2] "element_blank" "element"
## $ legend.margin : 'margin' num [1:4] 5.5points 5.5points 5.5points 5.5points
## ..- attr(*, "unit")= int 8
## $ legend.spacing : 'simpleUnit' num 11points
## ..- attr(*, "unit")= int 8
## $ legend.spacing.x : NULL
## $ legend.spacing.y : NULL
## $ legend.key : list()
## ..- attr(*, "class")= chr [1:2] "element_blank" "element"
## $ legend.key.size : 'simpleUnit' num 1.2lines
## ..- attr(*, "unit")= int 3
## $ legend.key.height : NULL
## $ legend.key.width : NULL
## $ legend.key.spacing : 'simpleUnit' num 5.5points
## ..- attr(*, "unit")= int 8
## $ legend.key.spacing.x : NULL
## $ legend.key.spacing.y : NULL
## $ legend.frame : NULL
## $ legend.ticks : NULL
## $ legend.ticks.length : 'rel' num 0.2
## $ legend.axis.line : NULL
## $ legend.text :List of 11
## ..$ family : NULL
## ..$ face : NULL
## ..$ colour : NULL
## ..$ size : 'rel' num 0.8
## ..$ hjust : NULL
## ..$ vjust : NULL
## ..$ angle : NULL
## ..$ lineheight : NULL
## ..$ margin : NULL
## ..$ debug : NULL
## ..$ inherit.blank: logi TRUE
## ..- attr(*, "class")= chr [1:2] "element_text" "element"
## $ legend.text.position : NULL
## $ legend.title :List of 11
## ..$ family : NULL
## ..$ face : NULL
## ..$ colour : NULL
## ..$ size : NULL
## ..$ hjust : num 0
## ..$ vjust : NULL
## ..$ angle : NULL
## ..$ lineheight : NULL
## ..$ margin : NULL
## ..$ debug : NULL
## ..$ inherit.blank: logi TRUE
## ..- attr(*, "class")= chr [1:2] "element_text" "element"
## $ legend.title.position : NULL
## $ legend.position : chr "right"
## $ legend.position.inside : NULL
## $ legend.direction : NULL
## $ legend.byrow : NULL
## $ legend.justification : chr "center"
## $ legend.justification.top : NULL
## $ legend.justification.bottom : NULL
## $ legend.justification.left : NULL
## $ legend.justification.right : NULL
## $ legend.justification.inside : NULL
## $ legend.location : NULL
## $ legend.box : NULL
## $ legend.box.just : NULL
## $ legend.box.margin : 'margin' num [1:4] 0cm 0cm 0cm 0cm
## ..- attr(*, "unit")= int 1
## $ legend.box.background : list()
## ..- attr(*, "class")= chr [1:2] "element_blank" "element"
## $ legend.box.spacing : 'simpleUnit' num 11points
## ..- attr(*, "unit")= int 8
## [list output truncated]
## - attr(*, "class")= chr [1:2] "theme" "gg"
## - attr(*, "complete")= logi TRUE
## - attr(*, "validate")= logi TRUE
The distribution of predicted pure premiums shows that most policyholders are expected to have relatively low annual claim costs, but a small proportion of policies are associated with much higher expected costs. This matches the typical risk profile observed in insurance portfolios.
In this project, I applied frequency-severity modeling to automobile insurance data. A zero-inflated negative binomial model was used for claim frequency, and a Gamma GLM was used for claim severity. Together, these models allowed me to estimate pure premiums and visualize the distribution of expected risk across the portfolio. While the models could be improved with additional feature engineering or alternative distributions, they demonstrate a standard actuarial approach to pricing and risk modeling.