Last time we discovered the structure of the insurance industry data using clustering, and found that there are channels within the industry. This time I ran a Fast Clustering procedure in SAS, which is basically K-means clustering, with the maximum number of clusters set to 50, and plotted the clusters I got from it.
This plot actually shows two metrics. We have 50 Gs and 50 Rs here. The G points are plotted in the coordinate system of frequency and Gap, which is the distance to the nearest cluster. The R points are plotted in the coordinate system of frequency and Radius, which is the distance from a cluster centroid to the most distant case in that cluster. This procedure can help us identify outliers in the dataset. A cluster with a small number of instances is usually suspected of being an outlier, especially when it is far away from the other clusters. For example, the upper-left G cluster is a potential outlier in the sense that there are very few points in it and it lies far away from the others.
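The three quantities behind the plot (frequency, Radius, and Gap) can be sketched in a few lines of Python. This is a minimal illustration, not the SAS procedure itself: the function names, the toy data, and the cut-off values are all hypothetical, and Gap is taken here as the distance between a centroid and the nearest other centroid, which is an assumption about how "distance to the nearest cluster" is measured.

```python
import math
from collections import defaultdict


def cluster_diagnostics(points, labels, centroids):
    """Return {cluster_id: (frequency, radius, gap)} for every centroid."""
    members = defaultdict(list)
    for point, label in zip(points, labels):
        members[label].append(point)

    stats = {}
    for cid, centroid in centroids.items():
        pts = members.get(cid, [])
        # Frequency: number of cases assigned to the cluster.
        frequency = len(pts)
        # Radius: distance from the centroid to its most distant case.
        radius = max((math.dist(centroid, p) for p in pts), default=0.0)
        # Gap (assumed definition): distance to the nearest other centroid.
        gap = min(
            (math.dist(centroid, other)
             for oid, other in centroids.items() if oid != cid),
            default=float("inf"),
        )
        stats[cid] = (frequency, radius, gap)
    return stats


def outlier_candidates(stats, max_frequency, min_gap):
    """Clusters that are both sparsely populated and far from the rest."""
    return sorted(
        cid for cid, (freq, _radius, gap) in stats.items()
        if freq <= max_frequency and gap >= min_gap
    )


# Hypothetical toy data: four nearby cases and one isolated case.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (10.0, 10.0)]
labels = [0, 0, 0, 0, 1]
centroids = {0: (0.5, 0.5), 1: (10.0, 10.0)}

stats = cluster_diagnostics(points, labels, centroids)
print(outlier_candidates(stats, max_frequency=1, min_gap=5.0))
```

In this toy case the single isolated point forms its own cluster with a frequency of one and a large Gap, which is the same pattern as the upper-left G point described above. The thresholds would have to be chosen by eye from the real plot.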
So I started to think about cutting some outlier-like instances. I eliminated the clusters that had fewer than 5 instances in them, restricted the maximum number of clusters to five, and reran the fast clustering procedure. I then compared the mean values of the attributes in each cluster and summarized the features the members of each cluster had in common. Here is what I got:
cluster | Return | Product | Size | Capital Adequacy |
1 | high | health | moderate | high |
2 | moderate/high | life | large | moderate |
3 | medium | annuity | huge | moderate |
4 | negative | reinsurance | moderate | moderate |
5 | break-even | life | small | least |
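The filtering and comparison steps that led to the table above can be sketched as follows. This is only an illustration in Python under stated assumptions, not the SAS workflow: the attribute names `return` and `assets`, the toy values, and both helper functions are hypothetical, and the second clustering pass is not reproduced here (the first-pass labels are simply reused).

```python
from collections import Counter, defaultdict


def drop_small_clusters(rows, labels, min_size=5):
    """Keep only the rows whose cluster has at least min_size cases."""
    sizes = Counter(labels)
    kept = [(row, label) for row, label in zip(rows, labels)
            if sizes[label] >= min_size]
    return [row for row, _ in kept], [label for _, label in kept]


def cluster_means(rows, labels):
    """Mean of every attribute within each cluster."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    return {
        label: {key: sum(r[key] for r in group) / len(group)
                for key in group[0]}
        for label, group in grouped.items()
    }


# Hypothetical toy data: five similar firms and one extreme firm.
rows = [
    {"return": 0.10, "assets": 100.0},
    {"return": 0.20, "assets": 200.0},
    {"return": 0.30, "assets": 300.0},
    {"return": 0.20, "assets": 200.0},
    {"return": 0.20, "assets": 200.0},
    {"return": -0.90, "assets": 5000.0},
]
first_pass_labels = [0, 0, 0, 0, 0, 1]

kept_rows, kept_labels = drop_small_clusters(rows, first_pass_labels, min_size=5)
# A second clustering pass (at most five clusters) would normally run here;
# for brevity this sketch reuses the first-pass labels of the surviving rows.
means = cluster_means(kept_rows, kept_labels)
print(means)
```

The qualitative labels in the table (high, moderate, small, and so on) come from reading such per-cluster means side by side; that judgment step is manual and is not captured by the code.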
Knowing that insurance companies are divided by channels is not a breakthrough. But what if I got some insights about these channels? Some interesting findings:
1. The size of insurance companies varies. The biggest ones were those that deal in annuities. This is simply because an annuity is something one pays into for financial security for the rest of one's life.
2. Reinsurance companies had a negative return. They didn't do well in 2011, which is consistent with reality.
3. Health insurance companies had the highest capital adequacy, showing that they tended to hold short-term investments. This is understandable because they need to keep their capital rolling over so that they can pay claims as needed.
4. Life insurance companies had the least capital adequacy. I think the reason is that a claim is paid only when the insured dies, so life insurance companies tended to make long-term investments. Furthermore, part of them (cluster 5) put their investment into reinsurance. Given the situation the reinsurers were facing, it's not surprising that this group of life insurance companies only broke even, while the life insurers that chose other investment alternatives (cluster 2) seem to have achieved a moderate/high return, which is much more satisfying.
5. Why did life insurance companies choose different investment targets? My answer would be the difference in size. Small firms didn't have enough investment-savvy professionals, so they had to hand their capital to companies that were able to identify better investment opportunities, aka the reinsurance companies, and so they suffered from the reinsurers' difficulties. Big life insurers, on the other hand, were able to invest independently and could achieve profitability.
I think this is a happy ending, although there is more that could be done before I fully understand this industry. But my larger point here is that I found genuinely intriguing insights in massive data. I am now confident in saying that analytics is going to drive us from an information age to an insight era.