Step 2 – Sampling Data for Modeling

Choosing the right sample is the first step in successful modeling

This blog is part 2 of a 3-part series. The first blog is here.


Now that we have benchmarked the data to validate its completeness and accuracy, the next step in our journey towards finding returns is to find the loan term and loan ratings that we should invest in to get alpha in returns (alpha: a return higher than the risk of an average portfolio would justify, sometimes called an abnormal return).

Term Analysis

The term selection decision was simple: there are only 2 terms available on Lending Club, 36 months and 60 months. On average, the 60 month loans provide a higher return for any given rating category. The comparison of both terms for the 2014 vintage is below.


As you would expect, the 60 month loans give a superior return for all the vintages (higher return for the longer term). Even if you compare any single rating across time, the 60 month term consistently outperforms the 36 month term.

But if you look closely, the volume of 60 month loans is relatively low compared to 36 month loans for the higher ratings.


In fact, nearly 93% of A rating loans are 36 month loans. Similarly, for B & C ratings a higher proportion comes from 36 month loans.
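The term split within each rating can be tabulated along the following lines. This is only a minimal sketch, not the code behind the tables in this post: the `grade` and `term` keys and the toy records are assumptions made up for illustration.

```python
from collections import Counter, defaultdict


def term_share_by_grade(loans):
    """Return {grade: {term: share of that grade's loans with that term}}."""
    counts = defaultdict(Counter)
    for loan in loans:
        counts[loan["grade"]][loan["term"]] += 1

    shares = {}
    for grade, by_term in counts.items():
        total = sum(by_term.values())
        shares[grade] = {term: n / total for term, n in by_term.items()}
    return shares


# Toy records, made up for illustration (not Lending Club figures).
# The "grade" and "term" keys are assumed names, not a confirmed schema.
toy_loans = [
    {"grade": "A", "term": 36},
    {"grade": "A", "term": 36},
    {"grade": "A", "term": 36},
    {"grade": "A", "term": 60},
    {"grade": "E", "term": 36},
    {"grade": "E", "term": 60},
]

shares = term_share_by_grade(toy_loans)
for grade in sorted(shares):
    print(grade, shares[grade])
```

Run against the real loan file, the share for a rating such as A at 36 months is the figure quoted above.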

Our model is able to discriminate between loans within a given rating (next blog in this series), which lets us find the better loans in a population. A bigger population generally gives the model a better fit, and we are able to find better returns with larger populations. Thus, if we are going to choose loans in ratings A, B or C, it makes sense to choose from 36 month loans.

Ratings Analysis

If you look at Table 1 at the top, you can see that the returns peak at C ratings; for D & E ratings the average returns are lower than for B & A ratings. This is true not only for the 2014 vintage but also for all Lending Club loans life to date.

As you can see from the table below, C, D & E rating loans give lower and lower returns as they mature (>18 months of age). For E, the returns are negative in most quarters.

 TABLE 3 – NAR returns for 36 month Vintages

Looking at all this data, we concluded that we should only invest in A (A3, A4, A5), B (B3, B4, B5) & C rating loans.

Once the ratings were decided, the term choice also became clear (see Term Analysis): we chose 36 month loans for our model portfolios.
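The resulting sampling rule is simple enough to write down as a filter. The sketch below is illustrative only: the `sub_grade` and `term` keys are assumed names, and C is taken to mean all five sub-grades (C1 to C5), which is an assumption since only "C rating loans" is stated above.

```python
# Sub-grades kept for the model portfolio.
# ASSUMPTION: "C" is read as every C sub-grade (C1-C5).
SELECTED_SUB_GRADES = {
    "A3", "A4", "A5",
    "B3", "B4", "B5",
    "C1", "C2", "C3", "C4", "C5",
}
SELECTED_TERM_MONTHS = 36


def in_sample(loan):
    """True if the loan has a selected sub-grade and the 36 month term."""
    return (
        loan["term"] == SELECTED_TERM_MONTHS
        and loan["sub_grade"] in SELECTED_SUB_GRADES
    )


def select_sample(loans):
    """Keep only the loans that belong in the modeling sample."""
    return [loan for loan in loans if in_sample(loan)]


# Toy records, made up for illustration (hypothetical keys and values).
toy_loans = [
    {"id": 1, "sub_grade": "A1", "term": 36},  # dropped: A1 not selected
    {"id": 2, "sub_grade": "A4", "term": 36},  # kept
    {"id": 3, "sub_grade": "B3", "term": 60},  # dropped: 60 month term
    {"id": 4, "sub_grade": "C2", "term": 36},  # kept
    {"id": 5, "sub_grade": "D1", "term": 36},  # dropped: D rating
]

sample = select_sample(toy_loans)
print([loan["id"] for loan in sample])
```

Keeping the selected sub-grades and term as named constants makes it easy to rerun the analysis with a different sample definition.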

The chosen ratings and term also provided a few additional advantages in stress scenarios (in case the economy starts tanking).

Benefits of choosing shorter term and higher rating loans

  1. Better downside protection: Since the loans in the top categories are of the highest quality, they are expected to perform better than lower credit quality loans in a downturn.
  2. Faster return of principal: Similarly, with only 36 month loans in our portfolio, we were certain that we would get our principal back faster than with 60 month loans. This matters even more if defaults increase with a sudden dip in the economy.

In our next and last blog, we will show how our models performed at finding the better loans within a specific rating, thus allowing us to get alpha in our returns.

Also published on Medium.
