Step 3 – Modeling and Comparing Performance

Croudify Model outperforms any Lending Club model by 2% (+40% net)


This is the last post in this series. The first post is here and the second post is here.

In this blog we will talk about our modeling approach and compare the performance of the model to the average returns for various ratings on the Lending Club platform.

For our model we use a combination of XGBoost and stacking techniques.

While gradient boosting is well defined, our stacking approach is proprietary to our modeling and gives us better return discrimination when choosing loans.

Stacking Defined

One way of thinking about stacking is that it adds a further level of analysis to the machine learning algorithm, which is why some call it a meta-algorithm or meta-ensembling. The data is separated into several subsets and a model is run on each subset independently of the others. The results of the models are then combined to arrive at a final result. Note that the separation into subsets is NOT random; it is based on a discretionary rule.

Croudify has found that, in our particular case, this approach brings two specific benefits:

  1. Parallelism. Because the subsets can be analyzed separately, we can process the data much faster than a single model can. The speed-up is usually linear in the number of models: given enough computing resources, we see that separating the data into 8 models runs about 8 times faster than a single model.
  2. Decrease in variance of prediction. Each model sees only the data that pertains to its own subset; the rest is discarded and so does not introduce additional variance. We have found that some loans do not depend on a number of variables that other loans do. Modeling them separately improves predictive power compared with a single model that takes all the variables into account in a single run.

Stacking is one of those techniques that does not come pre-packaged in a library in your language of choice. It is a technique that a data scientist has to assess and apply with experience. In our case, it has improved our assessment of default risk across all the loans we have studied.

Stacking Implementation

In our case we stacked the models by Lending Club loan rating. Stacking by rating lets us choose the best loans within a pool that offers similar returns. Thus, modeling within a single rating gives us a chance to improve our returns compared to the average portfolio.
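The idea of one model per rating can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Croudify implementation: the field names (`grade`, `sub_grade`, `realized_return`) and the helper functions are hypothetical, and `fit_mean_model` is a trivial stand-in learner where a gradient-boosted model would be trained in practice.

```python
from collections import defaultdict
from statistics import mean


def fit_mean_model(rows):
    """Stand-in learner (an assumption, not the real model): predicts the
    mean realized return seen for each sub-grade in the training rows."""
    by_sub = defaultdict(list)
    for row in rows:
        by_sub[row["sub_grade"]].append(row["realized_return"])
    table = {sub: mean(values) for sub, values in by_sub.items()}
    fallback = mean(row["realized_return"] for row in rows)
    # Unseen sub-grades fall back to the mean of the whole segment.
    return lambda loan: table.get(loan["sub_grade"], fallback)


def fit_per_grade(rows, fit_fn=fit_mean_model):
    """Split the rows by rating grade (a fixed rule, not a random split)
    and fit one independent model per grade."""
    segments = defaultdict(list)
    for row in rows:
        segments[row["grade"]].append(row)
    return {grade: fit_fn(segment) for grade, segment in segments.items()}


def predict(models, loan):
    """Route a loan to the model trained on its own grade."""
    return models[loan["grade"]](loan)


# Made-up training rows, for illustration only.
train = [
    {"grade": "A", "sub_grade": "A3", "realized_return": 0.04},
    {"grade": "A", "sub_grade": "A5", "realized_return": 0.06},
    {"grade": "B", "sub_grade": "B3", "realized_return": 0.07},
    {"grade": "B", "sub_grade": "B3", "realized_return": 0.09},
]
models = fit_per_grade(train)
score = predict(models, {"grade": "B", "sub_grade": "B3"})
```

Because each segment is fitted without reference to the others, the calls to `fit_fn` are independent and could be run in parallel.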

Model Assumptions

To compare our output with the average, we took the top 10% of the loans that our model predicted as the best investment options within a given rating pool, and compared their return to the average return for that rating over time. Other assumptions:

  1. Vintages analyzed: The comparison covers loans originated from 2015 Q1 to 2017 Q2. This gave us broad enough data, though the recent vintages have not yet seasoned.
  2. No reinvestment income: We did not reinvest any returns in our analysis, so our returns are not compounded as they would be for a typical investor.
  3. Top 10% of loans: For every bucket we simulated investing in the top 10% of loans as ranked by our model. The top 10% usually represented more than $5 MM invested per bucket, which gives us confidence that we can deliver similar returns for a large number of clients, even while competing with other investors for the loans.
  4. All ratings modeled: Although we concluded in our previous post that only A, B and C rated loans are profitable to invest in, we modeled all ratings and compared the results. For the top 10%, returns might not be good for D and E, but they might be much better if we narrow to the top 1% or 2%. We therefore analyze those ratings as well and will look at deploying into them for large portfolios, where 2-5% of capital can go into high-risk loans.
  5. Monthly returns are averaged: The graphs we present show average returns by monthly payment. For month 1 we averaged returns across all vintages; for month 20 we averaged only the vintages that have at least 20 payments.
  6. Partial A and B ratings: For the A bucket we modeled only A3, A4 and A5 loans; similarly, for B we modeled only B3, B4 and B5 loans. These two ratings have very large buckets and low default rates, so targeting the higher gross interest rates gave us an opportunity for better returns.
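The top 10% selection described in the assumptions above can be sketched as follows. This is a minimal illustration with made-up numbers, not our production code: the field names (`grade`, `score`, `realized_return`, `amount`) and the function `top_percent_vs_average` are hypothetical, and `score` stands for whatever ranking the model produces.

```python
from collections import defaultdict
from statistics import mean


def top_percent_vs_average(loans, top_percent=10):
    """For each rating bucket, compare the mean return of the model's
    top-ranked loans with the mean return of the whole bucket."""
    by_grade = defaultdict(list)
    for loan in loans:
        by_grade[loan["grade"]].append(loan)

    result = {}
    for grade, bucket in by_grade.items():
        # Highest model score first.
        ranked = sorted(bucket, key=lambda l: l["score"], reverse=True)
        # Integer arithmetic avoids float rounding; always keep at least one loan.
        k = max(1, len(ranked) * top_percent // 100)
        picked = ranked[:k]
        result[grade] = {
            "selected": mean(l["realized_return"] for l in picked),
            "average": mean(l["realized_return"] for l in bucket),
            "invested": sum(l["amount"] for l in picked),
        }
    return result


# Ten made-up loans in one bucket; the score happens to rank them by return.
loans = [
    {"grade": "C", "score": i, "realized_return": i / 100, "amount": 1000}
    for i in range(10)
]
summary = top_percent_vs_average(loans)
```

The `invested` figure mirrors the capacity check in item 3: it shows how much capital the selected slice of each bucket could absorb.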

Model Returns

The graphs below show the average returns for each Lending Club rating alongside our modeled returns for the chosen top 10% of loans.

As you can see, our selected top 10% of loans consistently outperform the rating average by between 0.5% and 2.5%.
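The month-by-month averaging behind these curves (month 1 averaged over all vintages, later months only over vintages that have seasoned that far) might be computed along these lines. The function name, the data layout and the numbers are assumptions for illustration; they are not actual Lending Club results.

```python
from statistics import mean


def average_by_month(returns_by_vintage):
    """Average the return at each payment month across vintages, using
    only the vintages that already have a payment for that month."""
    longest = max(len(series) for series in returns_by_vintage.values())
    averaged = []
    for month in range(longest):
        seasoned = [
            series[month]
            for series in returns_by_vintage.values()
            if len(series) > month
        ]
        averaged.append(mean(seasoned))
    return averaged


# Made-up return series; older vintages have more monthly observations.
curves = {
    "2015Q1": [1.0, 2.0, 6.0],
    "2016Q1": [3.0, 4.0],
    "2017Q2": [5.0],
}
monthly_average = average_by_month(curves)
```

Note that the later points on such a curve rest on fewer vintages, so they are noisier than the early months.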



If you are a typical investor looking to invest in Lending Club, analyzing each loan is hard, and investing and reinvesting takes time. With a platform like Croudify, you can not only automate the whole investing process but also achieve higher returns than the average returns you can expect from a Lending Club automation engine.

Disclaimer:  LendingClub Notes are offered by prospectus filed with the SEC and you should review the risks and uncertainties described in the prospectus prior to investing in the Notes. Croudify is not a registered investment adviser, and the information provided is not intended as investment, legal, or tax advice. LendingClub Notes are not insured or guaranteed and investors may have negative returns. Historical Returns are not a promise of future results. Consult with your investment or financial advisor prior to investing.

Also published on Medium.
