Evaluating a borrower's potential default risk before a loan is issued, and estimating the borrower's probability of default, is a fundamental part of credit risk management in modern financial institutions. This paper studies historical loan data from banks and other financial institutions from the perspective of imbalanced-data classification and uses the random forest algorithm to build a loan default prediction model. The experimental results show that random forest outperforms the decision tree and logistic regression classifiers in prediction performance. In addition, ranking features by random forest importance identifies the features that most strongly affect eventual default, which helps financial institutions assess lending risk more effectively.
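
The workflow summarized above can be illustrated with a minimal sketch. It is not the implementation used in this paper: it assumes scikit-learn, replaces the historical loan records with a synthetic imbalanced dataset, uses hypothetical feature names, and handles the class imbalance with `class_weight="balanced"`, which is only one of several possible choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for historical loan records (NOT the paper's data):
# roughly 10% of samples belong to the positive "default" class.
X, y = make_classification(
    n_samples=2000, n_features=8, n_informative=4,
    weights=[0.9, 0.1], random_state=0,
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # hypothetical names

# Stratify so that the train and test splits keep the same default rate.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# class_weight="balanced" reweights classes inversely to their frequency;
# this is an assumed way of handling the imbalance, not the paper's method.
model = RandomForestClassifier(
    n_estimators=200, class_weight="balanced", random_state=0
)
model.fit(X_train, y_train)

# Estimated default probability for each held-out borrower.
default_prob = model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, default_prob)
print(f"ROC AUC on held-out data: {auc:.3f}")

# Rank features by impurity-based importance, highest first.
ranking = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1], reverse=True,
)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

ROC AUC is reported here instead of accuracy because, with about 90% non-defaulting borrowers, a model that always predicts "no default" would already reach about 90% accuracy.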