--- title: "Introduction to creditmodel" date: "`r Sys.Date()`" Author: "Dongping Fan" vignette: > %\VignetteEngine{knitr::rmarkdown} %\VignetteIndexEntry{Introduction to creditmodel} output: knitr:::html_vignette: toc: yes --- ```{r setup, include = FALSE} knitr::opts_chunk$set(echo = FALSE) library(creditmodel) ``` ## Introduction The `creditmodel` package provides a highly efficient R tool suite for Credit Modeling Analysis and Visualization. Contains infrastructure functionalities such as data exploration and preparation, missing values treatment, outliers treatment, variable derivation, variable selection, dimensionality reduction, grid search for hyper parameters, data mining and visualization, model evaluation, strategy analysis etc. creditmodel can facilitate reliable predictive models (such as xgboost or scorecard) and data analysis on a standard laptop computer within minutes. This introductory vignette provides a brief glance at the training_model module of the package. When I first wrote the creditmodel package, its primary purpose was to provide a tool to make the development of binary classification models (machine learning based models as well as credit scorecard) simpler and faster. Therefore, I wrote the package to automatically build model. However, as the package grew in functionality, this choice was increasingly problematic. Importantly, the creditmodel package now provides a set of complementary tools with different missions. ## Quick Modeling Now, Let's start with quick modeling. ```{r fig.width = 10} B_model = training_model(dat = UCICreditCard, model_name = "UCICreditCard", target = "default.payment.next.month", x_list = NULL, occur_time = "apply_date", obs_id = "ID", dat_test = NULL, preproc = TRUE, miss_values = list(-1, -2,"\"\"\"\"",""), missing_proc = TRUE, outlier_proc = TRUE, trans_log = TRUE, feature_filter = list(filter = c("IV", "COR"), cv_folds = 1, iv_cp = 0.01, psi_cp = 0.2, cor_cp = 0.7, xgb_cp = 0, hopper = F), vars_plot = FALSE, algorithm = list("LR"), breaks_list = NULL, LR.params = lr_params( iter = 2, method = 'random_search', tree_control = list(p = 0.02, cp = c(0.00001, 0.00000001), xval = 5, maxdepth = c(10, 15)), bins_control = list(bins_num = 10, bins_pct = c(0.02, 0.03, 0.05), b_chi = c(0.01, 0.02, 0.03), b_odds = 0.1, b_psi = c(0.02, 0.06), b_or = c(.05, 0.1, 0.15, 0.2), mono = c(0.1, 0.2, 0.4, 0.5), odds_psi = c(0.1, 0.15, 0.2), kc = 1), f_eval = 'ks', lasso = TRUE, step_wise = FALSE), parallel = FALSE, cores_num = NULL, save_pmml = FALSE, plot_show = TRUE, model_path = tempdir(), seed = 46) ``` In a few minutes, the program completed data cleaning and pretreatment, variable screening, scorecard, Xgboost, GBDT, RandomForest four models development and evaluation.