2024-02-15 | 스파르타/TIL(Today I Learned) | 2024. 2. 16. 02:18
I had planned to write this up later because of the project
->
(now filled in)
Additional ideas
There seem to be several kinds of scaling, so if time allows, try other scalers as well
If possible, also compare different encodings
So far I have looked at the grade distribution within each category of every variable;
I should also check the reverse: the distribution of each variable per grade
Derived variables
Voting: from sklearn.ensemble import VotingClassifier
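A minimal soft-voting sketch; the toy data, base models, and parameters here are illustrative, not the project's actual setup:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

# Toy 3-class data standing in for the loan-grade features.
X, y = make_classification(n_samples=300, n_classes=3, n_informative=5, random_state=42)

# Soft voting averages predicted probabilities across the base models.
clf = VotingClassifier(
    estimators=[
        ("dt", DecisionTreeClassifier(max_depth=7, random_state=42)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=42)),
        ("lor", LogisticRegression(max_iter=1000)),
    ],
    voting="soft",
)
clf.fit(X, y)
print(clf.score(X, y))
```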
- Saving a trained model to a file: ~~from sklearn.externals import joblib~~ import joblib (the sklearn.externals path has been removed from scikit-learn; import joblib directly)
- joblib.dump(clf, 'model.pkl', compress=3)
- import sys
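A small sketch of the save/load round trip with the modern joblib import; the iris data and temp-file path are just placeholders:

```python
import os
import tempfile

import joblib  # modern import; sklearn.externals.joblib no longer exists
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(random_state=42).fit(X, y)

# compress=3 trades a little save time for a much smaller file.
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
joblib.dump(clf, path, compress=3)

restored = joblib.load(path)
print((restored.predict(X) == clf.predict(X)).all())
```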
Write a custom scoring function so intermediate progress can be checked
Track progress with tqdm: pip install tqdm, then from tqdm import tqdm
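A sketch of wrapping a model-evaluation loop in tqdm; the model dict and scoring choice are illustrative:

```python
from tqdm import tqdm
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
models = {
    "model_dt": DecisionTreeClassifier(random_state=42),
    "model_rf": RandomForestClassifier(n_estimators=50, random_state=42),
}

results = {}
# tqdm wraps the iterable and shows a live progress bar on stderr.
for name, model in tqdm(models.items(), desc="evaluating"):
    results[name] = cross_val_score(model, X, y, cv=3, scoring="f1_macro").mean()
print(results)
```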
Wrap the heatmap drawing in a function (so it can be redrawn easily once derived variables are added)
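One way the heatmap helper could look: a plain-matplotlib sketch (seaborn's heatmap would work equally well) that also returns the correlation matrix so it is easy to call again after new derived columns are added. The Agg backend and the toy frame are only there so it runs headless:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; no display needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

def plot_corr_heatmap(df: pd.DataFrame, title: str = "correlation"):
    """Draw a correlation heatmap; re-call after adding derived variables."""
    corr = df.corr(numeric_only=True)
    fig, ax = plt.subplots(figsize=(6, 5))
    im = ax.imshow(corr, cmap="coolwarm", vmin=-1, vmax=1)
    ax.set_xticks(range(len(corr)), corr.columns, rotation=90)
    ax.set_yticks(range(len(corr)), corr.columns)
    fig.colorbar(im, ax=ax)
    ax.set_title(title)
    fig.tight_layout()
    return corr  # returning the matrix makes the helper easy to test

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 3)), columns=["a", "b", "c"])
corr = plot_corr_heatmap(df)
```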
Code patch notes
From around this day on I was rushing, so I did not keep notes on how the code was modified.
(What I did note down does not seem very important, so I will skip it. The z-score-based outlier removal does matter somewhat, but it will be covered later in the project write-up, so I decided it did not need to be included here.)
Default hyperparameters for each model
sklearn.model_selection.GridSearchCV
sklearn.model_selection.RandomizedSearchCV
- dt
- sklearn.tree.DecisionTreeClassifier
- max_depth : int, default=None
- The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.
- from sklearn.tree import DecisionTreeClassifier
- rf
- sklearn.ensemble.RandomForestClassifier
- n_estimators : int, default=100
- The number of trees in the forest.
- max_depth : int, default=None
- The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.
- from sklearn.ensemble import RandomForestClassifier
- et
- sklearn.ensemble.ExtraTreesClassifier
- n_estimators : int, default=100
- The number of trees in the forest.
- max_depth : int, default=None
- The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.
- from sklearn.ensemble import ExtraTreesClassifier
- xgb
- Python API Reference (xgboost 2.1.0-dev documentation)
- max_depth (Optional[int]) – Maximum tree depth for base learners.
- n_estimators (Optional[int]) – Number of boosting rounds.
- from xgboost import XGBClassifier
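The scikit-learn defaults quoted above can be confirmed directly from get_params(); a quick sketch (xgboost omitted to keep it dependency-light):

```python
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# Fresh instances carry the documented defaults:
# n_estimators=100 and max_depth=None for the two forests,
# max_depth=None for the single decision tree.
dt = DecisionTreeClassifier()
rf = RandomForestClassifier()
et = ExtraTreesClassifier()

print(rf.get_params()["n_estimators"], rf.get_params()["max_depth"])
print(et.get_params()["n_estimators"], et.get_params()["max_depth"])
print(dt.get_params()["max_depth"])
```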
Intermediate results
(Some parts were too long, so I dropped them)
- Round 1: checked the models' scores evaluated with accuracy and f1_macro → dt, rf, xgb, lgb, et
- Scaling: sc_col = ['연간소득', '부채_대비_소득_비율'], the rest mm; 대출목적, 주택소유상태
- Results:
- 'model_dt': {'fit_time': array([0.66260242, 0.67761564, 0.70664191, 0.66460276, 0.67299581]), 'score_time': array([0.01201081, 0.01201034, 0.01100993, 0.01201081, 0.01201224]), 'test_accuracy': array([0.83603302, 0.83927265, 0.83968229, 0.83550191, 0.84307885]), 'train_accuracy': array([1., 1., 1., 1., 1.]), 'test_f1_macro': array([0.77276985, 0.77618444, 0.76610055, 0.7559306 , 0.77350753]), 'train_f1_macro': array([1., 1., 1., 1., 1.])},
- 'model_knn': {'fit_time': array([0.01501346, 0.01501393, 0.01401353, 0.01501298, 0.01401305]), 'score_time': array([1.42609739, 1.42329311, 1.39426684, 1.34873176, 1.37875271]), 'test_accuracy': array([0.35270143, 0.35390323, 0.34728536, 0.34927105, 0.35130898]), 'train_accuracy': array([0.55118943, 0.54659107, 0.55038537, 0.54952319, 0.55028086]), 'test_f1_macro': array([0.23419952, 0.24146593, 0.23105777, 0.23069666, 0.2252951 ]), 'train_f1_macro': array([0.40529019, 0.39838383, 0.40772596, 0.40435698, 0.4075273 ])},
- 'model_xgb': {'fit_time': array([1.17260599, 1.24229264, 1.23724437, 1.15693688, 1.2334466 ]), 'score_time': array([0.05905128, 0.04659295, 0.04604173, 0.05005455, 0.04704237]), 'test_accuracy': array([0.83378618, 0.83133034, 0.83184407, 0.83618122, 0.83821916]), 'train_accuracy': array([0.88922128, 0.88961319, 0.88757675, 0.8901241 , 0.88974526]), 'test_f1_macro': array([0.77543692, 0.77986058, 0.77048389, 0.76905818, 0.78055823]), 'train_f1_macro': array([0.90812768, 0.90799494, 0.90740538, 0.90669811, 0.90768593])},
- 'model_et': {'fit_time': array([9.43380547, 9.42943907, 9.37981987, 9.40974283, 9.49166727]), 'score_time': array([0.57652903, 0.57800269, 0.59411597, 0.57307863, 0.58021045]), 'test_accuracy': array([0.68758491, 0.69014526, 0.68328369, 0.68265663, 0.6882479 ]), 'train_accuracy': array([1., 1., 1., 1., 1.]), 'test_f1_macro': array([0.5710648 , 0.58411625, 0.56415031, 0.55996858, 0.55780156]), 'train_f1_macro': array([1., 1., 1., 1., 1.])},
- 'model_lgb': {'fit_time': array([1.01751494, 0.99526429, 0.98273134, 0.99707556, 1.02645135]), 'score_time': array([0.13913178, 0.14463401, 0.14013529, 0.14613295, 0.14415145]), 'test_accuracy': array([0.82516459, 0.82349253, 0.82275174, 0.82604379, 0.82944035]), 'train_accuracy': array([0.85880939, 0.86040314, 0.85851078, 0.85951666, 0.86036577]), 'test_f1_macro': array([0.76025329, 0.76941381, 0.76254294, 0.7453647 , 0.77205294]), 'train_f1_macro': array([0.88099242, 0.88296013, 0.88113644, 0.88274915, 0.88222023])},
- 'model_gbm': {'fit_time': array([81.38249898, 80.86995912, 80.98467851, 81.07965398, 81.31905246]), 'score_time': array([0.1411283 , 0.14513254, 0.14203811, 0.14163327, 0.14203835]), 'test_accuracy': array([0.71120284, 0.70759745, 0.71118775, 0.70998589, 0.71160579]), 'train_accuracy': array([0.72262211, 0.7242028 , 0.72519922, 0.72305683, 0.72314827]), 'test_f1_macro': array([0.64601324, 0.66192797, 0.65922547, 0.66383058, 0.6660106 ]), 'train_f1_macro': array([0.72160171, 0.71705946, 0.72559226, 0.72397636, 0.72279175])},
- 'model_rf': {'fit_time': array([17.61120749, 17.63870502, 17.9050355 , 17.63789248, 18.32859588]), 'score_time': array([0.42438555, 0.43223834, 0.43039751, 0.44506025, 0.43639684]), 'test_accuracy': array([0.78153412, 0.78284042, 0.77723781, 0.77676752, 0.7803731 ]), 'train_accuracy': array([1., 1., 1., 1., 1.]), 'test_f1_macro': array([0.6583028 , 0.66863208, 0.66285725, 0.64895749, 0.65537035]), 'train_f1_macro': array([1., 1., 1., 1., 1.])},
- {'model_lor': {'fit_time': array([1.38506365, 1.33310032, 1.28650403, 1.3001821 , 1.28016329]), 'score_time': array([0.00900769, 0.00800776, 0.00900793, 0.00900817, 0.00800753]), 'test_accuracy': array([0.54080886, 0.53709897, 0.53796311, 0.53738831, 0.54041908]), 'train_accuracy': array([0.53557852, 0.53681955, 0.54283475, 0.5407838 , 0.53712606]), 'test_f1_macro': array([0.34887846, 0.33862998, 0.34855732, 0.34131534, 0.34558651]), 'train_f1_macro': array([0.34313179, 0.34238225, 0.3483373 , 0.34593599, 0.34224702])},
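The result dictionaries above have the shape that sklearn's cross_validate produces when given multiple scoring metrics and return_train_score=True; a minimal sketch on iris:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# return_train_score=True yields the train_accuracy / train_f1_macro
# keys seen in the dumps above, alongside fit_time and score_time.
cv = cross_validate(
    DecisionTreeClassifier(random_state=42), X, y,
    cv=5,
    scoring=["accuracy", "f1_macro"],
    return_train_score=True,
)
print(sorted(cv.keys()))
```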
- be_cols = ['대출목적','주택소유상태'] oe_cols = ['대출기간', '근로기간']
- target_dict = {'A': 0, 'B': 1, 'C': 2, 'D': 3, 'E': 4, 'F': 5, 'G': 6}
- Just removed outliers with z-score; otherwise left things mostly untouched
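A sketch of the z-score outlier removal idea; the threshold, toy values, and helper name are illustrative (the column name just mirrors the ones used above):

```python
import pandas as pd

def drop_zscore_outliers(df: pd.DataFrame, cols, threshold: float = 3.0) -> pd.DataFrame:
    """Keep rows whose |z| <= threshold in every given numeric column."""
    z = (df[cols] - df[cols].mean()) / df[cols].std(ddof=0)
    return df[(z.abs() <= threshold).all(axis=1)]

# Toy frame with one obvious outlier in the last row.
df = pd.DataFrame({"연간소득": [30, 35, 32, 31, 33, 29, 34, 30, 32, 1000]})
cleaned = drop_zscore_outliers(df, ["연간소득"], threshold=2.5)
print(len(df), "->", len(cleaned))
```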
- Round 2 → counted as round 1 in the results log
- 1) Model: model_rf Best Parameters: {'max_depth': 7, 'n_estimators': 50} Best F1-macro Score: 0.32920444519060793 Best Accuracy Score: 0.5361751755672899 Best Score: 0.32920444519060793
- Model: model_xgb Best Parameters: {'max_depth': 7, 'n_estimators': 200} Best F1-macro Score: 0.7938732862998641 Best Accuracy Score: 0.856448623065963 Best Score: 0.7938732862998641
- Model: model_et Best Parameters: {'max_depth': 7, 'n_estimators': 50} Best F1-macro Score: 0.17016087102660077 Best Accuracy Score: 0.38641615404789836 Best Score: 0.17016087102660077
- Model: model_dt Best Parameters: {'max_depth': 7} Best F1-macro Score: 0.5751310415573679 Best Accuracy Score: 0.6367217779727371 Best Score: 0.5751310415573679
- 2)
- {'model_dt': {'best_params': {'max_depth': 7}, 'best_f1_score': 0.5755512911533535, 'best_accuracy': 0.6367322289316126, 'best_scores': 0.5755512911533535}, 'model_rf': {'best_params': {'max_depth': 7, 'n_estimators': 100}, 'best_f1_score': 0.32874450898130936, 'best_accuracy': 0.5378054945711506, 'best_scores': 0.32874450898130936}, 'model_et': {'best_params': {'max_depth': 7, 'n_estimators': 50}, 'best_f1_score': 0.17182739237333444, 'best_accuracy': 0.3866355968800748, 'best_scores': 0.17182739237333444}, 'model_xgb': {'best_params': {'max_depth': 7, 'n_estimators': 200}, 'best_f1_score': 0.7938732862998641, 'best_accuracy': 0.856448623065963, 'best_scores': 0.7938732862998641}}
- Grid: max_depth : 3, 5, 7 / n_estimators : 50, 100, 200
- Round 3
- Model: model_dt Optimal number of features for model_dt : 3 Features sorted by their rank: [(14, '주택소유상태_0'), (13, '연체계좌수'), (12, '총연체금액'), (11, '대출목적_3'), (10, '대출목적_0'), (9, '대출목적_1'), (8, '주택소유상태_1'), (7, '대출목적_2'), (6, '주택소유상태_2'), (5, '최근_2년간_연체_횟수'), (4, '총계좌수'), (3, '연간소득'), (2, '부채_대비_소득_비율'), (1, '총상환이자'), (1, '총상환원금'), (1, '대출금액')]
- Model: model_rf Optimal number of features for model_rf : 3 Features sorted by their rank: [(14, '주택소유상태_0'), (13, '총연체금액'), (12, '연체계좌수'), (11, '대출목적_0'), (10, '대출목적_3'), (9, '대출목적_1'), (8, '대출목적_2'), (7, '주택소유상태_1'), (6, '주택소유상태_2'), (5, '최근_2년간_연체_횟수'), (4, '총계좌수'), (3, '연간소득'), (2, '부채_대비_소득_비율'), (1, '총상환이자'), (1, '총상환원금'), (1, '대출금액')]
- Model: model_et
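The "Features sorted by their rank" output above is characteristic of RFE's ranking_ attribute (selected features get rank 1, the rest are numbered in elimination order); a small sketch on synthetic data with illustrative feature names:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=8, n_informative=3, random_state=42)
names = [f"f{i}" for i in range(8)]

# Keep the 3 best features; eliminated ones get ranks 2, 3, ...
rfe = RFE(DecisionTreeClassifier(random_state=42), n_features_to_select=3).fit(X, y)

# Same presentation as the log: features sorted by their rank.
print(sorted(zip(rfe.ranking_, names)))
```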
- Round 4
- The simplified-again version 15 232137
- Data
- {'model_dt': {'fit_time': array([0.70492125, 0.68269444, 0.68830848, 0.69556308, 0.70111775]), 'score_time': array([0.01301217, 0.01201129, 0.01201129, 0.01301169, 0.01301122]), 'test_accuracy': array([0.83603302, 0.83927265, 0.83968229, 0.83550191, 0.84307885]), 'train_accuracy': array([1., 1., 1., 1., 1.]), 'test_f1_macro': array([0.77276985, 0.77618444, 0.76610055, 0.7559306 , 0.77350753]), 'train_f1_macro': array([1., 1., 1., 1., 1.])}, 'model_rf': {'fit_time': array([18.67404246, 18.45873594, 18.94830513, 18.87328792, 18.79281425]), 'score_time': array([0.48928714, 0.50013638, 0.50470519, 0.47235608, 0.5300293 ]), 'test_accuracy': array([0.78153412, 0.78284042, 0.77723781, 0.77676752, 0.7803731 ]), 'train_accuracy': array([1., 1., 1., 1., 1.]), 'test_f1_macro': array([0.6583028 , 0.66863208, 0.66285725, 0.64895749, 0.65537035]), 'train_f1_macro': array([1., 1., 1., 1., 1.])}, 'model_xgb': {'fit_time': array([1.78613806, 1.3886559 , 1.43333077, 1.39674878, 1.34752297]), 'score_time': array([0.05706191, 0.05404925, 0.04704309, 0.05455208, 0.04804325]), 'test_accuracy': array([0.83378618, 0.83133034, 0.83184407, 0.83618122, 0.83821916]), 'train_accuracy': array([0.88922128, 0.88961319, 0.88757675, 0.8901241 , 0.88974526]), 'test_f1_macro': array([0.77543692, 0.77986058, 0.77048389, 0.76905818, 0.78055823]), 'train_f1_macro': array([0.90812768, 0.90799494, 0.90740538, 0.90669811, 0.90768593])}, 'model_et': {'fit_time': array([10.67007947, 10.74230504, 10.38943934, 10.36898971, 10.39428568]), 'score_time': array([0.88480425, 0.6525929 , 0.65832424, 0.64709187, 0.69310451]), 'test_accuracy': array([0.68758491, 0.69014526, 0.68328369, 0.68265663, 0.6882479 ]), 'train_accuracy': array([1., 1., 1., 1., 1.]), 'test_f1_macro': array([0.5710648 , 0.58411625, 0.56415031, 0.55996858, 0.55780156]), 'train_f1_macro': array([1., 1., 1., 1., 1.])}}
- Submitted only dt and xgb
- Round 5
- Ran only the grid search again
- Data
- Model: model_rf Best Parameters: {'max_depth': 62, 'n_estimators': 300} Best F1-macro Score: 0.6665468163184007 Best Accuracy Score: 0.7860942079402135 Best Score: 0.6665468163184007
- Model: model_xgb Best Parameters: {'max_depth': 64, 'n_estimators': 50} Best F1-macro Score: 0.8096816271500844 Best Accuracy Score: 0.8631893746786329 Best Score: 0.8096816271500844
- Model: model_et Best Parameters: {'max_depth': 65, 'n_estimators': 930} Best F1-macro Score: 0.5802458041513825 Best Accuracy Score: 0.6981930435178354 Best Score: 0.5802458041513825
- Model: model_dt Best Parameters: {'max_depth': 25} Best F1-macro Score: 0.7711546435003871 Best Accuracy Score: 0.8403022520189045 Best Score: 0.7711546435003871
- Full console output:
- Model: model_rf Best Parameters: {'max_depth': 62, 'n_estimators': 300} Best F1-macro Score: 0.6665468163184007 Best Accuracy Score: 0.7860942079402135 Best Score: 0.6665468163184007
- Model: model_xgb Best Parameters: {'max_depth': 64, 'n_estimators': 50} Best F1-macro Score: 0.8096816271500844 Best Accuracy Score: 0.8631893746786329 Best Score: 0.8096816271500844
- Model: model_et Best Parameters: {'max_depth': 65, 'n_estimators': 930} Best F1-macro Score: 0.5802458041513825 Best Accuracy Score: 0.6981930435178354 Best Score: 0.5802458041513825
- running model_dt: 2024-02-15 23:32:35 Score: 0.7734968967818564 2024-02-15 23:32:35 Score: 0.7768765067610435 2024-02-15 23:32:36 Score: 0.7643739447026154 2024-02-15 23:32:37 Score: 0.7623637109588748 2024-02-15 23:32:38 Score: 0.7786621582975456 2024-02-15 23:32:38 Score: 0.7723861550433266 2024-02-15 23:32:39 Score: 0.7745798475864126 2024-02-15 23:32:40 Score: 0.7650308516625927 2024-02-15 23:32:40 Score: 0.7613123275677983 2024-02-15 23:32:41 Score: 0.7782796992960147
- running model_rf: 2024-02-15 23:32:52 Score: 0.6467345734811936 2024-02-15 23:33:01 Score: 0.6599997129665837 2024-02-15 23:33:11 Score: 0.6507932368458079 2024-02-15 23:33:20 Score: 0.6496164535310948 2024-02-15 23:33:30 Score: 0.6645225072697792 2024-02-15 23:34:28 Score: 0.661798558372096 2024-02-15 23:35:26 Score: 0.6712752923764389 2024-02-15 23:36:27 Score: 0.6699383851808802 2024-02-15 23:37:25 Score: 0.6551229211458003 2024-02-15 23:38:24 Score: 0.6722412825973408 2024-02-15 23:38:33 Score: 0.6484702651116608 2024-02-15 23:38:43 Score: 0.6636754451083045 2024-02-15 23:38:53 Score: 0.6503778119933712 2024-02-15 23:39:03 Score: 0.6478283586240391 2024-02-15 23:39:13 Score: 0.6675192166500906 2024-02-15 23:40:11 Score: 0.6639971611364844 2024-02-15 23:41:13 Score: 0.6720950887499757 2024-02-15 23:42:12 Score: 0.6697667631251256 2024-02-15 23:43:09 Score: 0.6534160629299078 2024-02-15 23:44:09 Score: 0.6734590056505099
- running model_et: 2024-02-15 23:45:31 Score: 0.5591487896477373 2024-02-15 23:45:37 Score: 0.5789781954176758 2024-02-15 23:45:43 Score: 0.5669878869071248 2024-02-15 23:45:49 Score: 0.5594979153060298 2024-02-15 23:45:55 Score: 0.5590177882858921 2024-02-15 23:47:44 Score: 0.5796289916518254 2024-02-15 23:49:32 Score: 0.583740127389552 2024-02-15 23:51:18 Score: 0.585692593408991 2024-02-15 23:53:03 Score: 0.5788789888329663 2024-02-15 23:54:47 Score: 0.5732883194735778
- running model_xgb: 2024-02-15 23:56:55 Score: 0.7974785779644502 2024-02-15 23:57:04 Score: 0.8154100137082023 2024-02-15 23:57:12 Score: 0.807892538981471 2024-02-15 23:57:20 Score: 0.8162352979054269 2024-02-15 23:57:28 Score: 0.811391707190872 2024-02-15 23:57:53 Score: 0.7932996077873283 2024-02-15 23:58:19 Score: 0.8084430584565435 2024-02-15 23:58:44 Score: 0.8046897289356689 2024-02-15 23:59:10 Score: 0.8010032771667751 2024-02-15 23:59:35 Score: 0.8090104107481139
- Model: model_dt Best Parameters: {'max_depth': 25} Best F1-macro Score: 0.7711546435003871 Best Accuracy Score: 0.8403022520189045 Best Score: 0.7711546435003871
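The timestamped "Score:" lines match the custom-scoring idea noted earlier; a sketch that logs each CV fold's f1_macro while GridSearchCV runs (the exact logging format and helper name here are assumptions, not the project's actual code):

```python
from datetime import datetime

from sklearn.datasets import load_iris
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

def logged_f1_macro(y_true, y_pred):
    score = f1_score(y_true, y_pred, average="macro")
    # One timestamped line per CV evaluation, like the log above.
    print(datetime.now().strftime("%Y-%m-%d %H:%M:%S"), "Score:", score)
    return score

X, y = load_iris(return_X_y=True)
gs = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    {"max_depth": [3, 25]},
    scoring=make_scorer(logged_f1_macro),
    cv=5,
).fit(X, y)
print("Best Parameters:", gs.best_params_)
```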
- Only xgb, compressed at level 2
- Only xgb's score seemed to rise meaningfully, so I submitted just that one
- submission_2024-02-16_002953