[ASHRAE - Great Energy Predictor III] Finding the Best Weights

This post covers how I chose the weights for merging the predictions of the cross-validation fold models.

[Cross Validation, finding the optimal weights: 1.1 -> 1.09]

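- Computing meter_reading per building_id in the given data revealed buildings whose trend had changed in the most recent readings. I assume this happened because the building's configuration or its surroundings changed. Expecting better performance if the model reflected this, I combined the CV results with a weighted average instead of a simple average.

I split the data into 3 folds, trained on them, and at the inference stage averaged all fold predictions with equal weights; that submission scored LB 1.1. To find the best ratio I ran the test below, found that 0.7:0.2:0.1 performed best, and that submission scored LB 1.09.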
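As a minimal sketch of the difference between the two merging schemes, the snippet below blends three fold-model prediction arrays with equal weights and with 0.7:0.2:0.1. The helper name `blend_folds` and the toy numbers are made up for illustration; they are not from the original notebook.

```python
import numpy as np

def blend_folds(fold_preds, weights):
    """Weighted average of per-fold predictions; weights are normalized to sum to 1."""
    preds = np.asarray(fold_preds, dtype=float)   # shape: (n_folds, n_rows)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    blended = w @ preds                           # shape: (n_rows,)
    return np.clip(blended, 0, None)              # meter readings cannot be negative

# Toy predictions from three fold models for four rows (illustrative values only).
fold_preds = [
    [10.0, 20.0, 30.0, 40.0],
    [12.0, 18.0, 33.0, 38.0],
    [8.0, 25.0, 27.0, 45.0],
]

equal = blend_folds(fold_preds, [1, 1, 1])            # simple average
weighted = blend_folds(fold_preds, [0.7, 0.2, 0.1])   # ratio reported in this post
print(equal)
print(weighted)
```

With weights that favor one fold, the blend stays close to that fold's predictions while the other two act as a small correction.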

- While writing this post, it occurred to me that instead of the current approach of splitting the folds by building id, splitting the folds by time within each building id might have trained better.
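One way such a time-based split could look is sketched below: each building's rows are ordered chronologically and cut into equal consecutive blocks, one per fold. This is only an illustration of the idea and was not tried in the competition; the function name `time_folds_per_building` is hypothetical, and the column names `building_id` and `timestamp` are assumed to match the training data.

```python
import pandas as pd

def time_folds_per_building(df, n_folds=3):
    """Assign a fold index to every row by chronological position within its building."""
    # 0-based chronological position of each row inside its own building
    order = (
        df.sort_values("timestamp")
          .groupby("building_id")
          .cumcount()
          .reindex(df.index)
    )
    # number of rows each building has
    size = df.groupby("building_id")["building_id"].transform("size")
    # cut positions 0..size-1 into n_folds consecutive, equally sized blocks
    return order * n_folds // size

# Toy frame: two buildings with six daily readings each (illustrative only).
df = pd.DataFrame({
    "building_id": [0] * 6 + [1] * 6,
    "timestamp": list(pd.date_range("2016-01-01", periods=6, freq="D")) * 2,
})
df["fold"] = time_folds_per_building(df, n_folds=3)
print(df)
```

Every building then contributes its earliest, middle, and latest period to a different fold, so a fold model's weight could be tied to how recent its data is.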

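import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_log_error

# res3 and leak_df come from the earlier steps of this series and are not defined here.
# res3[k][j] holds the predictions of fold model j for test chunk k;
# leak_df holds the leaked ground truth (row_id, leak_meter_reading).
N = 10
ps = []
scores = []
sample_submission = pd.read_csv('sample_submission.csv')  # read once, outside the loop

for i in range(1, N):
    p = 1. / i  # share of the weight moved away from fold 0
    print(p)
    # fold 0 gets (1 - p); the remaining p is split 2:1 between folds 1 and 2
    res4 = [(1 - p) * r[0] + 2 / 3 * p * r[1] + 1 / 3 * p * r[2] for r in res3]
    submission = sample_submission.copy()
    submission['meter_reading'] = np.clip(np.concatenate(res4), 0, None)  # no negative readings

    # score only the rows that have leaked ground truth, using RMSLE
    submission = pd.merge(submission, leak_df[['row_id', 'leak_meter_reading']], on='row_id', how='inner')
    score = np.sqrt(mean_squared_log_error(submission['leak_meter_reading'], submission['meter_reading']))
    ps.append(p)
    scores.append(score)

plt.plot(ps, scores)  # RMSLE on the leaked rows as a function of p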
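To make the weight grid in the loop above easier to read, the sketch below prints the three fold weights produced by each p = 1/i. The helper `fold_weights` is a hypothetical name introduced here; it only restates the expression used in the loop.

```python
def fold_weights(p):
    """Fold 0 gets (1 - p); the remaining p is split 2:1 between folds 1 and 2."""
    return (1 - p, 2 / 3 * p, 1 / 3 * p)

for i in range(1, 10):
    w = fold_weights(1 / i)
    print(i, [round(x, 3) for x in w])
```

Under this parametrization the ratio 0.7:0.2:0.1 corresponds to p = 0.3. The closest point on the 1/i grid is i = 3, which gives roughly 0.667:0.222:0.111.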
