

3. Applied Predictive Modeling [2] Importance Applied Predictive Modeling 1. Feature Importance zipp = [] for zipper in zip(X_train.columns, pipe.named_steps['decisiontreeregressor'].feature_importances_): zipp.append(zipper) zipp = pd.DataFrame(zipp, columns=['feature','importance']).sort_values('importance', ascending=False) plt.figure(figsize=(15, 15)) sns.barplot(y=zipp.feature, x=zipp.importance, palette='Blues_r') plt.titl.. 2021. 2. 18.
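The excerpt above cuts off mid-plot, but the importance-extraction pattern it shows can be sketched end to end. This is a minimal sketch on synthetic data (the columns 'a'/'b' and the toy target are assumptions, not from the post): pair each column with the fitted tree's `feature_importances_`, then sort descending before plotting.

```python
import pandas as pd
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeRegressor

# toy data: the target depends only on column 'a'
X_train = pd.DataFrame({'a': range(20), 'b': [i % 3 for i in range(20)]})
y_train = X_train['a'] * 2.0

pipe = make_pipeline(DecisionTreeRegressor(random_state=0))
pipe.fit(X_train, y_train)

# pair each column name with its importance, then sort strongest first
imp = pd.DataFrame(
    list(zip(X_train.columns,
             pipe.named_steps['decisiontreeregressor'].feature_importances_)),
    columns=['feature', 'importance'],
).sort_values('importance', ascending=False)
print(imp)
```

The resulting frame feeds straight into `sns.barplot(y=imp.feature, x=imp.importance)` as in the excerpt; tree importances always sum to 1 across features.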
3. Applied Predictive Modeling [1] Modeling (Boost) Applied Predictive Modeling 1. XGBoost from xgboost.sklearn import XGBModel from xgboost import XGBRegressor pipe = make_pipeline( OrdinalEncoder(), XGBRegressor() ) pipe.fit(X_train, y_train) print('Train R^2: ', pipe.score(X_train, y_train)) print('TEST R^2: ', pipe.score(X_test, y_test)) print('\nTrain MAE: ', mean_absolute_error(pipe.predict(X_train), y_train)) print('TEST MAE: ', mean_ab.. 2021. 2. 18.
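`XGBRegressor` follows the scikit-learn fit/score API, so the pipeline pattern in the excerpt can be sketched without xgboost installed. Here sklearn's `GradientBoostingRegressor` stands in for `XGBRegressor`, and the data is synthetic; both substitutions are assumptions for illustration only.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X[:, 0] * 3 + rng.normal(scale=0.1, size=200)  # signal on feature 0
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# same pattern as the post: build a pipeline, fit, then score train vs. test
pipe = make_pipeline(GradientBoostingRegressor(random_state=0))
pipe.fit(X_train, y_train)

r2_train = pipe.score(X_train, y_train)
r2_test = pipe.score(X_test, y_test)
mae_train = mean_absolute_error(y_train, pipe.predict(X_train))
print('Train R^2:', r2_train)
print('TEST R^2:', r2_test)
print('Train MAE:', mae_train)
```

Comparing train and test R^2/MAE side by side, as the post does, is a quick overfitting check: a large gap between the two signals variance.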
3. Applied Predictive Modeling [0] Preparing Applied Predictive Modeling 1. Package !pip install category_encoders !pip install PublicDataReader !pip install PublicDataReader --upgrade !pip install finance-datareader !sudo apt-get install -y fonts-nanum !sudo fc-cache -fv !rm ~/.cache/matplotlib -rf import seaborn as sns import matplotlib.pyplot as plt import matplotlib as mpl import FinanceDataReader as fdr import PublicDataRead.. 2021. 2. 18.
2. Tree based model CODE [4] Hyperparameter Tuning / Threshold Tree based model CODE 1. RandomizedSearchCV # RandomizedSearchCV from sklearn.model_selection import RandomizedSearchCV Model_xx_rcv = make_pipeline(SimpleImputer(), RandomForestClassifier(criterion='entropy', n_jobs=-1, random_state=1000, oob_score=True, class_weight="balanced") ) dists = { "randomforestclassifier__min_samples_leaf": [1, 9, 10, 11], "randomf.. 2021. 2. 9.
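The `dists` dict above is cut off, but its key detail is visible: when the estimator sits inside a pipeline, parameter names take the form `<stepname>__<param>`. A minimal runnable sketch of the same search, on synthetic classification data (the distributions and `n_iter` are assumptions, not the post's values):

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import make_pipeline

X, y = make_classification(n_samples=200, random_state=0)

pipe = make_pipeline(SimpleImputer(), RandomForestClassifier(random_state=0))
# pipeline parameters are addressed as '<stepname>__<param>'
dists = {
    'randomforestclassifier__min_samples_leaf': randint(1, 12),
    'randomforestclassifier__n_estimators': randint(20, 80),
}
search = RandomizedSearchCV(pipe, dists, n_iter=5, cv=3, random_state=0)
search.fit(X, y)
print(search.best_params_)
print(search.best_score_)
```

Unlike `GridSearchCV`, `RandomizedSearchCV` samples `n_iter` candidates from the distributions, so scipy distributions like `randint` can replace explicit lists.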
2. Tree based model CODE [3] Model Selection Tree based model CODE 1. LogisticRegressionCV from sklearn.linear_model import LogisticRegressionCV from sklearn.preprocessing import StandardScaler lr = LogisticRegressionCV() lr.fit(X_train_simp, y_train_simp) print('Train accuracy: ', lr.score(X_train_simp, y_train_simp)) print('Validation accuracy: ', lr.score(X_val_simp, y_val_simp)) print('Train f1 score: ', f1_score(y_train_simp, lr.predict(X_train_simp))).. 2021. 2. 9.
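The excerpt imports `StandardScaler` but is cut off before using it. A minimal sketch of the full pattern on synthetic data (the dataset and split are assumptions): scale first, because `LogisticRegressionCV` is regularized and regularization is scale-sensitive.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=300, random_state=1)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=1)

# scale before the regularized model; CV picks the regularization strength C
lr = make_pipeline(StandardScaler(), LogisticRegressionCV(cv=3))
lr.fit(X_tr, y_tr)

print('Train accuracy: ', lr.score(X_tr, y_tr))
print('Validation accuracy: ', lr.score(X_val, y_val))
print('Train f1 score: ', f1_score(y_tr, lr.predict(X_tr)))
```

`LogisticRegressionCV` cross-validates over a grid of `C` values internally, so no separate grid search is needed for the regularization strength.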
2. Tree based model CODE [2] Tree Model Tree based model CODE 0. Reference (Baseline) from sklearn.metrics import accuracy_score from sklearn.metrics import f1_score major = y_train.mode()[0] y_train_pred = [major] * len(y_train) major = y_val.mode()[0] y_val_pred = [major] * len(y_val) print("training accuracy: ", accuracy_score(y_train, y_train_pred)) print("validation accuracy: ", accuracy_score(y_val, y_val_pred)) print(.. 2021. 2. 9.
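The majority-class baseline above can be sketched compactly. Note one subtlety in the excerpt: taking `mode()` of `y_val` leaks validation labels into the baseline; the usual convention (used here, and the series' values are made up for illustration) is to take the mode from the training set and predict it everywhere.

```python
import pandas as pd
from sklearn.metrics import accuracy_score, f1_score

y_train = pd.Series([1, 1, 1, 0, 0])
y_val = pd.Series([1, 0, 1, 1])

major = y_train.mode()[0]            # most frequent class in train
y_val_pred = [major] * len(y_val)    # predict it for every validation row

acc = accuracy_score(y_val, y_val_pred)
f1 = f1_score(y_val, y_val_pred)
print('validation accuracy: ', acc)  # 3 of 4 correct -> 0.75
print('validation f1: ', f1)
```

Any real model must beat this baseline to justify its complexity; on imbalanced targets the baseline accuracy can be deceptively high, which is why the post also tracks f1.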
2. Tree based model CODE [1] Encode, Impute Tree based model CODE 1. Hash Encoder from category_encoders import HashingEncoder enc_has = HashingEncoder(n_components=5) enc_has.fit_transform(train['state']) # Think of this as dimensionality reduction: 51 categories >> 5 categories 2. Count Encoder from category_encoders import CountEncoder encoder_count = CountEncoder() train_count = encoder_count.fit_transform(train_binary.dropna().astype(object)) # Encodes each category by its total count.. 2021. 2. 9.
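What `CountEncoder` does can be reproduced with plain pandas, which makes the idea concrete without the `category_encoders` dependency. A minimal sketch (the toy `state` values are assumptions): each category is replaced by how often it occurs in the training data.

```python
import pandas as pd

train = pd.DataFrame({'state': ['CA', 'NY', 'CA', 'TX', 'CA', 'NY']})

# count encoding by hand: map each category to its frequency in train
counts = train['state'].value_counts()   # CA: 3, NY: 2, TX: 1
encoded = train['state'].map(counts)
print(encoded.tolist())                  # [3, 2, 3, 1, 3, 2]
```

For new data you would reuse the `counts` learned on train (that is what `CountEncoder.transform` does); hash encoding differs in that it maps categories through a hash function into a fixed number of columns, so unseen categories need no special handling.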
2. Tree based model CODE [0] It Always Starts with EDA Tree based model CODE 0. Data Description Always check this first 1. Profiling pip install -U pandas-profiling from pandas_profiling import ProfileReport df.profile_report() 2. Duplicated train.T.duplicated().any() 3. Missing Value (as a bar chart) import matplotlib.pyplot as plt import seaborn as sns missing = train.isnull().sum() missing = missing[missing>0] miss = pd.DataFrame(missing, columns=['.. 2021. 2. 9.
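The missing-value step above is cut off before the plot; the counting part can be sketched in full on a toy frame (the columns and NaN positions are assumptions): count nulls per column, keep only columns that have any, and sort so the worst offenders come first.

```python
import numpy as np
import pandas as pd

train = pd.DataFrame({
    'a': [1.0, np.nan, 3.0, np.nan],   # 2 missing
    'b': [1.0, 2.0, 3.0, 4.0],         # complete
    'c': [np.nan, 2.0, 3.0, 4.0],      # 1 missing
})

missing = train.isnull().sum()
missing = missing[missing > 0]                     # drop complete columns
miss = pd.DataFrame(missing, columns=['count']).sort_values('count', ascending=False)
print(miss)
```

The resulting frame is what the post then passes to a seaborn bar plot; `train.T.duplicated().any()` in the excerpt, note, checks for duplicated *columns* (rows of the transpose), not duplicated rows.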
1. Linear Regression CODE [3] How to Select Variables Linear Regression CODE 0. EDA Always remove outliers and missing values through EDA and domain knowledge, and tune features carefully through feature engineering. Nothing matters more for improving model performance. 1. KBest ## K Best from sklearn.feature_selection import SelectKBest, f_regression selector = SelectKBest(score_func=f_regression, k=20) # create the instance X_train_K = selector.fit_transform(X_train, y_train) # fit on train X_test_K = selector.tra.. 2021. 2. 3.
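The `SelectKBest` pattern above can be sketched on synthetic data (the data and `k=3` are assumptions; the post uses `k=20`): score every feature with `f_regression` against the target, keep the top k, and note that the selector is fit on train only and merely applied to test.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 10))
# only feature 0 carries signal; the other 9 are noise
y_train = X_train[:, 0] * 5 + rng.normal(scale=0.1, size=100)

selector = SelectKBest(score_func=f_regression, k=3)   # create the instance
X_train_K = selector.fit_transform(X_train, y_train)   # fit + reduce train
print(X_train_K.shape)                                 # (100, 3)
print(selector.get_support(indices=True))              # indices of kept features
```

The truncated line continues with `selector.transform(X_test)`: never refit the selector on test data, or the feature scores would leak test information.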
1. Linear Regression CODE [2] Modeling with Multivariate, Polynomial, Ridge, Lasso Linear Regression CODE 0. Reference (Baseline) ## Simple Regression Reference Model (Mean or Median) # Visualization to find the baseline plt.figure(figsize=(15,5)) plt.hist(df.target, bins=100, color='blue', alpha=0.5) plt.axvline(df.target.mean(), color='red') plt.axvline(df.target.median(), color='navy') plt.xlabel('target') plt.title('Histog.. 2021. 2. 3.
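The Ridge/Lasso part of this post is truncated, but the contrast between the two can be sketched on synthetic data (the data, alphas, and coefficients below are assumptions): both shrink coefficients, and Lasso additionally drives irrelevant ones exactly to zero.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(150, 5))
# only the first two features matter; the last three are noise
y = X @ np.array([3.0, -2.0, 0.0, 0.0, 0.0]) + rng.normal(scale=0.1, size=150)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

ridge = Ridge(alpha=1.0).fit(X_tr, y_tr)
lasso = Lasso(alpha=0.5).fit(X_tr, y_tr)

print('Ridge test R^2:', ridge.score(X_te, y_te))
print('Lasso test R^2:', lasso.score(X_te, y_te))
print('Lasso zeroed coefficients:', (lasso.coef_ == 0).sum())
```

This zeroing is why Lasso doubles as a feature selector, while Ridge keeps all features with small weights; both depend on feature scale, so in practice they belong behind a scaler in a pipeline.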
1. Linear Regression CODE [1] Simple Regression Linear Regression CODE 0. Reference (Baseline) ## Simple Regression Reference Model (Mean or Median) # Visualization to find the baseline plt.figure(figsize=(15,5)) plt.hist(df.target, bins=100, color='blue', alpha=0.5) plt.axvline(df.target.mean(), color='red') plt.axvline(df.target.median(), color='navy') plt.xlabel('target') plt.title('Histogram of Price') plt.grid() plt.show().. 2021. 2. 2.
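Beyond visualizing the mean and median, the baseline idea is to *score* them: predict one constant for every row and measure the error. A minimal sketch with made-up target values (an assumption for illustration), showing why the median is the natural baseline when the metric is MAE:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

y = np.array([1.0, 2.0, 2.0, 3.0, 10.0])   # skewed target with an outlier

# baseline: predict the same constant (mean or median) for every row
pred_mean = np.full_like(y, y.mean())
pred_median = np.full_like(y, np.median(y))

mae_mean = mean_absolute_error(y, pred_mean)
mae_median = mean_absolute_error(y, pred_median)
print('MAE vs mean:  ', mae_mean)     # 2.56
print('MAE vs median:', mae_median)   # 2.0
```

The median minimizes MAE (just as the mean minimizes squared error), so on skewed targets like prices the median baseline is the harder one for a regression model to beat.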
1. Linear Regression CODE [0] It Always Starts with EDA Linear Regression CODE 0. Data Description Always check this first 1. Profiling pip install -U pandas-profiling from pandas_profiling import ProfileReport df.profile_report() 2. EDA ## Correlation coefficients df_cor = df.corr().copy() print(df_cor.sort_values('target', ascending=False).price.head(5)) ## Only Heatmap import seaborn as sns import matplotlib.pyplot as plt df_cor = df.corr().copy() fig, ax = plt.subpl.. 2021. 2. 2.
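The correlation step above can be sketched on synthetic data (the columns 'x1'/'x2' and the toy target are assumptions): compute the full correlation matrix, then rank every column by its correlation with the target to shortlist candidate features.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 100
df = pd.DataFrame({'x1': rng.normal(size=n), 'x2': rng.normal(size=n)})
df['target'] = df['x1'] * 2 + rng.normal(scale=0.1, size=n)  # x1 drives target

# correlation of every column with the target, strongest first
df_cor = df.corr()
print(df_cor['target'].sort_values(ascending=False).head(5))
```

The target's correlation with itself (1.0) always tops the list; the rows below it give the feature ranking, and the same `df_cor` matrix feeds the seaborn heatmap in the excerpt.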