Fundamentals (기본소양) 49

3. Applied Predictive Modeling [2] Importance
1. Feature Importance
    zipp = list(zip(X_train.columns, pipe.named_steps['decisiontreeregressor'].feature_importances_))
    zipp = pd.DataFrame(zipp, columns=['feature', 'importance']).sort_values('importance', ascending=False)
    plt.figure(figsize=(15, 15))
    sns.barplot(y=zipp.feature, x=zipp.importance, palette='Blues_r')
    plt.titl..
2021. 2. 18.

3. Applied Predictive Modeling [1] Modeling (Boost)
1. XGBoost
    from xgboost import XGBRegressor
    pipe = make_pipeline(
        OrdinalEncoder(),
        XGBRegressor()
    )
    pipe.fit(X_train, y_train)
    print('Train R^2: ', pipe.score(X_train, y_train))
    print('TEST R^2: ', pipe.score(X_test, y_test))
    print('\nTrain MAE: ', mean_absolute_error(y_train, pipe.predict(X_train)))
    print('TEST MAE: ', mean_ab..
2021. 2. 18.

3. Applied Predictive Modeling [0] Preparing
1. Package
    !pip install category_encoders
    !pip install PublicDataReader --upgrade
    !pip install finance-datareader
    # Nanum fonts so Korean labels render in matplotlib; refresh the font cache and clear matplotlib's cache
    !sudo apt-get install -y fonts-nanum
    !sudo fc-cache -fv
    !rm ~/.cache/matplotlib -rf
    import seaborn as sns
    import matplotlib.pyplot as plt
    import matplotlib as mpl
    import FinanceDataReader as fdr
    import PublicDataRead..
2021. 2. 18.

API
Kakao Developers: Build a variety of applications with the Kakao APIs. Provides Kakao Login, message sending, the Friends API, AI APIs, and more. developers.kakao.com
Ainize | Launchpad for open-source AI projects: Instantly run or deploy any open-source project for free. ainize.ai
Public AI Open API·DATA Service Portal (공공 인공지능 오픈 API·DATA 서비스 포털): Provides AI technologies and data developed through R&D projects of the Ministry of Science and ICT, available for anyone to use. aiopen.etri.re.kr
NAVER CLOUD PLATFORM API - API Reference. Overview: provided by NAVER Cloud Platform ..
2021. 2. 12.

2. Tree based model CODE [4] Hyperparameter Tuning / Threshold
1.
RandomizedSearchCV
    # RandomizedSearchCV
    from sklearn.model_selection import RandomizedSearchCV
    Model_xx_rcv = make_pipeline(
        SimpleImputer(),
        RandomForestClassifier(criterion='entropy', n_jobs=-1, random_state=1000, oob_score=True, class_weight="balanced")
    )
    dists = {
        "randomforestclassifier__min_samples_leaf": [1, 9, 10, 11],  # must be >= 1; None is not a valid value
        "randomf..
2021. 2. 9.

2. Tree based model CODE [3] Model Selection
1. LogisticRegressionCV
    from sklearn.linear_model import LogisticRegressionCV
    from sklearn.preprocessing import StandardScaler
    lr = LogisticRegressionCV()
    lr.fit(X_train_simp, y_train_simp)
    print('Train accuracy: ', lr.score(X_train_simp, y_train_simp))
    print('Validation accuracy: ', lr.score(X_val_simp, y_val_simp))
    print('Train f1 score: ', f1_score(y_train_simp, lr.predict(X_train_simp))))..
2021. 2. 9.

2. Tree based model CODE [2] Tree Model
0. Reference (Baseline)
    from sklearn.metrics import accuracy_score
    from sklearn.metrics import f1_score
    major = y_train.mode()[0]  # majority class of the training set
    y_train_pred = [major] * len(y_train)
    y_val_pred = [major] * len(y_val)  # reuse the training majority class; taking the mode of y_val would leak validation labels
    print("training accuracy: ", accuracy_score(y_train, y_train_pred))
    print("validation accuracy: ", accuracy_score(y_val, y_val_pred))
    print(..
2021. 2. 9.

2. Tree based model CODE [1] Encode, Impute
1. Hash Encoder
    from category_encoders import HashingEncoder
    enc_has = HashingEncoder(n_components=5)
    enc_has.fit_transform(train['state'])  # Think of it as dimensionality reduction: 51 categories >> 5 columns.
2. Count Encoder
    from category_encoders import CountEncoder
    encoder_count = CountEncoder()
    train_count = encoder_count.fit_transform(train_binary.dropna().astype(object))  # Encodes each category by its total count..
2021. 2. 9.

2. Tree based model CODE [0] It Always Starts with EDA
0. Data Description: always check this first.
1. Profiling
    pip install -U pandas-profiling
    from pandas_profiling import ProfileReport
    df.profile_report()
2. Duplicated
    train.T.duplicated().any()  # transposing makes this check for duplicate columns
3.
Missing Values (as a bar chart)
    import matplotlib.pyplot as plt
    import seaborn as sns
    missing = train.isnull().sum()
    missing = missing[missing > 0]
    miss = pd.DataFrame(missing, columns=['..
2021. 2. 9.

1. Linear Regression CODE [3] How to Select Variables
0. EDA
Always use EDA and domain knowledge to remove outliers and missing values, and use feature engineering to shape the features well. Nothing matters more for improving model performance.
1. KBest
    ## K Best
    from sklearn.feature_selection import SelectKBest, f_regression
    selector = SelectKBest(score_func=f_regression, k=20)  # create the instance
    X_train_K = selector.fit_transform(X_train, y_train)  # fit on the training set and transform it
    X_test_K = selector.tra..
2021. 2. 3.

1. Linear Regression CODE [2] Modeling
Modeling with Multivariate, Polynomial, Ridge, Lasso
0. Reference (Baseline)
    ## Simple Regression Reference Model (Mean or Median)
    # Visualization to find the baseline (red: mean, navy: median)
    plt.figure(figsize=(15, 5))
    plt.hist(df.target, bins=100, color='blue', alpha=0.5)
    plt.axvline(df.target.mean(), color='red')
    plt.axvline(df.target.median(), color='navy')
    plt.xlabel('target')
    plt.title('Histog..
2021. 2. 3.

1. Linear Regression CODE [1] Simple Regression
0. Reference (Baseline)
    ## Simple Regression Reference Model (Mean or Median)
    # Visualization to find the baseline (red: mean, navy: median)
    plt.figure(figsize=(15, 5))
    plt.hist(df.target, bins=100, color='blue', alpha=0.5)
    plt.axvline(df.target.mean(), color='red')
    plt.axvline(df.target.median(), color='navy')
    plt.xlabel('target')
    plt.title('Histogram of Price')
    plt.grid()
    plt.show()..
2021. 2. 2.
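The regression posts above start from a mean-or-median reference model before any real modeling. Below is a minimal sketch of scoring that baseline, assuming scikit-learn's `DummyRegressor` as a stand-in for the manual approach in the posts; the helper name `baseline_mae` and the toy target values are hypothetical, chosen only so the arithmetic is easy to check by hand.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import mean_absolute_error


def baseline_mae(y_train, y_test, strategy="mean"):
    """Hypothetical helper: fit a constant baseline on the training target.

    strategy is "mean" or "median". Returns (train MAE, test MAE) as floats.
    """
    y_train = np.asarray(y_train, dtype=float)
    y_test = np.asarray(y_test, dtype=float)
    # DummyRegressor ignores feature values, so one all-zero column is enough
    X_train = np.zeros((len(y_train), 1))
    X_test = np.zeros((len(y_test), 1))
    model = DummyRegressor(strategy=strategy).fit(X_train, y_train)
    # Every prediction is the training mean (or median); the test target never influences it
    train_mae = mean_absolute_error(y_train, model.predict(X_train))
    test_mae = mean_absolute_error(y_test, model.predict(X_test))
    return float(train_mae), float(test_mae)


# Toy target values (hypothetical): training mean = 4.0, training median = 2.5
y_train = [1.0, 2.0, 3.0, 10.0]
y_test = [2.0, 4.0]
print(baseline_mae(y_train, y_test, "mean"))    # (3.0, 1.0)
print(baseline_mae(y_train, y_test, "median"))  # (2.5, 1.0)
```

On these toy values the median baseline has the lower training MAE (2.5 vs 3.0). That is expected: the median minimizes mean absolute error while the mean minimizes squared error, which is one reason to draw both reference lines on the target histogram.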