Always start with EDA
Linear Regression CODE
0. Data Description
Always check this first.
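As a rough illustration of what "checking the data description first" can look like in pandas, here is a minimal sketch. The toy DataFrame and its column names (`rooms`, `city`, `target`) are made up for the example and stand in for the real dataset.

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataset
df = pd.DataFrame({
    "rooms": [3, 2, 4, None, 3],
    "city": ["Seoul", "Busan", "Seoul", "Incheon", "Busan"],
    "target": [500, 320, 610, 280, 330],
})

print(df.shape)                      # (rows, columns)
df.info()                            # dtypes and non-null counts per column
missing = df.isnull().sum()          # number of missing values per column
print(missing)
duplicates = df.duplicated().sum()   # number of fully duplicated rows
print(duplicates)
print(df.describe())                 # summary statistics for numeric columns
```

Looking at shape, dtypes, missing values and duplicates before plotting anything usually makes the later EDA steps easier to interpret.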
1. Profiling
pip install -U pandas-profiling
from pandas_profiling import ProfileReport
df.profile_report()
2. EDA
## Correlation coefficients
df_cor = df.corr().copy()
print(df_cor.sort_values('target',ascending=False).target.head(5))
## Only Heatmap
import seaborn as sns
import matplotlib.pyplot as plt
df_cor = df.corr().copy()
fig, ax = plt.subplots(figsize=(16, 12))
plt.title('Pearson Correlation of features')
sns.heatmap(df_cor,cmap='gist_earth',linewidths=0.25, linecolor='k', annot=True)
plt.show()
## Only Bar (Seaborn)
plt.figure(figsize=(8,4))
cor_sorted = df_cor.sort_values('target',ascending=False).target
sns.barplot(x=cor_sorted.values, y=cor_sorted.index, orient='h') # pass x and y as keywords; recent seaborn versions reject positional x/y
plt.title('Pearson Correlation(barh)')
plt.show()
## Heatmap & Bar (Matplotlib)
import seaborn as sns
import matplotlib.pyplot as plt
fig, axes = plt.subplots(1,2,figsize=(10,5))
plt.subplots_adjust(wspace=0.5)
sns.heatmap(df_cor,ax=axes[0], cmap='Blues')
axes[0].set_title('Pearson Correlation(Heatmap)')
axes[1].barh(df_cor.sort_values('target',ascending=True).target.index,df_cor.sort_values('target',ascending=True).target)
axes[1].set_title('Pearson Correlation(barh)')
plt.show()
## Check for outliers with a scatter plot
plt.figure(figsize=(5,5))
sns.scatterplot(x=df.variable, y=df.target, color='red', alpha=0.5)
plt.grid()
plt.show()
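The scatter plot above is a visual check. If a numeric cross-check is useful, one common option (not part of the original post) is the 1.5 × IQR rule. The sketch below uses made-up values in a hypothetical Series named `variable`, with one deliberately extreme point.

```python
import pandas as pd

# Hypothetical values; 95 is deliberately extreme
s = pd.Series([10, 12, 11, 13, 12, 11, 95], name="variable")

q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1                                   # interquartile range
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr   # 1.5 * IQR fences
outliers = s[(s < lower) | (s > upper)]         # values outside the fences
print(outliers)
```

The 1.5 multiplier is only a convention; whether a flagged point should actually be dropped still depends on the data.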
## Pairplot
# pairplot creates its own figure, so set the size with height= instead of plt.figure
sns.pairplot(df, height=2.5)
plt.show()
## Countplot
sns.countplot(x='age_5', hue='cardio', data = df_1, palette="Set2")
3. OneHotEncoding
! pip install category_encoders
from category_encoders import OneHotEncoder
encoder = OneHotEncoder(use_cat_names = True) # use_cat_names: whether to use the category values in the new column names
df_OneHot = encoder.fit_transform(df) # fit & transform
print(df.shape)
print(df_OneHot.shape)
4. train_test_split
from sklearn.model_selection import train_test_split
X = df_OneHot.drop(columns='Price').copy()
y = df_OneHot.Price
X_train,X_test,y_train,y_test = train_test_split(X,y, test_size = 0.2 , random_state=1)
print(df_OneHot.shape, X_train.shape, X_test.shape)
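To see what the split above produces, here is a small self-contained sketch on a made-up 10-row frame (`df_toy` and its columns `x1`, `x2` are hypothetical; only `Price` mirrors the name used above). With `test_size=0.2`, 2 rows should land in the test set and 8 in the training set, and `X` and `y` should keep matching indices.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy frame with 10 rows
df_toy = pd.DataFrame({
    "x1": np.arange(10),
    "x2": np.arange(10) * 2,
    "Price": np.arange(10) * 100,
})

X = df_toy.drop(columns="Price")
y = df_toy["Price"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)

print(X_train.shape, X_test.shape)            # feature shapes after the split
print(X_train.index.equals(y_train.index))    # features and target stay aligned
```

Fixing `random_state` makes the split reproducible, which helps when comparing models later.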