Posts in the 'data' category

List: data (4)

빈틈

Machine Learning

1. Random forest (classification)
from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier(random_state=23)
rf.fit(X_train, y_train)
2.

data 2023. 5. 17. 14:25
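The random forest snippet above assumes that X_train and y_train already exist. A minimal end-to-end sketch, using a synthetic dataset from make_classification and a train/test split (both are assumptions for illustration and are not part of the original post), might look like this:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data: an assumption, since the post does not show its dataset.
X, y = make_classification(n_samples=200, n_features=8, random_state=23)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=23
)

# Same estimator and seed as in the post.
rf = RandomForestClassifier(random_state=23)
rf.fit(X_train, y_train)

# Predict on held-out rows and report mean accuracy.
y_pred = rf.predict(X_test)
accuracy = rf.score(X_test, y_test)
print(f"test accuracy: {accuracy:.3f}")
```

Fixing random_state in both the split and the model keeps the run reproducible, which is presumably why the post sets it.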

Statistics Libraries

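1. One-sample t-test
from scipy.stats import ttest_1samp
s, p = ttest_1samp(data, population_mean)
- s : test statistic
- p : p-value
2. Binomial distribution
from scipy.stats import binom
* Probability of a single value (probability mass function): binom.pmf(k, n, p)
* Cumulative probability (cumulative distribution function): binom.cdf(k-1, n, p), which gives P(X <= k-1), i.e. fewer than k successes
- k : number of successes among all trials (the value whose probability is sought)
- n : total number of trials
- p : probability of success in each independent trial
3. Chi-square test
+ Chi-square test of independence
s = test statistic
p = p-value
from scipy.stats import chi2_contingency
crosstab = pd.crosstab(df['A'], df[..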

data 2023. 5. 17. 14:11
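The three SciPy calls listed above can be tried on small made-up inputs. The sketch below is only illustrative: the sample values, the 2x2 categories, and every variable name are assumptions, not data from the original post.

```python
import numpy as np
import pandas as pd
from scipy.stats import binom, chi2_contingency, ttest_1samp

rng = np.random.default_rng(0)

# 1. One-sample t-test: is the sample mean consistent with a population mean of 5.0?
sample = rng.normal(loc=5.0, scale=1.0, size=30)  # made-up sample
t_stat, t_p = ttest_1samp(sample, popmean=5.0)

# 2. Binomial distribution with n = 10 trials, success probability p = 0.5, k = 3.
n, p, k = 10, 0.5, 3
prob_exactly_k = binom.pmf(k, n, p)     # P(X = 3)  = 120/1024
prob_below_k = binom.cdf(k - 1, n, p)   # P(X <= 2) = 56/1024
prob_at_least_k = 1 - prob_below_k      # P(X >= 3)

# 3. Chi-square test of independence on a cross-tabulation of two categorical columns.
df = pd.DataFrame({
    "A": ["x", "x", "x", "x", "y", "y", "y", "y"],
    "B": ["u", "u", "v", "v", "u", "v", "v", "v"],
})
crosstab = pd.crosstab(df["A"], df["B"])
chi2_stat, chi2_p, dof, expected = chi2_contingency(crosstab)

print(t_stat, t_p)
print(prob_exactly_k, prob_below_k, prob_at_least_k)
print(chi2_stat, chi2_p, dof)
```

Passing k - 1 to cdf is the usual way to get P(X >= k) as 1 - cdf(k - 1, n, p), since cdf itself is inclusive of its first argument.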

EDA

1. Computing quartiles and detecting outliers
# Compute quartiles and detect outliers with the 1.5 * IQR rule
Q1 = df["column_name"].quantile(.25)
Q2 = df["column_name"].quantile(.5)
Q3 = df["column_name"].quantile(.75)
IQR = Q3 - Q1
condition1 = df["column_name"] > (Q3 + IQR * 1.5)
upperOuter = df[condition1]
condition2 = df["column_name"] < (Q1 - IQR * 1.5)
lowerOuter = df[condition2]
print(lowerOuter)
print(upperOuter)
2. Checking / removing duplicates
Check for duplicates : df.duplicated()
Remove duplicates : df.drop_duplicates('column_name')
- Option : kee..

data 2023. 5. 14. 16:34
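To see the IQR rule and the duplicate checks from the EDA notes in action, here is a small sketch on a toy column. The DataFrame, the column name "value", and the variable names are assumptions for illustration only.

```python
import pandas as pd

# Toy data with one very large value, one very small value, and a repeated value.
df = pd.DataFrame({"value": [10, 12, 11, 13, 12, 100, -50, 12]})

# Quartiles and interquartile range.
q1 = df["value"].quantile(0.25)
q3 = df["value"].quantile(0.75)
iqr = q3 - q1

# Rows beyond the 1.5 * IQR fences are treated as outliers.
upper_outliers = df[df["value"] > q3 + 1.5 * iqr]
lower_outliers = df[df["value"] < q1 - 1.5 * iqr]

# duplicated() flags repeats of an earlier row; drop_duplicates removes them.
duplicate_count = int(df.duplicated().sum())
deduplicated = df.drop_duplicates("value")

print(upper_outliers)
print(lower_outliers)
print(duplicate_count, len(deduplicated))
```

Each outlier mask is applied to its own condition, so the upper and lower results stay separate and can be inspected or dropped independently.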

Numpy & Pandas

1. Setting the encoding when loading data that contains Korean text
path = "https://raw.githubusercontent.com/pandas/main/sample.csv"
df = pd.read_csv(path, encoding='euc-kr')
df.head()
2. Listing columns by data type : select_dtypes
include option : number selects numeric variables (float, int, etc.), object selects categorical variables
exclude option : setting (exclude='object') returns every column whose type is not object
# List numeric columns
df2.select_dtypes(include='number').columns
# List categorical columns
df2.select_dty..

data 2023. 5. 14. 16:23
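The encoding and select_dtypes notes above can be checked without a network connection. This sketch writes a small EUC-KR file to a temporary directory and reads it back; the file contents, column names, and temporary path are assumptions made for the example, and the object-dtype result depends on a pandas version where text columns load as object.

```python
import os
import tempfile

import pandas as pd

# A small table with Korean text (made-up contents for illustration).
original = pd.DataFrame({
    "이름": ["가", "나", "다"],
    "age": [31, 25, 42],
    "score": [88.5, 92.0, 79.5],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.csv")
    # Save as EUC-KR, then load with the matching encoding.
    original.to_csv(path, index=False, encoding="euc-kr")
    df = pd.read_csv(path, encoding="euc-kr")

# Numeric columns (int, float, ...).
numeric_columns = df.select_dtypes(include="number").columns.tolist()
# Everything that is not numeric, which here is the text column.
non_numeric_columns = df.select_dtypes(exclude="number").columns.tolist()

print(df.head())
print(numeric_columns)
print(non_numeric_columns)
```

If the encoding passed to read_csv does not match the file, pandas typically raises a UnicodeDecodeError or returns garbled text, so trying 'euc-kr' or 'cp949' is a common first step for Korean CSV files.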
