#4 Decision Tree Model

Decision tree models tend to predict less accurately than other methods, but they have the advantage of being easy to interpret and transparent.

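A decision tree that has grown too far can pick up overly specific quirks (noise) that appear only in the training dataset,
and these may not show up in the general case.
Even if the tree predicts very well on the training data, there is no guarantee that it will work well on real-world problems.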

# In other words, the risk of overfitting is high.
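
The gap between training and test accuracy can be reproduced on simulated data. What follows is a minimal sketch, not the post's own experiment: the data frame `sim`, the 20% label noise, and the helper `accuracy()` are all assumptions made for illustration.

```r
library(rpart)

set.seed(42)
n <- 1000

# Simulated data (assumption): the true signal depends only on x1,
# and 20% of the labels are flipped to act as noise
x1 <- runif(n)
x2 <- runif(n)
y <- ifelse(x1 > 0.5, "yes", "no")
flip <- runif(n) < 0.2
y[flip] <- ifelse(y[flip] == "yes", "no", "yes")
sim <- data.frame(x1 = x1, x2 = x2, y = factor(y))

# 70/30 train/test split
train_idx <- sample(n, 700)
train <- sim[train_idx, ]
test <- sim[-train_idx, ]

# Let the tree grow as far as rpart allows
deep_tree <- rpart(y ~ ., data = train, method = "class",
                   control = rpart.control(cp = 0, minsplit = 2, minbucket = 1))

# Hypothetical helper: share of correct class predictions
accuracy <- function(model, data) {
  mean(predict(model, data, type = "class") == data$y)
}

accuracy(deep_tree, train)
accuracy(deep_tree, test)
```

The fully grown tree should score close to perfectly on `train` while doing noticeably worse on `test`, which is the overfitting pattern described above.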

To avoid overfitting, the complexity of the tree has to be limited in some way.

Two approaches

Pre-pruning
Limit complexity before the model is grown
Post-pruning
Remove branches that are too complex after the decision tree model has been built

Pre-pruning
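
As a rough sketch of pre-pruning, growth can be capped through `rpart.control()` before the tree is fitted. The example below uses the `kyphosis` data bundled with `rpart` as a stand-in for the loans data, and `count_leaves()` is a hypothetical helper introduced only for this illustration.

```r
library(rpart)

# kyphosis ships with rpart; it stands in for the loans data here (assumption)
data(kyphosis)

# Baseline: no complexity penalty, default depth limit
full_tree <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                   method = "class",
                   control = rpart.control(cp = 0))

# Pre-pruned: same settings, but the tree may be at most 2 levels deep
# (the root counts as depth 0, so at most 4 leaves are possible)
shallow_tree <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                      method = "class",
                      control = rpart.control(cp = 0, maxdepth = 2))

# Hypothetical helper: number of terminal nodes in a fitted rpart tree
count_leaves <- function(tree) sum(tree$frame$var == "<leaf>")

count_leaves(full_tree)
count_leaves(shallow_tree)
```

`minsplit` works in a similar spirit: it sets the minimum number of observations a node must contain before a split is attempted.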

Post-pruning
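
For post-pruning, one common heuristic is to grow the tree fully and then cut it back at the `cp` value with the lowest cross-validated error in the fitted model's `cptable`. This is a sketch under that assumption, again using the bundled `kyphosis` data as a stand-in; reading a threshold off `plotcp()` by eye is an equally valid choice.

```r
library(rpart)

data(kyphosis)
set.seed(1)  # the cross-validation folds used inside rpart are random

# Grow without a complexity penalty
full_tree <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                   method = "class",
                   control = rpart.control(cp = 0))

# cptable columns: CP, nsplit, rel error, xerror, xstd
cp_table <- full_tree$cptable

# Pick the row with the smallest cross-validated error
best_row <- which.min(cp_table[, "xerror"])
best_cp <- cp_table[best_row, "CP"]

# Cut back every split that does not meet the chosen cp
pruned_tree <- prune(full_tree, cp = best_cp)

nrow(full_tree$frame)    # nodes before pruning
nrow(pruned_tree$frame)  # nodes after pruning
```

The pruned tree can never have more nodes than the full one; how much it shrinks depends on the data and on the random cross-validation folds.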

Code example (R)

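# Load the sample data from GitHub (provides the loans data frame)
load(url('https://github.com/hbchoi/SampleData/blob/master/dtree_data.RData?raw=true'))
# Inspect the structure of the data
str(loans)
# Fit a classification tree on two predictors; cp = 0 turns off the complexity penalty
library(rpart)
loan_model <- rpart(outcome ~ loan_amount + credit_score, data = loans, method = "class",
                    control = rpart.control(cp = 0))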

# Load the rpart.plot package
library(rpart.plot)
# Plot the loan_model with default settings
rpart.plot(loan_model)
# Plot the loan_model with customized settings
rpart.plot(loan_model, type = 3, box.palette = c("red", "green"), fallen.leaves = TRUE)


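# Refit on the training set with all predictors
# (loans_train and loans_test are assumed to already exist in the workspace)
loan_model <- rpart(outcome ~ ., data = loans_train, method = "class",
                    control = rpart.control(cp = 0))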
# Make predictions on the training dataset
loans_train$pred <- predict(loan_model, loans_train, type = 'class')
# Examine the confusion matrix
table(loans_train$outcome, loans_train$pred)

# Compute the accuracy on the training dataset
mean(loans_train$outcome == loans_train$pred)

# Make predictions on the test dataset
loans_test$pred <- predict(loan_model, loans_test, type = 'class')

# Compute the accuracy on the test dataset
mean(loans_test$outcome == loans_test$pred)

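# Drop the pred column added above so that `outcome ~ .` does not use it as a predictor
# (removing it by name is less fragile than removing column 15 by position)
loans_train$pred <- NULL
loans_test$pred <- NULL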

## Pre-pruning ##

# Grow a tree with maxdepth of 6
loan_model <- rpart(outcome ~ ., data = loans_train, method = "class", 
                    control = rpart.control(cp = 0, maxdepth = 6))

# Compute the accuracy of the simpler tree
loans_test$pred <- predict(loan_model, loans_test, type = 'class')
mean(loans_test$outcome == loans_test$pred)

# Grow a tree with minsplit of 500
loan_model2 <- rpart(outcome ~ ., data = loans_train, method = "class",
                     control = rpart.control(cp = 0, minsplit = 500))

# Compute the accuracy of the simpler tree
loans_test$pred2 <- predict(loan_model2, loans_test, type = 'class')
mean(loans_test$outcome == loans_test$pred2)

## Post-pruning ##

# Grow an overly complex tree
loan_model <- rpart(outcome ~ ., data = loans_train, method = "class",
                    control = rpart.control(cp = 0))

# Examine the complexity plot
plotcp(loan_model)

# Prune the tree at a cp threshold chosen after inspecting the complexity plot
loan_model_pruned <- prune(loan_model, cp = 0.0014)

# Compute the accuracy of the pruned tree
loans_test$pred <- predict(loan_model_pruned, loans_test, type = 'class')
mean(loans_test$outcome == loans_test$pred)

예준이의 하루공부
