Stat 546 Homework 3

Instruction
Question 1: [10 pts] Easiest Question of this Homework
Question 2: [25 pts] Generating Mondrian Trees
Question 3: [15 pts] Validate Your Mondrian Tree
Question 4: [15 pts] Mondrian Tree Fitting and Prediction
Question 5: [15 pts] Mondrian Forest
Question 6: [50 pts] Adaboost Classification with Stump Model

Instruction

Students are encouraged to work together on homework. However, sharing, copying, or providing any part of a homework solution or code is an infraction of the University’s rules on Academic Integrity. Any violation will be punished as severely as possible. Final submissions must be uploaded to Gradescope. No email or hard copy will be accepted. For late submission policy and grading rubrics, please refer to the course website.

You are required to submit the rendered file HWx_yourNetID.pdf. For example, HW01_rqzhu.pdf. Please note that this must be a .pdf file. .html format cannot be accepted since they may not be readable in Gradescope. All proofs must be typed in LaTeX format. Make all of your R code chunks visible for grading.
Include your Name and NetID in the report.
If you use this file or the example homework .Rmd file as a template, be sure to remove this instruction section.
On random seed and reproducibility: You should use R version \(\geq 4.0.0\). This will ensure your random seed generation is the same as everyone else. Please note that updating the R version may require you to reinstall all of your packages. In the markdown file, set seed properly so that the results can be reproduced.
For some questions, there will be restrictions on what packages/functions you can use. Please read the requirements carefully. As long as the question does not specify such restrictions, you can use anything.
Using AI tools: you should provide a brief statement on the involvement of AI tools on your homework.

Question 1: [10 pts] Easiest Question of this Homework

Given a fitted random forest, with each tree as a partition on the space of \(\cal X\), show that the kernel function \(K(x, z)\) induced from the forest is positive definite.

Question 2: [25 pts] Generating Mondrian Trees

For this question, our goal is to write a function that creates a kernel based on the idea of Mondrian tree. This question consists of several parts:

Write a fitting function that essentially creates the Mondrian kernel definition. It does not need to be data adaptive, so we will just use \([0, 1]^p\) as the domain at the root node. The function should have the following arguments:
- \(\lambda\), the input budget parameter that controls the complexity of the partition.
- \(p\), the dimension of the input space.
The function should return the fitted tree structure. It is preferable that you save your tree structure in a matrix organized in the following way:
- The first column is the type of the node, 1 for internal node, -1 for leaf node.
- The second column is the index of the dimension that the node splits on. For leaf nodes, it can be NA.
- The third column is the cut location. For leaf nodes, it can be NA.
- The fourth and the fifth columns are the row numbers of the left and right child nodes.
- The sixth column should be left empty for now, we will use it to store the prediction value for each terminal node.

Here is an example of the tree structure for a tree with 3 internal nodes and 4 leaf nodes. The first row is the root node that splits into node 2 and 3, and each is further split into two leaf nodes.

While fitting this Mondrian tree, you also need to keep track of the bounding box of each node. You can save them into two matrices, one for the lower bound and one for the upper bound. Each matrix has \(m\) rows and \(p\) columns, where \(m\) is the number of nodes in the tree, so that each row corresponds to the same row in the tree structure matrix. The \(j\)-th column of the \(i\)-th row is the lower/upper bound of the \(j\)-th dimension of the bounding box of node \(i\). For example, if the bounding box of node 2 is \([0, 0.3] \times [0, 1]\), then the second row of the lower bound matrix is \((0, 0)\) and the first row of the upper bound matrix is \((0.3, 1)\).
Your code should contain a (while or for) loop that iteratively find and split current nodes and add child nodes to the tree until the budget \(\lambda\) is exhausted. You can use the rexp function to generate exponential random variables for each of the current nodes, and find the one that should be split. After the splitting, update the the budget, the tree structure and the bounding boxes accordingly.

Call the function with \(p = 2\) and \(\lambda = 2\). And print the tree structure (restrict that up to the 5th row) and the bounding boxes.

Here is my output:

The tree structure (first 5 rows):

     node_type split_dim   cut_loc left_child right_child pred_value
[1,]         1         2 0.1303658          2           3         NA
[2,]         1         1 0.2542666          4           5         NA
[3,]         1         1 0.1783499          6           7         NA
[4,]        -1        NA        NA         NA          NA         NA
[5,]        -1        NA        NA         NA          NA         NA

The lower bound matrix (first 5 rows):

##           [,1]      [,2]
## [1,] 0.0000000 0.0000000
## [2,] 0.0000000 0.0000000
## [3,] 0.0000000 0.1303658
## [4,] 0.0000000 0.0000000
## [5,] 0.2542666 0.0000000

The upper bound matrix (first 5 rows):

##           [,1]      [,2]
## [1,] 1.0000000 1.0000000
## [2,] 1.0000000 0.1303658
## [3,] 1.0000000 1.0000000
## [4,] 0.2542666 0.1303658
## [5,] 1.0000000 0.1303658

Answer:

  set.seed(546)
  
  myMondrianTree <- function(lambda, p) {
  
    # Initialize the tree structure matrix
    tree <- matrix(NA, nrow = 1, ncol = 6)
    colnames(tree) <- c("node_type", "split_dim", "cut_loc", "left_child", "right_child", "pred_value")
    tree[1, ] <- c(-1, NA, NA, NA, NA, NA)  # Root node, use -1 for current nodes that being consider for splitting
    
    # Initialize bounding box matrices
    lower_bound <- matrix(0, nrow = 1, ncol = p)
    upper_bound <- matrix(1, nrow = 1, ncol = p)
    
    # Initialize variables
    total_time <- 0
    
    while (total_time < lambda)
    {
      # find all current nodes
      all_nodes = which(tree[, 1] == -1)  # Current nodes that can be split
      
      # calculate the rate for each current node
      rates = sapply(all_nodes, function(node) sum(upper_bound[node, ] - lower_bound[node, ]))
      
      # generate exponential random variables for each current node
      exp_rvs = rexp(length(all_nodes), rate = rates)
      
      # find the node with the smallest exponential random variable
      split_node = all_nodes[which.min(exp_rvs)]
      
      # update the total time
      total_time = total_time + min(exp_rvs)
      
      if (total_time >= lambda) {
        break
      }
      
      # randomly select a dimension to split
      split_dim = sample(1:p, 1, prob = upper_bound[split_node, ] - lower_bound[split_node, ])
      
      # randomly select a cut location within the bounding box of the selected dimension
      cut_loc = runif(1, lower_bound[split_node, split_dim], upper_bound[split_node, split_dim])
      
      # update the tree structure and node types
      tree[split_node, ] <- c(1, split_dim, cut_loc, nrow(tree) + 1, nrow(tree) + 2, NA)  # Update the split node
      tree <- rbind(tree, c(-1, NA, NA, NA, NA, NA), c(-1, NA, NA, NA, NA, NA))  # Add two new current nodes
      
      # update the bounding boxes
      left_upper_bound <- upper_bound[split_node, ]
      left_upper_bound[split_dim] <- cut_loc
      left_lower_bound <- lower_bound[split_node, ]
      
      right_lower_bound <- lower_bound[split_node, ]
      right_lower_bound[split_dim] <- cut_loc
      right_upper_bound <- upper_bound[split_node, ]
      
      lower_bound <- rbind(lower_bound, left_lower_bound, right_lower_bound)
      upper_bound <- rbind(upper_bound, left_upper_bound, right_upper_bound)
      
      # print the current tree structure for debugging
      # cat("\n\n --------- Split ----")
      # print(tree)
      # print(lower_bound)
      # print(upper_bound)

    }
    rownames(lower_bound) <- NULL
    rownames(upper_bound) <- NULL
    
    
    return(list("tree" = tree, 
                "lower_bound" = lower_bound, 
                "upper_bound" = upper_bound))
  }
  
  newfit = myMondrianTree(lambda = 2, p = 2)
  printrows = min(5, nrow(newfit$tree))
  newfit$tree[1:printrows, ]

##      node_type split_dim   cut_loc left_child right_child pred_value
## [1,]         1         2 0.1303658          2           3         NA
## [2,]         1         1 0.2542666          4           5         NA
## [3,]         1         1 0.1783499          6           7         NA
## [4,]        -1        NA        NA         NA          NA         NA
## [5,]        -1        NA        NA         NA          NA         NA

  newfit$lower_bound[1:printrows, ]

##           [,1]      [,2]
## [1,] 0.0000000 0.0000000
## [2,] 0.0000000 0.0000000
## [3,] 0.0000000 0.1303658
## [4,] 0.0000000 0.0000000
## [5,] 0.2542666 0.0000000

  newfit$upper_bound[1:printrows, ]

##           [,1]      [,2]
## [1,] 1.0000000 1.0000000
## [2,] 1.0000000 0.1303658
## [3,] 1.0000000 1.0000000
## [4,] 0.2542666 0.1303658
## [5,] 1.0000000 0.1303658

Question 3: [15 pts] Validate Your Mondrian Tree

After constructing the Mondrian tree, you should be able to visualize the partition on \([0, 1]^2\) for \(p = 2\). To do this, use the rect() function to plot the terminal node bounding boxes of all leaf nodes. You can consider assigning a color to each leaf node. Here is what I have.

In addition, generate 300 observations uniformly on \([0, 1]^2\). And write a prediction function, that finds the terminal node for each observation. You can generate 300 observations uniformly on \([0, 1]^2\). Then, for each observation, find the leaf node it belongs to. Assign a color to each leaf node, and plot the observations with the color of the leaf node they belong to. Here is what I have. After this question, you can discard the bounding box matrices, since they are not needed anymore.

Answer:

  set.seed(546)
  newfit = myMondrianTree(lambda = 4, p = 2)
  tree = newfit$tree
  
  # plot all terminal nodes
  # initiate empty plot
  plot(NULL, xlim = c(0, 1), ylim = c(0, 1), 
       xlab = "X1", ylab = "X2", 
       main = "Mondrian Tree Partition")
  
  nodecolor = matrix(NA, nrow(tree))
  
  for (i in 1:nrow(tree))
  {
    if (tree[i, 1] == 1) next  # Skip non-leaf nodes
    
    # Get the bounding box for the leaf node
    lower <- newfit$lower_bound[i, ]
    upper <- newfit$upper_bound[i, ]
    
    # Draw the rectangle for the bounding box
    usecolor = rgb(runif(1), runif(1), runif(1))
    nodecolor[i] = usecolor
    
    rect(lower[1], lower[2], upper[1], upper[2], 
         col = usecolor, border = "black")
  }

  # Generate 300 random data in [0, 1]^2
  x <- matrix(runif(300 * 2), ncol = 2)
  colnames(x) <- c("X1", "X2")
  
  # Function to find the leaf node for a given x
  findLeafNode <- function(x, tree) {
    
    node <- 1  # Start from the root node
    
    while (tree[node, 1] != -1) {  # While not a leaf node
      split_dim <- tree[node, 2]
      cut_loc <- tree[node, 3]
      if (x[split_dim] <= cut_loc) {
        node <- tree[node, 4]  # Go to left child
      } else {
        node <- tree[node, 5]  # Go to right child
      }
    }
    
    return(node)
  }
  
  # Find the leaf node for each point
  leaf_nodes <- sapply(1:nrow(x), function(i) findLeafNode(x[i, ], tree))
  
  # initiate empty plot
  plot(NULL, xlim = c(0, 1), ylim = c(0, 1), 
       xlab = "X1", ylab = "X2", 
       main = "Mondrian Tree Partition")
  
  # plot frame
  for (i in 1:nrow(tree))
  {
    if (tree[i, 1] == 1) next  # Skip non-leaf nodes
    
    # Get the bounding box for the leaf node
    lower <- newfit$lower_bound[i, ]
    upper <- newfit$upper_bound[i, ]
    
    # Draw the rectangle for the bounding box
    rect(lower[1], lower[2], upper[1], upper[2], 
         border = "grey")
  }
  
  # Plot the points
  for (i in 1:nrow(x))
  {
    usecol = nodecolor[leaf_nodes[i]]
    points(x[i, 1], x[i, 2], col = usecol, pch = 19)
  }

Question 4: [15 pts] Mondrian Tree Fitting and Prediction

Now you have your tree model, lets use it for a regression problem.

Write a training function that takes your constructed Mondrian tree, and set of training data. Find which terminal node the training data belongs to, and calculate the average of the training responses in each leaf node. Store the average response in the sixth column of the tree structure matrix.
Write a prediction function that takes a set of testing data and the fitted tree. Output the predictions for the testing data.

Use the following data generating process to generate your training and testing data.

\[ Y = \sin(2 \pi X_1) + 2 x_2^2 + \epsilon, \quad \epsilon \sim N(0, 1) \]

Where \(X\) is uniform on \([0, 1]^p\). The functional surface look like the following:

  set.seed(546)
  
  # Generate a grid of points for plotting the true function
  grid_size <- 50
  x1_seq <- seq(0, 1, length.out = grid_size)
  x2_seq <- seq(0, 1, length.out = grid_size)
  grid_points <- expand.grid(X1 = x1_seq, X2 = x2_seq)
  
  # True function
  true_function <- function(x) {
    sin(2 * pi * x[1]) + 2 * x[2]^2
  }
  
  # Compute the true function values on the grid
  true_values <- apply(grid_points, 1, true_function)
  
  # Reshape for contour plot
  true_matrix <- matrix(true_values, nrow = grid_size, ncol = grid_size)
  
  # Plot the true function surface
  contour(x1_seq, x2_seq, true_matrix, 
          main = "True Function Surface", 
          xlab = "X1", ylab = "X2", col = terrain.colors(10))

Generate 400 training data and 1000 testing data. Fit a Mondrian tree with \(\lambda = 4\) on the training data, and make predictions on the testing data. You may try to experiment with different values of \(\lambda\). Report the Mean Squared Error (MSE) on the testing data.

This is my fitted surface by the Mondrian tree with \(\lambda = 10\):

Answer:

  set.seed(546)
  
  # Generate training data
  ntrain <- 400
  ntest <- 1000
  X <- matrix(runif((ntrain + ntest) * 2), ncol = 2)
  colnames(X) <- c("X1", "X2")
  Y = sin(2 * pi * X[, 1]) + 2 * X[, 2]^2 + rnorm(ntrain + ntest)
  train_X <- X[1:ntrain, ]
  train_Y <- Y[1:ntrain]
  
  test_X <- X[(ntrain + 1):(ntrain + ntest), ]
  test_Y <- Y[(ntrain + 1):(ntrain + ntest)]
  
  # Fit Mondrian tree
  lambda <- 10
  mondrian_construct <- myMondrianTree(lambda, p = 2)
  tree <- mondrian_construct$tree
  
  # a function to fit the tree with training data
  mondrian_fit <- function(tree, train_X, train_Y) {
    
    # Find leaf nodes for training data
    leaf_nodes <- sapply(1:nrow(train_X), function(i) findLeafNode(train_X[i, ], tree))
    
    # Calculate mean response for each leaf node
    leaf_means <- tapply(train_Y, leaf_nodes, mean)
    
    # Update the tree structure with mean responses
    for (node in names(leaf_means)) {
      tree[as.numeric(node), 6] <- leaf_means[node]
    }
    
    return(tree)
  }
  
  # Function to make predictions
  mondrian_pred <- function(tree, test_X) {
    
    predictions <- sapply(1:nrow(test_X), function(i) {
      leaf_node <- findLeafNode(test_X[i, ], tree)
      return(tree[leaf_node, 6])
    })
    
    return(predictions)
  }
  
  # train the model 
  tree_fit <- mondrian_fit(tree, train_X, train_Y)
  test_pred <- mondrian_pred(tree_fit, test_X)
  
  # Calculate Mean Squared Error
  mse <- mean((test_pred - test_Y)^2, na.rm = TRUE)
  
  print(paste("Mean Squared Error on Test Data:", round(mse, 4)))

## [1] "Mean Squared Error on Test Data: 1.2669"

    # plot the fitted surface of the testing data
  grid_size = 30
  x1_seq <- seq(0, 1, length.out = grid_size)
  x2_seq <- seq(0, 1, length.out = grid_size)
  grid_points <- expand.grid(X1 = x1_seq, X2 = x2_seq)
  forest_surface <- mondrian_pred(tree_fit,as.matrix(grid_points))
  forest_matrix <- matrix(forest_surface, nrow = grid_size, ncol = grid_size)
  contour(x1_seq, x2_seq, forest_matrix, 
          main = "Fitted Function Surface by Mondrian Forest", 
          xlab = "X1", ylab = "X2", col = terrain.colors(10))

Question 5: [15 pts] Mondrian Forest

Now you can utilize the previous code to fit a forest of Mondrian trees. Write a function that fits \(B = 100\) Mondrian trees, and make predictions by averaging the predictions from all trees. Note that each tree should be trained on a bootstrap sample of the training data. You may consider saving your fitted Mondrian trees in a list(), and call elements as forest[[b]]. Use the same training and testing data as in Question 4, and report the MSE on the testing data. Experiment with different values of \(\lambda\) and report your findings (are they better than a single tree model?).

This is my fitted surface by the Mondrian forest with \(\lambda = 10\):

Answer:

  set.seed(546)
  
  # Function to fit a Mondrian forest
  MondrianForestFit <- function(B, lambda, train_X, train_Y) {
    
    forest <- vector("list", B)
    
    for (b in 1:B) {
      onetree <- myMondrianTree(lambda = lambda, p = ncol(train_X))
      
      # Bootstrap sample
      use_indices <- sample(1:nrow(train_X), nrow(train_X), replace = TRUE)
      forest[[b]] <- mondrian_fit(onetree$tree, train_X[use_indices, ], train_Y[use_indices])
    }

    return(forest)
  }
  
  # Function to predict using Mondrian forest
  MondrianForestPred <- function(forest, test_X) {
    
    B <- length(forest)
    
    # go over all trees and get predictions
    all_predictions <- sapply(1:B, function(b) mondrian_pred(forest[[b]], test_X))
    
    # average predictions
    forest_predictions <- rowMeans(all_predictions, na.rm = TRUE)
    
    return(forest_predictions)
  }
  
  # Fit Mondrian forest
  B <- 100
  mondrian_forest <- MondrianForestFit(B, lambda, train_X, train_Y)
  
  # Make predictions on testing data
  forest_predictions <- MondrianForestPred(mondrian_forest, test_X)
  
  # Calculate Mean Squared Error
  mse_forest <- mean((forest_predictions - test_Y)^2, na.rm = TRUE)
  
  print(paste("Mean Squared Error of Mondrian Forest on Test Data:", round(mse_forest, 4)))

## [1] "Mean Squared Error of Mondrian Forest on Test Data: 1.0695"

  # plot the fitted surface of the testing data
  grid_size = 30
  x1_seq <- seq(0, 1, length.out = grid_size)
  x2_seq <- seq(0, 1, length.out = grid_size)
  grid_points <- expand.grid(X1 = x1_seq, X2 = x2_seq)
  forest_surface <- MondrianForestPred(mondrian_forest, as.matrix(grid_points))
  forest_matrix <- matrix(forest_surface, nrow = grid_size, ncol = grid_size)
  contour(x1_seq, x2_seq, forest_matrix, 
          main = "Fitted Function Surface by Mondrian Forest", 
          xlab = "X1", ylab = "X2", col = terrain.colors(10))

The prediction error is 1.0695, which almost the same as the variance of the random noise. It is significantly better than a single tree model.

Question 6: [50 pts] Adaboost Classification with Stump Model

In our previous question, the Mondrian tree is a non-adaptive model, since the partition does not depend on the data. In this question, we will implement the Adaboost algorithm with decision stumps as the base learner, and it is actually data adaptive. A stump model partitions the input space into two regions by a cutoff point \(c\), and assigns a prediction \(\{-1, +1\}\) to each side. We work with simulated one-dimensional data:

  set.seed(5)
  n <- 200
  x <- runif(n)
  py <- function(x) sin(4 * pi * x) / 3 + 0.5
  y <- (rbinom(n, 1, py(x)) - 0.5) * 2
  
  plot(x, y + 0.1 * runif(n, -1, 1), ylim = c(-1.1, 1.1), pch = 19,
       col = ifelse(y == 1, "darkorange", "deepskyblue"), ylab = "y")
  lines(sort(x), py(x)[order(x)] - 0.5)

  testx <- seq(0, 1, length.out = 1000)
  testy <- (rbinom(1000, 1, py(testx)) - 0.5) * 2

Recall that a stump model works this way:

Input: A set of data \({\cal D}_n = \{x_i, y_i, w_i\}_{i=1}^n\)
Output: The cutting point \(c\), and node predictions \(f_L, f_R \in \{-1, 1\}\)
Step 1: Search for a splitting rule \(\mathbf{1}(x \leq c)\) that maximizes the weighted reduction of Gini impurity. \[ \texttt{score} = - \, \frac{\sum_{ {\cal T}_L} w_i}{\sum w_i} \text{Gini}({\cal T}_L) - \frac{\sum_{ {\cal T}_R} w_i}{\sum w_i} \text{Gini}( {\cal T}_R ),\] where, for given data in a potential node \({\cal T}\), the weighted version of Gini impurity is \[ \text{Gini}({\cal T}) = \widehat p (1- \widehat p), \qquad \widehat p = (\textstyle \sum w_i)^{-1} \textstyle\sum w_i I(y_i = 1).\]
Step 2: Calculate and record the left and the right node prediction values \(f_L, f_R \in \{-1, 1\}\) respectively. Note you also need to incorporate the weights in these calculations.

You should write a function called myStump(x, y, w) that outputs the cutoff point, and the left and right predictions. Note that you need to exhaust all the cutoff point, however, given a set of observed values, only these observed unique \(x\) values matter. Once your finish the stump model algorithm, test your code in the following two scenarios:

All training data has equal weights.
Observations with \(x \geq 0.5\) has weights 2, while observations with \(x < 0.5\) has weights 0.1.

Answer:

  # Weighted Gini impurity
  gini <- function(y, w) {
    p <- weighted.mean(y == 1, w)
    p * (1 - p)
  }
  
  # Fit a weighted stump
  myStump <- function(x, y, w) {
    n <- length(x)
    scores <- numeric(n - 1)
    
    for (i in 1:(n - 1)) {
      idx <- (x <= x[i])
      scores[i] <- -(
        sum(w[idx])   * gini(y[idx], w[idx]) +
        sum(w[!idx])  * gini(y[!idx], w[!idx])
      ) / sum(w)
    }
    
    cutoff <- x[which.max(scores)]
    left_p  <- weighted.mean(y[x <= cutoff] == 1, w[x <= cutoff])
    right_p <- weighted.mean(y[x >  cutoff] == 1, w[x >  cutoff])
    
    list(
      cutoff = cutoff,
      left   = ifelse(left_p  > 0.5, 1, -1),
      right  = ifelse(right_p > 0.5, 1, -1)
    )
  }
  
  # Test stump under two weighting schemes
  w_equal <- rep(1 / n, n)
  myStump(x, y, w_equal)

## $cutoff
## [1] 0.2486734
## 
## $left
## [1] 1
## 
## $right
## [1] -1

  w_biased <- ifelse(x >= 0.5, 2, 0.1)
  myStump(x, y, w_biased)

## $cutoff
## [1] 0.8036775
## 
## $left
## [1] 1
## 
## $right
## [1] -1

Let’s then write our own code for AdaBoost which is an iterative method that calls the stump model and update the weights. Implement the formula we introduced in the lecture, and perform the following

You are required to implement a shrinkage factor \(\delta\), which is commonly used in boosting algorithms. Note that the shrinkage factor essentially works on the \(\alpha\) to achieve a smaller step size.
You are not required to do bootstrapping for each tree (you still can if you want).
Calculate the classification error and the exponential loss bound \(n^{-1} \sum_{i=1}\exp\{- y_i \delta \sum_k \alpha_k f_k(x_i)\}\) over the number of tree iterations for your training data. You can either record this in your fitting function or do this using a prediction function (the next step).
Write a prediction function for your testing data that outputs the predicted value of all testing samples over all tree iterations.
Plot the three quantities above over the tree iterations: training error, training exponential loss bound, and the testing error.
Try two different shrinkage factors of this plot, and comment on your findings. Please note that you may need to adjust the number of trees so that your algorithm works well.
For each of the shrinkage factor you used, plot the final model (functional value of \(F\), and also the sign) with the observed data.

Answer:

  # fit Adaboost
  my_adaboost <- function(x, y, ntrees, epsilon = 0.5){
    modellist = list()
    w = rep(1/length(x), length(x))
    for (k in 1:ntrees){
        fk = myStump(x, y, w)
        
        fit = ifelse(x <= fk$cutoff, fk$left, fk$right)
        e = sum(w*(fit != y))
        
        # cat(paste(" e is ", e, "\n") )
        a = 0.5 * log((1-e)/e)
        w = w * exp(-epsilon*a*y*fit)
        w = w / sum(w)
        
        # apply shrinkage parameter to alpha
        fk['alpha'] = a * epsilon
        
        modellist[[k]] = fk
    }
    return(modellist)
  }
  
  
  # predict by AdaBoost
  my_adaboost_pred <- function(fit, K, testx, testy){
    pred = rep(0, length(testx))
    exp_loss = err = rep(NA, K)
    
    for (k in 1: min(K, length(fit))){
      
        pred_tree = ifelse(testx <= fit[[k]]$cutoff, fit[[k]]$left, fit[[k]]$right)
        pred = pred + fit[[k]]$alpha * pred_tree
        
        exp_loss[k] = mean( exp(- testy * pred) )
        err[k] = mean(testy != sign(pred))
    }
    
    return(list("pred" = pred, "exp_loss" = exp_loss, "err" = err))
  }

To fit the adaboost:

  # fit model and prediction
  K = c(50, 100, 200)
  delta = c(1, 0.5, 0.1)
  pred_test <- list()
  pred_train <- list()
  
  for(i in 1:3){
    myfit = my_adaboost(x, y, K[i], delta[i])
  
    pred_test[[i]] = my_adaboost_pred(myfit, K[i], testx, testy)
    pred_train[[i]] = my_adaboost_pred(myfit, K[i], x, y)
  }
  
  par(mfrow = c(2, 3))
  # plot the exponential loss and error
  for(i in 1:3){
    main_text = paste0("Exponential loss and errors, delta = ", delta[i])
    plot(pred_train[[i]]$exp_loss, type = "l", ylim = c(0, 1),
         main = main_text, ylab = "exponetial loss", xlab = "ntrees")
    lines(pred_train[[i]]$err, col = 'green')
    lines(pred_test[[i]]$err, col = 'red')
    legend("right", lty = 1, legend = c("exp loss", "train error", "test error"), col = c("black", "green", "red"))
  }
  
  # plot fitted model (last iteration)
  for(i in 1:3){
    main_text = paste0("Final model, delta = ", delta[i])
    plot(x, y + 0.1*runif(n, -1, 1), ylim = c(-1.1, 1.1), 
         col = ifelse(y == 1, "darkorange", "deepskyblue"), ylab = "y",
         main = main_text)
    lines(sort(x), py(x)[order(x)] - 0.5)
    lines(testx, pred_test[[i]]$pred, col = "red")
    lines(testx, ifelse(pred_test[[i]]$pred > 0, 1, -1), col = "blue", lty = 2)
  }

Comments on Number of Trees (not required)

On the training data, the exponential loss consistently decreases as the number of trees increases.
On the testing data, the error curve may eventually take on a U-shape when \(\delta = 1\): performance improves initially, but can worsen again as more stumps are added. This indicates overfitting. The fitted function may also start showing irregular shapes in later iterations.
To mitigate overfitting, one can
- tune the optimal number of trees \(K\) by monitoring test error, or
- apply an early stopping rule, halting boosting once the test error stabilizes.

Comments on Shrinkage Parameter

A smaller \(\delta\) slows convergence but typically yields a more stable model:
- the exponential loss curve is smoother, with fewer oscillations,
- the fitted function is less erratic.
A larger \(\delta\) speeds up learning but, with many trees, can amplify overfitting.
In practice, \(\delta\) is tuned as part of a bias–variance trade-off: smaller values improve stability and generalization, while larger values shorten training time but may sacrifice robustness.

Stat 546 Homework 3

Assigned: Sep 26, 2025; Due: 11:59 PM CT, Oct 15, 2025

Instruction

Question 1: [10 pts] Easiest Question of this Homework

Question 2: [25 pts] Generating Mondrian Trees

Question 3: [15 pts] Validate Your Mondrian Tree

Question 4: [15 pts] Mondrian Tree Fitting and Prediction

Question 5: [15 pts] Mondrian Forest

Question 6: [50 pts] Adaboost Classification with Stump Model