Instructions

Students are encouraged to work together on homework. However, sharing, copying, or providing any part of a homework solution or code is an infraction of the University’s rules on Academic Integrity. Any violation will be punished as severely as possible. Final submissions must be uploaded to Gradescope. No email or hard copy will be accepted. For late submission policy and grading rubrics, please refer to the course website.

Question 1: [10 pts] Easiest Question of this Homework

Given a fitted random forest, with each tree as a partition on the space of \(\cal X\), show that the kernel function \(K(x, z)\) induced from the forest is positive definite.
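As a notational starting point (this definition is an assumption about the intended kernel, not part of the problem statement), the kernel induced by a forest of \(B\) trees is commonly written as the proportion of trees in which the two points fall into the same leaf: \[ K(x, z) = \frac{1}{B} \sum_{b=1}^{B} \sum_{A \in \text{leaves}(b)} \mathbf{1}\{x \in A\} \, \mathbf{1}\{z \in A\}, \] where \(\text{leaves}(b)\) denotes the set of leaf regions of the \(b\)-th tree's partition of \(\cal X\).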

Question 2: [25 pts] Generating Mondrian Trees

For this question, our goal is to write a function that creates a kernel based on the idea of the Mondrian tree. This question consists of several parts.

Here is an example of the tree structure for a tree with 3 internal nodes and 4 leaf nodes. The first row is the root node, which splits into nodes 2 and 3; each of these is further split into two leaf nodes.

Call the function with \(p = 2\) and \(\lambda = 2\), and print the tree structure (restricted to the first 5 rows) and the bounding boxes.

Here is my output:

     node_type split_dim   cut_loc left_child right_child pred_value
[1,]         1         2 0.1303658          2           3         NA
[2,]         1         1 0.2542666          4           5         NA
[3,]         1         1 0.1783499          6           7         NA
[4,]        -1        NA        NA         NA          NA         NA
[5,]        -1        NA        NA         NA          NA         NA
##           [,1]      [,2]
## [1,] 0.0000000 0.0000000
## [2,] 0.0000000 0.0000000
## [3,] 0.0000000 0.1303658
## [4,] 0.0000000 0.0000000
## [5,] 0.2542666 0.0000000
##           [,1]      [,2]
## [1,] 1.0000000 1.0000000
## [2,] 1.0000000 0.1303658
## [3,] 1.0000000 1.0000000
## [4,] 0.2542666 0.1303658
## [5,] 1.0000000 0.1303658
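Below is a minimal sketch (not a required implementation) of one way such a generator could look, assuming the matrix representation shown above: one row per node with columns node_type (1 = internal, -1 = leaf), split_dim, cut_loc, left_child, right_child, and pred_value, plus two matrices holding the lower and upper corners of each node's bounding box. The function name myMondrian and all internal details are hypothetical.

  myMondrian <- function(p, lambda) {
    # one row per node; pred_value stays NA until the tree is fit to data
    tree <- matrix(NA_real_, nrow = 1, ncol = 6,
                   dimnames = list(NULL, c("node_type", "split_dim", "cut_loc",
                                           "left_child", "right_child", "pred_value")))
    lower <- matrix(0, nrow = 1, ncol = p)   # lower corners of each node's box
    upper <- matrix(1, nrow = 1, ncol = p)   # upper corners of each node's box
    birth <- 0                               # birth time of each node
    k <- 1
    while (k <= nrow(tree)) {
      len <- upper[k, ] - lower[k, ]
      E <- rexp(1, rate = sum(len))          # waiting time until the next cut
      if (birth[k] + E > lambda) {
        tree[k, "node_type"] <- -1           # lifetime exhausted: leaf node
      } else {
        d <- sample(p, 1, prob = len)        # split dim., prob. proportional to side length
        cutpt <- runif(1, lower[k, d], upper[k, d])  # cut uniformly on that side
        nL <- nrow(tree) + 1
        nR <- nrow(tree) + 2
        tree[k, ] <- c(1, d, cutpt, nL, nR, NA)
        tree  <- rbind(tree, matrix(NA_real_, 2, 6))
        # children inherit the parent box, with the d-th side cut at cutpt
        lower <- rbind(lower, lower[k, ], lower[k, ])
        upper <- rbind(upper, upper[k, ], upper[k, ])
        upper[nL, d] <- cutpt
        lower[nR, d] <- cutpt
        birth <- c(birth, birth[k] + E, birth[k] + E)
      }
      k <- k + 1
    }
    list(tree = tree, lower = lower, upper = upper)
  }

  set.seed(1)
  fit <- myMondrian(p = 2, lambda = 2)
  head(fit$tree, 5)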

Question 3: [15 pts] Validate Your Mondrian Tree

After constructing the Mondrian tree, you should be able to visualize the partition on \([0, 1]^2\) for \(p = 2\). To do this, use the rect() function to plot the bounding boxes of all leaf nodes. You can consider assigning a color to each leaf node. Here is what I have.

In addition, write a prediction function that finds the terminal node of each observation. Generate 300 observations uniformly on \([0, 1]^2\), find the leaf node each observation belongs to, assign a color to each leaf node, and plot the observations using the color of the leaf node they belong to. Here is what I have. After this question, you can discard the bounding box matrices, since they are no longer needed.
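A minimal sketch of both steps, reusing the hypothetical myMondrian() output from the previous sketch; plotMondrian() and predictLeaf() are illustrative names, not required ones.

  plotMondrian <- function(fit) {
    cols <- rainbow(nrow(fit$tree), alpha = 0.4)
    leaf <- which(fit$tree[, "node_type"] == -1)
    plot(NULL, xlim = c(0, 1), ylim = c(0, 1), xlab = "X1", ylab = "X2")
    # one rectangle per leaf bounding box
    rect(fit$lower[leaf, 1], fit$lower[leaf, 2],
         fit$upper[leaf, 1], fit$upper[leaf, 2],
         col = cols[leaf], border = "gray40")
  }

  predictLeaf <- function(fit, x) {
    k <- 1
    while (fit$tree[k, "node_type"] == 1) {  # descend until a leaf is reached
      d <- fit$tree[k, "split_dim"]
      k <- if (x[d] <= fit$tree[k, "cut_loc"]) fit$tree[k, "left_child"]
           else fit$tree[k, "right_child"]
    }
    k                                        # row index of the leaf node
  }

  plotMondrian(fit)
  X <- matrix(runif(300 * 2), ncol = 2)
  leaf_id <- apply(X, 1, function(x) predictLeaf(fit, x))
  points(X[, 1], X[, 2], pch = 19, col = rainbow(nrow(fit$tree))[leaf_id])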

Question 4: [15 pts] Mondrian Tree Fitting and Prediction

Now that you have your tree model, let's use it for a regression problem.

Use the following data generating process to generate your training and testing data.

\[ Y = \sin(2 \pi X_1) + 2 X_2^2 + \epsilon, \quad \epsilon \sim N(0, 1) \]

where \(X\) is uniform on \([0, 1]^p\). The functional surface looks like the following:

  set.seed(546)
  
  # Generate a grid of points for plotting the true function
  grid_size <- 50
  x1_seq <- seq(0, 1, length.out = grid_size)
  x2_seq <- seq(0, 1, length.out = grid_size)
  grid_points <- expand.grid(X1 = x1_seq, X2 = x2_seq)
  
  # True function
  true_function <- function(x) {
    sin(2 * pi * x[1]) + 2 * x[2]^2
  }
  
  # Compute the true function values on the grid
  true_values <- apply(grid_points, 1, true_function)
  
  # Reshape for contour plot
  true_matrix <- matrix(true_values, nrow = grid_size, ncol = grid_size)
  
  # Plot the true function surface
  contour(x1_seq, x2_seq, true_matrix, 
          main = "True Function Surface", 
          xlab = "X1", ylab = "X2", col = terrain.colors(10))

Generate 400 training observations and 1000 testing observations. Fit a Mondrian tree with \(\lambda = 4\) on the training data, and make predictions on the testing data. You may experiment with different values of \(\lambda\). Report the Mean Squared Error (MSE) on the testing data.
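A minimal sketch of the fitting and prediction workflow, reusing the hypothetical helpers from the earlier sketches; here each leaf predicts the average training response falling into it, and empty leaves simply fall back to the overall mean.

  set.seed(1)
  n_train <- 400; n_test <- 1000; p <- 2
  Xtrain <- matrix(runif(n_train * p), ncol = p)
  ytrain <- sin(2 * pi * Xtrain[, 1]) + 2 * Xtrain[, 2]^2 + rnorm(n_train)
  Xtest  <- matrix(runif(n_test * p), ncol = p)
  ytest  <- sin(2 * pi * Xtest[, 1]) + 2 * Xtest[, 2]^2 + rnorm(n_test)

  fitMondrian <- function(Xtrain, ytrain, lambda) {
    fit <- myMondrian(p = ncol(Xtrain), lambda = lambda)
    leaf_id <- apply(Xtrain, 1, function(x) predictLeaf(fit, x))
    # leaf prediction = mean training response within that leaf
    fit$tree[, "pred_value"] <- sapply(seq_len(nrow(fit$tree)), function(k)
      if (any(leaf_id == k)) mean(ytrain[leaf_id == k]) else mean(ytrain))
    fit
  }

  predictMondrian <- function(fit, Xnew) {
    leaf_id <- apply(Xnew, 1, function(x) predictLeaf(fit, x))
    fit$tree[leaf_id, "pred_value"]
  }

  fit  <- fitMondrian(Xtrain, ytrain, lambda = 4)
  pred <- predictMondrian(fit, Xtest)
  mean((ytest - pred)^2)   # testing MSE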

This is my fitted surface by the Mondrian tree with \(\lambda = 10\):

Question 5: [15 pts] Mondrian Forest

Now you can utilize the previous code to fit a forest of Mondrian trees. Write a function that fits \(B = 100\) Mondrian trees and makes predictions by averaging the predictions from all trees. Note that each tree should be trained on a bootstrap sample of the training data. You may consider saving your fitted Mondrian trees in a list() and accessing its elements as forest[[b]]. Use the same training and testing data as in Question 4, and report the MSE on the testing data. Experiment with different values of \(\lambda\) and report your findings (are they better than a single tree model?).
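A minimal sketch of the forest loop, again with hypothetical names built on the single-tree helpers sketched above.

  myMondrianForest <- function(Xtrain, ytrain, lambda, B = 100) {
    forest <- vector("list", B)
    for (b in seq_len(B)) {
      id <- sample(nrow(Xtrain), replace = TRUE)   # bootstrap sample
      forest[[b]] <- fitMondrian(Xtrain[id, , drop = FALSE], ytrain[id], lambda)
    }
    forest
  }

  predictForest <- function(forest, Xnew) {
    preds <- sapply(forest, predictMondrian, Xnew = Xnew)  # n_test x B matrix
    rowMeans(preds)                                        # average over the trees
  }

  forest <- myMondrianForest(Xtrain, ytrain, lambda = 10)
  mean((ytest - predictForest(forest, Xtest))^2)           # testing MSE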

This is my fitted surface by the Mondrian forest with \(\lambda = 10\):

Question 6: [50 pts] AdaBoost Classification with Stump Model

In the previous questions, the Mondrian tree is a non-adaptive model, since the partition does not depend on the data. In this question, we will implement the AdaBoost algorithm with decision stumps as the base learner, which is data adaptive. A stump model partitions the input space into two regions at a cutoff point \(c\) and assigns a prediction in \(\{-1, +1\}\) to each side. We work with simulated one-dimensional data:

  set.seed(5)
  n <- 200
  x <- runif(n)
  py <- function(x) sin(4 * pi * x) / 3 + 0.5
  y <- (rbinom(n, 1, py(x)) - 0.5) * 2
  
  plot(x, y + 0.1 * runif(n, -1, 1), ylim = c(-1.1, 1.1), pch = 19,
       col = ifelse(y == 1, "darkorange", "deepskyblue"), ylab = "y")
  lines(sort(x), py(x)[order(x)] - 0.5)

  testx <- seq(0, 1, length.out = 1000)
  testy <- (rbinom(1000, 1, py(testx)) - 0.5) * 2

  1. Recall that a stump model (one-split tree) works this way:

    • Input: A set of data \({\cal D}_n = \{x_i, y_i, w_i\}_{i=1}^n\)
    • Output: The cutting point \(c\), and node predictions \(f_L, f_R \in \{-1, 1\}\)
    • Step 1: Search for a splitting rule \(\mathbf{1}(x \leq c)\) that maximizes the weighted reduction of Gini impurity. \[ \texttt{score} = - \, \frac{\sum_{ {\cal T}_L} w_i}{\sum w_i} \text{Gini}({\cal T}_L) - \frac{\sum_{ {\cal T}_R} w_i}{\sum w_i} \text{Gini}( {\cal T}_R ),\] where, for given data in a potential node \({\cal T}\), the weighted version of Gini impurity is \[ \text{Gini}({\cal T}) = \widehat p (1- \widehat p), \qquad \widehat p = (\textstyle \sum w_i)^{-1} \textstyle\sum w_i I(y_i = 1).\]
    • Step 2: Calculate and record the left and the right node prediction values \(f_L, f_R \in \{-1, 1\}\) respectively. Note that you also need to incorporate the weights in these calculations.

    You should write a function called myStump(x, y, w) that outputs the cutoff point and the left and right predictions. Note that you need to exhaust all possible cutoff points; however, given a set of observed values, only the unique observed \(x\) values matter. Once you finish the stump model algorithm, test your code in the following two scenarios and report your fitted trees (a sketch of the stump search and the boosting loop appears after this list):

    • All training observations have equal weights.
    • Observations with \(x \geq 0.5\) have weight 2, while observations with \(x < 0.5\) have weight 0.1.
  2. Let’s then write our own code for AdaBoost, which is an iterative method that calls the stump model and updates the weights. Implement the formula we introduced in the lecture, and perform the following:

    • You are required to implement a shrinkage factor \(\delta\), which is commonly used in boosting algorithms. Note that the shrinkage factor essentially works on the \(\alpha\) to achieve a smaller step size.
    • You are not required to do bootstrapping for each tree (you still can if you want).
    • Calculate the classification error and the exponential loss bound \(n^{-1} \sum_{i=1}^{n}\exp\{- y_i \delta \sum_k \alpha_k f_k(x_i)\}\) over the number of tree iterations for your training data. You can either record this in your fitting function or do this using a prediction function (the next step).
    • Write a prediction function for your testing data that outputs the predicted value of all testing samples over all tree iterations.
    • Plot the three quantities above over the tree iterations: training error, training exponential loss bound, and the testing error.
    • Try two different shrinkage factors, produce this plot for each, and comment on your findings. Please note that you may need to adjust the number of trees so that your algorithm works well.
    • For each of the shrinkage factors you used, plot the final model (the functional value of \(F\), and also its sign) together with the observed data.
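Below is a minimal sketch of the stump search and the boosting loop (the names myStump and myAdaboost follow the question, but all implementation details are illustrative). The weight and \(\alpha\) updates use the standard AdaBoost formulas, which are assumed to match those from the lecture, and the shrinkage factor \(\delta\) is applied to the step size so that the final model is \(F(x) = \delta \sum_k \alpha_k f_k(x)\).

  myStump <- function(x, y, w) {
    # weighted Gini impurity of one node, following the formula above
    wgini <- function(yy, ww) {
      if (sum(ww) == 0) return(0)
      phat <- sum(ww * (yy == 1)) / sum(ww)
      phat * (1 - phat)
    }
    cuts <- sort(unique(x))
    cuts <- cuts[-length(cuts)]              # keep both child nodes nonempty
    score <- sapply(cuts, function(cc) {
      L <- x <= cc
      - sum(w[L]) / sum(w) * wgini(y[L], w[L]) -
        sum(w[!L]) / sum(w) * wgini(y[!L], w[!L])
    })
    best_cut <- cuts[which.max(score)]
    L <- x <= best_cut
    # weighted majority class on each side
    fL <- ifelse(sum(w[L] * (y[L] == 1)) >= sum(w[L]) / 2, 1, -1)
    fR <- ifelse(sum(w[!L] * (y[!L] == 1)) >= sum(w[!L]) / 2, 1, -1)
    list(cut = best_cut, fL = fL, fR = fR)
  }

  # the two weighting scenarios in part 1
  myStump(x, y, rep(1, n))
  myStump(x, y, ifelse(x >= 0.5, 2, 0.1))

  myAdaboost <- function(x, y, ntrees = 300, delta = 0.3) {
    w <- rep(1 / length(y), length(y))
    stumps <- vector("list", ntrees)
    alpha  <- numeric(ntrees)
    for (k in seq_len(ntrees)) {
      s <- myStump(x, y, w)
      f <- ifelse(x <= s$cut, s$fL, s$fR)
      err <- sum(w * (f != y)) / sum(w)        # weighted error, assumed in (0, 0.5)
      alpha[k] <- 0.5 * log((1 - err) / err)
      w <- w * exp(-delta * alpha[k] * y * f)  # shrinkage applied to the step size
      w <- w / sum(w)
      stumps[[k]] <- s
    }
    list(stumps = stumps, alpha = alpha, delta = delta)
  }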