Instructions

Please remove this section when submitting your homework.

Students are encouraged to work together on homework and/or utilize advanced AI tools. However, sharing, copying, or providing any part of a homework solution or code to others is an infraction of the University’s rules on Academic Integrity. Any violation will be punished as severely as possible. Final submissions must be uploaded to Gradescope. No email or hard copy will be accepted. For late submission policy and grading rubrics, please refer to the course website.

Question 1 (Multivariate Normal Distribution)

This question is about playing with AI tools for generating multivariate normal random variables. Let \(X_i\), \(i = 1, \ldots, n\) be i.i.d. multivariate normal random variables with mean \(\mu\) and covariance matrix \(\Sigma\), where \[ \mu = \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \quad \text{and} \quad \Sigma = \begin{bmatrix} 1 & 0.5 \\ 0.5 & 1 \end{bmatrix}. \] Write R code to perform the following tasks. Please try to use AI tools as much as possible in this question.

  1. [10 points] Generate a set of \(n = 2000\) observations from this distribution. Only display the first 5 observations in your R output. Make sure to set the random seed to 1 so that the result can be replicated. Calculate the sample covariance matrix of the generated data and compare it with the true covariance matrix \(\Sigma\).

  2. [10 points] If you used VS Code and AI tools to answer the previous question, they most likely suggested the mvrnorm function from the MASS package. However, there are alternative ways to complete this task. For example, you could first generate \(n\) standard normal random variables and then transform them to the desired distribution. Write down the mathematical formula of this approach in LaTeX, and then write R code to implement it. Only display the first 5 observations in your R output. Validate your approach by computing the sample covariance matrix of the generated data and comparing it with the true covariance matrix \(\Sigma\). Please note that you should not use the mvrnorm function in this question.

  3. [10 points] Write an R function called mymvnorm that takes the following arguments: n, mu, sigma. The function should return a matrix of dimension \(n \times p\), where \(p\) is the length of mu. The function should generate \(n\) observations from a multivariate normal distribution with mean mu and covariance matrix sigma. You should not use the mvrnorm function in your code. Instead, use the logic you wrote in part 2 to generate the data. Again, validate your result by calculating the sample covariance matrix of the generated data and comparing it to \(\Sigma\). Also, with the seed set correctly, your answer to this question should be identical to the one in part 2.

  4. [10 points] If you used any AI tools during the first three questions, describe your experience here. Specifically, what tool(s) did you use? What prompt did you use? Did the tool suggest a correct answer to your question? If not, which part was wrong? How did you correct its mistakes (e.g., by modifying your prompt)?
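
As a rough illustration of the transformation idea mentioned in part 2, one common choice (an assumption here, not the only valid one) is a Cholesky factorization \(\Sigma = L L^\top\), so that \(X = \mu + L Z\) with \(Z \sim N(0, I)\) has mean \(\mu\) and covariance \(\Sigma\). The helper name `sketch_mvn` below is hypothetical and only for illustration; it is a minimal sketch, not a reference solution.

```r
# Target parameters taken from the question statement
mu <- c(1, 2)
Sigma <- matrix(c(1, 0.5, 0.5, 1), nrow = 2)

# Hypothetical helper (illustrative name, not part of the assignment)
sketch_mvn <- function(n, mu, sigma) {
  p <- length(mu)
  # chol() returns an upper-triangular R with t(R) %*% R equal to sigma
  R <- chol(sigma)
  # n x p matrix of independent standard normal draws, one observation per row
  Z <- matrix(rnorm(n * p), nrow = n, ncol = p)
  # Rows of Z %*% R have covariance t(R) %*% R = sigma; then shift each column by mu
  sweep(Z %*% R, 2, mu, "+")
}

set.seed(1)
X <- sketch_mvn(2000, mu, Sigma)
head(X, 5)  # first 5 observations
cov(X)      # should be close to Sigma
```

Other factorizations of \(\Sigma\) (for example, one based on its eigendecomposition) would serve equally well; they produce different draws for the same seed but the same distribution.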

Question 2 (Data Manipulation and Plots)

The following question practices data manipulation and summary statistics. Our goal is to write a function that calculates the price gap between any two given dates. Load the quantmod package and obtain the AAPL data (Apple stock price).

  library(quantmod)
  getSymbols("AAPL")
## [1] "AAPL"
  plot(AAPL$AAPL.Close, pch = 19)

  1. [20 points] Calculate a 90-day moving average of the closing price of AAPL and plot it on the same graph. Moving average means that for each day, you take the average of the previous 90 days. Please do this in two ways: 1) use the built-in function SMA, which is available once quantmod is loaded (it comes from the TTR package); 2) write your own function to calculate the moving average. For both approaches, you can utilize AI tools to help you write the code.

  2. [15 points] I have an alternative way of writing this function.

  my_average <- function(x, window) {
    # Compute the moving average of x with window size = window
    n <- length(x)
    ma <- rep(NA, n)
    for (i in window:n) {
      myinterval = (i-window/2):(i + window/2)
      myinterval = myinterval[myinterval > 0 & myinterval <= n]
      ma[i] <- mean( x[ myinterval ] )
    }
    return(ma)
  }

  AAPL$MA90 <- my_average(Cl(AAPL), 90)
  plot(AAPL$AAPL.Close, pch = 19)

  lines(AAPL$MA90, col = "red", lwd = 3)

Can you comment on the difference between these two functions? Do you think my line approximates the true price better? Which one do you prefer, and why?
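
To see how window alignment alone changes a moving average, the following minimal sketch compares a trailing window with a centered one on a toy series. The series `x`, the window size `k = 3`, and the use of `stats::filter` are assumptions chosen for illustration; this is not the AAPL data or either function from the question.

```r
# Toy upward-trending series (illustrative only, not AAPL data)
x <- 1:10
k <- 3  # small odd window so the centered case is symmetric

# sides = 1: average of the current value and the k - 1 values before it
trailing <- as.numeric(stats::filter(x, rep(1 / k, k), sides = 1))

# sides = 2: average of a window centered on the current value
centered <- as.numeric(stats::filter(x, rep(1 / k, k), sides = 2))

# Side-by-side view of how each window aligns with the series
cbind(x, trailing, centered)
```

Inspecting where the NA values fall and how each column tracks `x` may help when describing the two functions in your answer.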

Question 3 (Read/write Data)

  1. [10 points] The ElemStatLearn package [CRAN link] is an archived package. Hence, you cannot directly install it using the install.packages() function. Instead, you may install an older version of it by using the install_github() function from the devtools package. Install the devtools package, then find and run the code to install the ElemStatLearn package.

  2. [15 points] Load the ElemStatLearn package and obtain the ozone data. Save this data into a .csv file, and then read the data back from that file into R. Print out the first 5 observations to make sure that the new data is the same as the original one.
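
The write-then-read round trip can be sketched on a small stand-in data frame. The data frame `df`, its column names, and the temporary file path are all hypothetical; the assignment itself uses the ozone data.

```r
# Toy stand-in data frame (hypothetical; the assignment uses the ozone data)
df <- data.frame(id = 1:6, value = c(0.5, 1.25, 2, 3.75, 4, 5.5))

# Write to a temporary .csv file; row.names = FALSE avoids an extra index column
path <- tempfile(fileext = ".csv")
write.csv(df, path, row.names = FALSE)

# Read the file back and compare with the original
df_back <- read.csv(path)
head(df_back, 5)
all.equal(df, df_back)
```

Leaving `row.names` at its default would write the row names as an additional first column, which then shows up as an extra variable when the file is read back.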