Please remove this section when submitting your homework.
Students are encouraged to work together on homework and/or utilize advanced AI tools. However, sharing, copying, or providing any part of a homework solution or code to others is an infraction of the University’s rules on Academic Integrity. Any violation will be punished as severely as possible. Final submissions must be uploaded to Gradescope. No email or hard copy will be accepted. For late submission policy and grading rubrics, please refer to the course website.
HWx_yourNetID.pdf
. For example,
HW01_rqzhu.pdf
. Please note that this must be a
.pdf
file. .html
format
cannot be accepted. Make all of your R
code chunks visible for grading..Rmd
file
as a template, be sure to remove this instruction
section.R
is \(\geq
4.0.0\). This will ensure your random seed generation is the same
as everyone else. Please note that updating the R
version
may require you to reinstall all of your packages.This question is about playing with AI tools for generating
multivariate normal random variables. Let \(X_i\), \(i = 1,
\ldots, n\) be i.i.d. multivariate normal random variables with
mean \(\mu\) and covariance matrix
\(\Sigma\), where \[
\mu = \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \quad \text{and} \quad
\Sigma = \begin{bmatrix} 1 & 0.5 \\ 0.5 & 1 \end{bmatrix}.
\] Write R
code to perform the following tasks.
Please try to use AI tools as much as possible in this question.
[10 points] Generate a set of \(n =
2000\) observations from this distribution. Only display the
first 5 observations in your R
output. Make sure set random
seed \(=1\) in order to replicate the
result. Calculate the sample covariance matrix of the generated data and
compare it with the true covariance matrix \(\Sigma\).
[10 points] If you used VS Code and AI tools to perform the
previous question, then they will most likely suggest using the
mvrnorm
function from the MASS
package.
However, there are alternative ways to complete this question. For
example, you could first generate \(n\)
standard normal random variables, and then transform them to the desired
distribution. Write down the mathematical formula of this approach in
Latex, and then write R
code to implement this approach.
Only display the first 5 observations in your R
output.
Validate your approach by computing the sample covariance matrix of the
generated data and compare it with the true covariance matrix \(\Sigma\). Please note that you
should not use the mvrnorm
function
anymore in this question.
[10 points] Write an R
function called
mymvnorm
that takes the following arguments:
n
, mu
, sigma
. The function should
return a matrix of dimension \(n \times
p\), where \(p\) is the length
of mu
. The function should generate \(n\) observations from a multivariate normal
distribution with mean mu
and covariance matrix
sigma
. You should not use the mvrnorm
function
in your code. Instead, use the logic you wrote in part b) to generate
the data. Again, validate your result by calculating the sample
covariance matrix of the generated data and compare to \(\Sigma\). Also, when setting seed
correctly, your answer in this question should be identical to the one
in part b).
[10 points] If you used any AI tools during the first three questions, write your experience here. Specifically, what tool(s) did you use? What prompt was used? Did the tool suggested a corrected answer to your question? If not, which part was wrong? How did you corrected their mistakes (e.g modifying your prompt)?
The following question practices data manipulation and summary
statistics. Our goal is to write a function that calculates the price
gap between any two given dates. Load the quantmod
package
and obtain the AAPL
data (apple stock price).
library(quantmod)
getSymbols("AAPL")
## [1] "AAPL"
plot(AAPL$AAPL.Close, pch = 19)
[20 points] Calculate a 90-day moving average of the closing
price of AAPL
and plot it on the same graph. Moving average
means that for each day, you take the average of the previous 90 days.
Please do this in two ways: 1) there is a built-in function called
SMA
in the quantmod
package; 2) write your own
function to calculate the moving average. For both questions, you can
utilize AI tools to help you write the code.
[15 points] I have an alternative way of writing this function.
my_average <- function(x, window) {
# Compute the moving average of x with window size = window
n <- length(x)
ma <- rep(NA, n)
for (i in window:n) {
myinterval = (i-window/2):(i + window/2)
myinterval = myinterval[myinterval > 0 & myinterval <= n]
ma[i] <- mean( x[ myinterval ] )
}
return(ma)
}
AAPL$MA90 <- my_average(Cl(AAPL), 90)
plot(AAPL$AAPL.Close, pch = 19)
lines(AAPL$MA90, col = "red", lwd = 3)
Can you comment on the difference of these two functions? Do you think my line is a good choice when it is used for predicting future prices? Which one do you prefer and why.
[10 Points] The ElemStatLearn
package [CRAN
link] is an archived package. Hence, you cannot directly install it
using the install.packages()
function. Instead, you may
install an older version of it by using the
install_github()
function from the devtools
package. Install the devtools
package and run the find the
code to install the ElemStatLearn
package.
[15 Points] Load the ElemStatLearn
package and
obtain the ozone
data. Save this data into a
.csv
file, and then read the data back from that file into
R
. Print out the first 5 observations to make sure that the
new data is the same as the original one.