Please remove this section when submitting your homework.
Students are encouraged to work together on homework and/or utilize advanced AI tools. However, there are two basic rules:
Final submissions must be uploaded to Gradescope. No email or hard copy will be accepted. Please refer to the course website for the late submission policy and grading rubrics.
Name your submission file `HWx_yourNetID.pdf`, for example, `HW01_rqzhu.pdf`. Please note that this must be a `.pdf` file; `.html` files will not be accepted because they are often not readable on Gradescope.
Make all of your R code chunks visible for grading.
Make sure that your version of R is \(\geq 4.0.0\). This will ensure that your random seed generation is the same as everyone else's.
If you use the `.Rmd` file as a template, be sure to remove this instruction section.

This question is about playing with AI tools for generating multivariate normal random variables. Let \(X_i\), \(i = 1, \ldots, n\) be i.i.d. multivariate normal random variables with mean \(\mu\) and covariance matrix \(\Sigma\), where
\[
\mu = \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \quad \text{and} \quad
\Sigma = \begin{bmatrix} 1 & 0.5 \\ 0.5 & 1 \end{bmatrix}.
\]
Write R code to perform the following tasks.
Please try to use AI tools as much as possible in this question.
a) [10 points] Generate a set of \(n = 2000\) observations from this distribution. Only display the first 5 observations in your R output. Make sure to set the random seed to \(1\) so that the result can be replicated. Calculate the sample covariance matrix of the generated data and compare it with the true covariance matrix \(\Sigma\).
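For reference, a minimal sketch of part a) with `MASS::mvrnorm()` might look like the following. It assumes the `MASS` package, a recommended package that ships with standard R installations, is available; the printed numbers are not reproduced here.

```r
# Minimal sketch for part a); assumes the MASS package is installed.
library(MASS)

mu <- c(1, 2)
Sigma <- matrix(c(1, 0.5,
                  0.5, 1), nrow = 2, byrow = TRUE)

set.seed(1)
X <- mvrnorm(n = 2000, mu = mu, Sigma = Sigma)

head(X, 5)   # display only the first 5 observations
S <- cov(X)  # sample covariance matrix
S
S - Sigma    # elementwise difference from the true covariance matrix
```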
b) [10 points] If you used any AI tools for the previous question, they most likely suggested using the `mvrnorm` function from the `MASS` package. However, there are alternative ways to complete this question. For example, you could first generate \(n\) independent standard normal random vectors and then transform them to the desired distribution. Write down the mathematical formula of this approach in LaTeX, and then write the corresponding R code to implement it. Only display the first 5 observations in your R output. Validate your approach by computing the sample covariance matrix of the generated data and comparing it with the true covariance matrix \(\Sigma\). Please note that you should not use the `mvrnorm` function in this question.
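The transformation in part b) is usually built from a matrix square root of \(\Sigma\). One common choice, sketched below, is the Cholesky factorization \(\Sigma = L L^\top\) with \(L\) lower triangular; a square root from an eigendecomposition of \(\Sigma\) would work as well.

\[
Z_i \overset{\text{i.i.d.}}{\sim} N(0, I_2), \qquad X_i = \mu + L Z_i, \qquad
\mathbb{E}[X_i] = \mu, \qquad \operatorname{Cov}(X_i) = L\, I_2\, L^\top = \Sigma,
\]
\[
L = \begin{bmatrix} 1 & 0 \\ 0.5 & \sqrt{0.75} \end{bmatrix}, \qquad
L L^\top = \begin{bmatrix} 1 & 0.5 \\ 0.5 & 0.25 + 0.75 \end{bmatrix} = \Sigma.
\]

Since \(X_i\) is an affine transformation of a Gaussian vector, it is itself multivariate normal, so matching the mean and covariance is enough. Note that R's `chol()` returns the upper-triangular factor \(L^\top\); if the \(Z_i^\top\) are stacked as the rows of an \(n \times 2\) matrix \(Z\), the rows of \(Z L^\top + \mathbf{1}_n \mu^\top\) are the desired observations.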
c) [10 points] Write an R function called `mymvnorm` that takes the arguments `n`, `mu`, and `sigma`. The function should generate \(n\) observations from a multivariate normal distribution with mean `mu` and covariance matrix `sigma`, and return them as a matrix of dimension \(n \times p\), where \(p\) is the length of `mu`. You should not use the `mvrnorm` function or any similar built-in R function in your code. Instead, use the logic you wrote in part b) to generate the data. Again, validate your result by calculating the sample covariance matrix of the generated data and comparing it with \(\Sigma\). Also, when the random seed is set correctly, your answer to this question should be identical to the one in part b).
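A sketch of `mymvnorm()` using the Cholesky transformation is below. It is one possible implementation, not the required one; it reproduces part b) exactly only if part b) draws the standard normals in the same order (here, `rnorm(n * p)` filled column by column) and uses the same factor. Because `MASS::mvrnorm()` is based on an eigendecomposition of `Sigma`, its draws are not expected to match these even with the same seed.

```r
# Sketch of mymvnorm(): n draws from N(mu, sigma) via the Cholesky factor.
mymvnorm <- function(n, mu, sigma) {
  p <- length(mu)
  sigma <- as.matrix(sigma)
  stopifnot(nrow(sigma) == p, ncol(sigma) == p)
  # chol() returns the upper-triangular R with t(R) %*% R equal to sigma
  R <- chol(sigma)
  # n x p matrix of i.i.d. N(0, 1) draws, filled column by column
  Z <- matrix(rnorm(n * p), nrow = n, ncol = p)
  # each row z_i^T R has covariance t(R) %*% R = sigma; then shift each row by mu
  Z %*% R + matrix(mu, nrow = n, ncol = p, byrow = TRUE)
}

mu <- c(1, 2)
Sigma <- matrix(c(1, 0.5, 0.5, 1), nrow = 2)

set.seed(1)
X <- mymvnorm(2000, mu, Sigma)
head(X, 5)  # first 5 observations
cov(X)      # should be close to Sigma
```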
d) [5 points] Briefly comment on your usage of AI tools in the above questions.
e) [5 points] Try to create a question related to the multivariate normal distribution that you think AI tools will have difficulty answering. Write down the question, the answer you expect, and the answer you got from the AI. Were you able to trick the AI? If not, briefly discuss your experience.
The following question practices data manipulation, visualization, and linear regression. Load the `quantmod` package and obtain the `AAPL` data (Apple stock price).
```r
library(quantmod)
getSymbols("AAPL")
## [1] "AAPL"
plot(AAPL$AAPL.Close, pch = 19)
```
a) [15 points] Calculate a 10-day moving average of the closing price of `AAPL` and plot it on the same graph. A moving average means that, for each day, you take the average of the past 10 days (including the current day). Please do this in two ways: 1) use the built-in function `SMA`, which is available after loading `quantmod` (it comes from the `TTR` package that `quantmod` loads); 2) write your own function to calculate the moving average. Plot both and check whether the two calculations are identical. For both approaches, you can use AI tools to help you write the code.
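A cumulative-sum approach is one way to write your own version. The sketch below calls it `my_sma` (a hypothetical name) and checks it against a direct loop on synthetic prices, since downloading `AAPL` is not possible here. Because sums computed in different orders can differ in the last few digits, `all.equal()`, which allows a small tolerance, is probably a safer comparison with `SMA()` than `identical()`.

```r
# Hypothetical helper (not part of quantmod/TTR): trailing k-day moving average,
# where the value on day i is mean(x[(i - k + 1):i]).
# Assumes x is numeric with no missing values.
my_sma <- function(x, k = 10) {
  x <- as.numeric(x)
  if (length(x) < k) return(rep(NA_real_, length(x)))
  cs <- cumsum(x)
  # window sum on day i is cs[i] - cs[i - k], taking cs[0] = 0
  sums <- cs - c(rep(0, k), head(cs, -k))
  out <- sums / k
  out[seq_len(k - 1)] <- NA  # the first k - 1 days lack a full window
  out
}

# Self-check on synthetic prices against a direct loop over windows
set.seed(432)
x <- 100 + cumsum(rnorm(50))
naive <- sapply(seq_along(x), function(i) if (i < 10) NA else mean(x[(i - 9):i]))
all.equal(my_sma(x, 10), naive)

# With the AAPL data loaded above (not run here, needs the download):
# close <- AAPL$AAPL.Close
# ma_ttr  <- as.numeric(SMA(close, n = 10))
# ma_mine <- my_sma(close, k = 10)
# all.equal(ma_ttr, ma_mine)
# plot(index(close), as.numeric(close), type = "l", xlab = "Date", ylab = "Close")
# lines(index(close), ma_mine, col = "red")
```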
b) [15 points] Let’s fit a simple linear regression that predicts the average closing price of `AAPL` over the next five days (not including the current day) using two variables: the average of the past 10 days and the average of the past 20 days. Provide summaries of the regression results and comment on whether the information beyond the past 10 days is useful.
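The main bookkeeping step is aligning the response with the predictors: the average of days \(t+1, \ldots, t+5\) is the trailing 5-day average on day \(t+5\), shifted back five rows. The sketch below uses hypothetical helpers `trailing_mean()` and `make_reg_data()` and synthetic random-walk prices so it runs without a download. In `summary(fit)`, the t-test on the `ma20` coefficient is one way to judge whether the information beyond the past 10 days adds anything.

```r
# Hypothetical helpers; assumes `price` holds daily closes in time order.
# stats::filter() is called explicitly because dplyr masks filter().
trailing_mean <- function(x, k) {
  # sides = 1: value on day i is the mean of x[(i - k + 1):i]; first k - 1 are NA
  as.numeric(stats::filter(x, rep(1 / k, k), sides = 1))
}

make_reg_data <- function(price) {
  price <- as.numeric(price)
  ma5 <- trailing_mean(price, 5)
  data.frame(
    # mean of days t+1, ..., t+5 equals the trailing 5-day mean on day t+5
    next5 = c(ma5[-(1:5)], rep(NA, 5)),
    ma10  = trailing_mean(price, 10),
    ma20  = trailing_mean(price, 20)
  )
}

# Synthetic random-walk prices so the sketch runs without downloading data
set.seed(432)
price <- 100 + cumsum(rnorm(500))
dat <- make_reg_data(price)
fit <- lm(next5 ~ ma10 + ma20, data = dat)  # rows containing NA are dropped
summary(fit)

# With the real data (not run here):
# dat <- make_reg_data(AAPL$AAPL.Close)
```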
c) [10 points] This model fitting is too simple. What are the potential issues with this model that could make the results unreliable? Briefly discuss two of your findings, and search the literature to provide a reference that supports them.
a) [10 points] The `ElemStatLearn` package is an archived package on CRAN. Hence, you cannot directly install it using the `install.packages()` function. Instead, you may install an older version of it by using the `install_github()` function from the `devtools` package. Install the `devtools` package, then find and run the code that installs the `ElemStatLearn` package.
b) [10 points] Load the `ElemStatLearn` package and obtain the `ozone` data. Save this data into a `.csv` file, and then read the data back from that file into R. Print out the first 5 observations to make sure that the new data is the same as the original.
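The round trip can be sketched with base R's `write.csv()` and `read.csv()`. The helper name `csv_round_trip` is hypothetical, and the built-in `airquality` data stands in for `ozone` so the sketch runs without `ElemStatLearn`. Writing with `row.names = FALSE` avoids an extra index column on the way back; if the original data has meaningful row names, keep them instead and read the file with `row.names = 1`.

```r
# Hypothetical helper: write a data frame to CSV, then read it back.
csv_round_trip <- function(df, path = tempfile(fileext = ".csv")) {
  write.csv(df, path, row.names = FALSE)  # no row-name column in the file
  read.csv(path)
}

# Stand-in example with the built-in airquality data; the ozone data from
# ElemStatLearn would be handled the same way once the package is installed.
back <- csv_round_trip(airquality)
head(back, 5)
all.equal(airquality, back)
```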