## Computations and R commands | MyTopTutor

Problem set 2
problemset2.d
For all questions, please explain all of your computations and R commands step by step.
1. (25 points) Linear Regression Explanation
First, generate 1,000 data points from a normal distribution with mean 0 and standard
deviation 1 by typing var1 <- rnorm(1000, 0, 1). Generate a second variable in the same way
(call it var2). Run a linear regression of var2 ~ var1, then run summary() on the fitted model
to see the results.
Interpret and explain the regression results.
set.seed(100)
var1 <- rnorm(1000, 0, 1)
var2 <- rnorm(1000, 0, 1)
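As a starting point for question 1, the steps described above can be sketched as follows. This is a minimal sketch rather than a model answer; the names `fit`, `fit_summary`, `slope_p`, and `r2` are illustrative and not part of the assignment.

```r
# Reproduce the starter code, then fit and summarise the regression.
set.seed(100)
var1 <- rnorm(1000, 0, 1)
var2 <- rnorm(1000, 0, 1)

# Regress var2 on var1 (intercept included by default).
fit <- lm(var2 ~ var1)

# summary() reports coefficients, standard errors, t values,
# p-values, and R-squared for the fitted model.
fit_summary <- summary(fit)
print(fit_summary)

# Pull out the quantities most relevant to the interpretation.
# (Illustrative names, not required by the assignment.)
slope_p <- coef(fit_summary)["var1", "Pr(>|t|)"]  # p-value of the slope
r2 <- fit_summary$r.squared                        # share of variance explained
```

Because var1 and var2 are drawn independently, there is no true linear relationship between them, so the interpretation should discuss how close the estimated slope and R-squared are to zero and what the p-value says about that.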
2. (25 points) Data Transformation
The file “rollingsales_manhattan.csv” contains New York City (Manhattan) housing sales data.
Analyze sales using a regression with two predictors, ZIP.CODE and YEAR.BUILT. Hint:
SALE.PRICE is the target variable of interest and should therefore be on the left-hand side of the
regression. Since the sales data are very skewed (you can verify this by plotting the data as a
histogram), you should apply a log() transformation to the sales variable first. Interpret your
findings.
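The workflow for question 2 (check the skew, log-transform, regress) might look like the sketch below. To keep it runnable without the CSV, it uses a small synthetic data frame as a stand-in; the values are made up purely for illustration and say nothing about the real data. In practice the data frame would come from read.csv("rollingsales_manhattan.csv"). The names `sales`, `sales_pos`, `log_price`, and `fit2` are assumptions. If SALE.PRICE is read in as text rather than numbers, it would need converting first; that step is not shown.

```r
# SYNTHETIC stand-in for the real file (assumption, for illustration only).
# With the real data, use: sales <- read.csv("rollingsales_manhattan.csv")
set.seed(1)
n <- 200
sales <- data.frame(
  ZIP.CODE   = sample(10001:10040, n, replace = TRUE),
  YEAR.BUILT = sample(1900:2010, n, replace = TRUE),
  SALE.PRICE = round(exp(rnorm(n, mean = 13, sd = 1)))
)
sales$SALE.PRICE[1:5] <- 0  # mimic rows without a usable price

# log(0) is -Inf, so keep only strictly positive, non-missing prices.
sales_pos <- subset(sales, !is.na(SALE.PRICE) & SALE.PRICE > 0)
sales_pos$log_price <- log(sales_pos$SALE.PRICE)

# Compare the raw and log-transformed distributions (interactive use only).
if (interactive()) {
  hist(sales_pos$SALE.PRICE, main = "SALE.PRICE")
  hist(sales_pos$log_price, main = "log(SALE.PRICE)")
}

# Regress log price on the two predictors named in the question.
fit2 <- lm(log_price ~ ZIP.CODE + YEAR.BUILT, data = sales_pos)
print(summary(fit2))
```

Here ZIP.CODE enters as a plain number, which is what the question's wording suggests; treating it as a categorical variable with factor(ZIP.CODE) is an alternative worth discussing when interpreting the results.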
3. (25 points) SVM
We will use the spam dataset which comes with the {kernlab} package. First, we will split the
spam data randomly into two halves: one half we will use as the training data, the other half we
will use as the test data. The target variable is “type”, a binary class with the levels spam and
nonspam. You can look at the help page for the dataset to find out what the different columns mean
(hint: ?spam).
1. Fit a support vector classifier using svm() on the training data. type is the target and all other
variables can be used as predictors (hint: you can use the . notation which automatically
includes all columns of the data.frame as predictors except the target variable).
2. Predict spam/nonspam classes for the data in the test dataset. How does the predicted
classification compare with the true classes? What is the classification error?
3. Can you improve the classification accuracy? (Hint: Start by exploring different settings for the
cost attribute and using different predictors.)
Use the following code fragment to get you started. You may have to install the kernlab package
first by uncommenting the install.packages() line.
# install.packages("kernlab")
library(e1071)
library(kernlab)
data(spam)
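set.seed(02115)  # fix the random seed so the split is reproducible
# Randomly flag each row as training (TRUE) or test (FALSE)
sample <- sample(c(TRUE, FALSE), nrow(spam), replace = TRUE)
train <- spam[sample, ]   # rows flagged TRUE
test <- spam[!sample, ]   # remaining rows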
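For steps 1 and 2 of question 3, the classification error is the share of test rows whose predicted class differs from the true class, and a confusion table shows where the mistakes fall. The sketch below first shows that calculation on a tiny hand-made example (hypothetical labels), then applies the same idea to the spam split if both packages are installed. The choice of kernel = "linear" and cost = 1 is an assumption made for illustration; the assignment leaves those settings open. The names `in_train`, `svm_fit`, and `test_pred` are likewise illustrative.

```r
# --- Toy illustration of the error calculation (hypothetical labels) ---
lv <- c("nonspam", "spam")
truth <- factor(c("spam", "nonspam", "spam", "nonspam", "nonspam"), levels = lv)
pred  <- factor(c("spam", "nonspam", "nonspam", "nonspam", "spam"), levels = lv)

confusion <- table(predicted = pred, actual = truth)  # rows: predicted class
toy_error <- mean(pred != truth)                      # misclassified share
print(confusion)
print(toy_error)

# --- Same idea on the spam data (runs only if the packages are available) ---
if (requireNamespace("e1071", quietly = TRUE) &&
    requireNamespace("kernlab", quietly = TRUE)) {
  data(spam, package = "kernlab")
  set.seed(02115)
  in_train <- sample(c(TRUE, FALSE), nrow(spam), replace = TRUE)
  train <- spam[in_train, ]
  test  <- spam[!in_train, ]

  # Assumed settings: linear kernel, cost = 1 (vary these for step 3).
  svm_fit <- e1071::svm(type ~ ., data = train, kernel = "linear", cost = 1)
  test_pred <- predict(svm_fit, newdata = test)

  print(table(predicted = test_pred, actual = test$type))
  print(mean(test_pred != test$type))  # test classification error
}
```

For step 3, rerunning the fit with several values of cost (and with subsets of the predictors) and comparing the resulting test errors is one straightforward way to explore whether accuracy improves.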
4. (25 points) Least Squares Method
The calculation of the “Sum of Squared Errors” is illustrated as follows.
For the columns x and y, if we use the estimate equation $\hat{y} = 1.5698 + 0.0407x$, we can calculate the “Sum of
Squared Errors” as $\sum (y - \hat{y})^2 = 0.0517 + 0.0029 + 0.0152 + 0.0433 = 0.1131$.
The estimate equation $\hat{y} = 1.5698 + 0.0407x$ is not the best linear estimate equation. We would like to find
the best linear estimate equation $\hat{y} = a + bx$, so that the “Sum of Squared Errors” is the minimum among
all choices of a and b. There are many algorithms for finding the best choice of a and b. Here we use “brute
force”.
Write R code to find the best a and b using “brute force”, where a takes values in seq(3, 5, 0.1) and b
takes values in seq(-0.5, 0.5, 0.01). Use the following code to get you started.
df<-data.frame(x=c(61,63,67,69),y=c(4.28,4.08,4.42,4.17))
a_range<-seq(3,5,0.1)
b_range<-seq(-0.5,0.5,0.01)
| x | y | $\hat{y} = 1.5698 + 0.0407x$ | $y - \hat{y}$ | $(y - \hat{y})^2$ |
|---|---|---|---|---|
| 61 | 4.28 | 1.5698 + 0.0407 × 61 = 4.0526 | 4.28 − 4.0526 = 0.2274 | (0.2274)² = 0.0517 |
| 63 | 4.08 | 1.5698 + 0.0407 × 63 = 4.134 | 4.08 − 4.134 = −0.054 | (−0.054)² = 0.0029 |
| 67 | 4.42 | 1.5698 + 0.0407 × 67 = 4.2968 | 4.42 − 4.2968 = 0.1232 | (0.1232)² = 0.0152 |
| 69 | 4.17 | 1.5698 + 0.0407 × 69 = 4.3782 | 4.17 − 4.3782 = −0.2082 | (−0.2082)² = 0.0433 |
| Sum | | | | 0.0517 + 0.0029 + 0.0152 + 0.0433 = 0.1131 |
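The brute-force search in question 4 amounts to evaluating the Sum of Squared Errors for every (a, b) pair on the grid and keeping the pair with the smallest value. A minimal sketch under that reading is shown below; the helper `sse()` and the names `grid`, `best`, and `sse_given` are illustrative and not prescribed by the assignment.

```r
# Starter code from the assignment.
df <- data.frame(x = c(61, 63, 67, 69), y = c(4.28, 4.08, 4.42, 4.17))
a_range <- seq(3, 5, 0.1)
b_range <- seq(-0.5, 0.5, 0.01)

# Sum of squared errors for the line y-hat = a + b * x.
sse <- function(a, b) sum((df$y - (a + b * df$x))^2)

# Sanity check against the worked example (about 0.113 after rounding).
sse_given <- sse(1.5698, 0.0407)
print(sse_given)

# Build every combination of a and b, then score each one.
grid <- expand.grid(a = a_range, b = b_range)
grid$sse <- mapply(sse, grid$a, grid$b)

# The row with the smallest SSE is the brute-force answer on this grid.
best <- grid[which.min(grid$sse), ]
print(best)
```

The result is only the best pair on this particular grid. Comparing it with coef(lm(y ~ x, data = df)) shows how close the grid search comes to the exact least-squares solution.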
