BU510.650 – Data Analytics, Fall 2021
Assignment # 4
Please submit two documents: Your answers to each part of every question in .pdf or .doc format, and
your R script, in .R format. In your document with answers, please do *not* respond with R output only.
While it is okay to include R output in that document, please make sure you spell out the response to
the question asked. Please submit your assignment through Blackboard and name your files using the
convention LastName_FirstName_AssignmentNumber. For example, Yazdi_Mohammad_4.pdf and
Yazdi_Mohammad_4.R.
Question 1)
In this question, you will perform model selection using the AutoLoss data set. Our goal is to predict the loss
payment for a vehicle (the payments made by an insurance company to cover claims) as a function of
vehicle characteristics.
To start your work on this question, read the data in AutoLoss.csv into a data frame called AutoLoss.
This data set has missing values, marked as “?” in the data file, and the rows containing them must be
removed. To do so, first replace each “?” with NA while reading the data from the .csv file, and then
remove every observation that contains any NA.
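The two steps above can be sketched as follows. This is a minimal illustration, not the required solution: the small CSV written to a temporary file, its column names, and its values are all hypothetical stand-ins for AutoLoss.csv, used only so the snippet runs on its own. The calls themselves (read.csv() with its na.strings argument, and na.omit()) are base R.

```r
# Hypothetical stand-in for AutoLoss.csv: a tiny CSV written to a temporary
# file so the example is self-contained. Column names and values are made up.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Losses,BodyStyle,Price",
  "164,sedan,13950",
  "?,hatchback,16500",
  "158,sedan,?",
  "192,wagon,16430"
), csv_path)

# na.strings = "?" tells read.csv() to treat every "?" field as NA.
AutoLoss <- read.csv(csv_path, na.strings = "?", stringsAsFactors = TRUE)

# na.omit() drops every row that contains at least one NA.
AutoLoss <- na.omit(AutoLoss)

nrow(AutoLoss)    # number of complete rows that remain
anyNA(AutoLoss)   # FALSE once the incomplete rows are gone
```

Reading “?” as NA at import time also lets R parse the affected columns as numeric rather than as text, which matters for the regressions that follow.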
a) Using the best subset selection method and allowing up to 15 predictors, use regsubsets() to
determine the best model with k predictors for k = 1, 2, …, 15. Use the output to answer the
following question: Which predictors are included in the best model with 11 predictors?
Note: I will add extra information to help you understand one subtlety about the output. Please pay
attention to the names of predictors displayed in the output – for example, the output will not show
BodyStyle, which is one of the columns in your data, as a predictor. Instead, you will notice predictor
names BodyStylehardtop, BodyStylehatchback, BodyStylesedan, BodyStylewagon. This is because
BodyStyle is a qualitative variable, which can take the values “hardtop,” “hatchback,” “sedan,”
“wagon,” and “convertible.” R is replacing the qualitative variable “BodyStyle” with four columns
(BodyStylehardtop, BodyStylehatchback, BodyStylesedan, BodyStylewagon), which are 0-1 columns.
For example, BodyStylehardtop would be 1 if the car’s BodyStyle is hardtop and 0 otherwise.
BodyStylehatchback would be 1 if the car’s BodyStyle is hatchback, and 0 otherwise. This is similar
to what we did in the Toyota Used Car example in class (replacing the Fuel Type column with CNGFuel
and DieselFuel columns, which were 0-1 columns), but R is automating it for you here when you run
regsubsets().
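To see this kind of 0-1 expansion directly, the sketch below applies base R's model.matrix() to a toy data frame. The data frame toy is hypothetical; only the five BodyStyle values come from the note above. With R's default coding, the first factor level in alphabetical order (here “convertible”) serves as the baseline and gets no column of its own, which is why only four BodyStyle columns appear.

```r
# Toy data frame (hypothetical); the five levels are the ones named in the note.
toy <- data.frame(
  BodyStyle = factor(c("convertible", "hardtop", "hatchback", "sedan", "wagon"))
)

# model.matrix() builds the design matrix for a formula and expands each
# factor into 0-1 indicator columns, one per non-baseline level.
X <- model.matrix(~ BodyStyle, data = toy)

colnames(X)  # intercept plus the four BodyStyle indicator columns
X            # the convertible row has 0 in all four indicator columns
```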
b) Focusing on the best model with 8 predictors, answer the following True / False questions:
i Whether or not a car has four doors is a predictor in this model.
ii Whether or not a car’s body style is hard top is a predictor in this model.
iii Whether or not a car’s drive wheels is forward is a predictor in this model.
c) Repeat part (a), but this time using the forward stepwise selection method.
d) Using the forward stepwise selection method and allowing up to 15 predictors, what is the best
model according to the Cp criterion? State the predictors in the best model and their coefficients.
Comment on predictors: What types of cars tend to have higher losses? What types of cars tend to
have lower losses?
Hint: In part (c), you used regsubsets() with forward selection to determine the best model with k
predictors for k = 1, 2, …, 15. Now, you need to compare the Cp of the best model with 1 predictor
versus the Cp of the best model with 2 predictors versus … the Cp of the best model with 15
predictors, to determine the number of predictors that minimizes Cp. For guidance, check how we
accomplished the same goal in Task 6 of the Hitters – Subset Selection example.
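The comparison described in the hint can be sketched on simulated data. Everything about the data here is an assumption made for illustration (the data frame sim, the predictors x1 to x6, the response y, and the true coefficients); it is not the AutoLoss data and may differ from the class example. The snippet assumes the leaps package is installed, and it uses regsubsets(), its summary()'s cp component, and coef() with a model-size index.

```r
library(leaps)  # provides regsubsets()

# Simulated data (hypothetical): y depends only on x1 and x2.
set.seed(1)
n <- 200
sim <- as.data.frame(matrix(rnorm(n * 6), ncol = 6))
names(sim) <- paste0("x", 1:6)
sim$y <- 3 * sim$x1 - 2 * sim$x2 + rnorm(n)

# Forward stepwise selection: one best model for each size k = 1, ..., 6.
fwd <- regsubsets(y ~ ., data = sim, nvmax = 6, method = "forward")
fwd_summary <- summary(fwd)

fwd_summary$cp                        # Cp of the best model of each size
best_k <- which.min(fwd_summary$cp)   # size with the smallest Cp
best_k
coef(fwd, best_k)                     # intercept and coefficients of that model
```

For the assignment, the same pattern would apply with the AutoLoss data frame, the loss payment as the response, and nvmax = 15.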
Question 2)
Suppose we have a data set, which has p predictors (input variables), and we perform model selection
using (i) best subset selection (BSS), (ii) forward stepwise selection (FwSS), and (iii) backward stepwise
selection (BwSS). Specifically, using each of these three approaches, we determine the best model with
k predictors for all possible values of k, that is, k = 1, 2, …, p.
Answer the following True or False questions:
I. The predictors in the k-predictor model identified by FwSS are a subset of predictors in the
(k+1)-predictor model identified by FwSS.
II. The predictors in the k-predictor model identified by BwSS are a subset of predictors in the
(k+1)-predictor model identified by BwSS.
III. The predictors in the k-predictor model identified by BSS are a subset of predictors in the (k+1)-
predictor model identified by BSS.
IV. The predictors in the k-predictor model identified by BwSS are a subset of predictors in the
(k+1)-predictor model identified by FwSS.
V. The predictors in the k-predictor model identified by FwSS are a subset of predictors in the
(k+1)-predictor model identified by BwSS.
