## Vehicle characteristics

BU510.650 – Data Analytics, Fall 2021

Assignment # 4

Please submit two documents: Your answers to each part of every question in .pdf or .doc format, and

your R script, in .R format. In your document with answers, please do *not* respond with R output only.

While it is okay to include R output in that document, please make sure you spell out the response to

the question asked. Please submit your assignment through Blackboard and name your files using the

convention LastName_FirstName_AssignmentNumber. For example, Yazdi_Mohammad_4.pdf and

Yazdi_Mohammad_4.R.

Question 1)

In this question, you will perform model selection using the AutoLoss data set. Our goal is to predict the loss

payment for a vehicle (the payments made by an insurance company to cover claims) as a function of

vehicle characteristics.

To start your work on this question, read the data in AutoLoss.csv to a data frame called AutoLoss.

This data set has missing values, marked as “?” in the data file, so we need to read the data in a way

that identifies the rows containing “?” and lets us remove them. To do so, first replace each “?” with NA

while reading the data from the .csv file, and then remove every observation that contains an NA.
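The two steps above can be sketched in base R as follows. This is only an illustration: so that it runs without AutoLoss.csv, it first writes a tiny stand-in file whose column names (other than BodyStyle) and values are made up. The commented lines at the end show how the same calls would look for the assignment's file.

```r
# Write a tiny stand-in file so the sketch runs without AutoLoss.csv.
# The column names and values below are made up for illustration only.
toy_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Losses,BodyStyle,Horsepower",
  "164,sedan,102",
  "?,hatchback,115",
  "158,wagon,?",
  "192,sedan,101"
), toy_path)

# na.strings = "?" converts every "?" field to NA while reading.
toy <- read.csv(toy_path, na.strings = "?")

# na.omit() drops every row that contains at least one NA.
toy_clean <- na.omit(toy)
print(toy_clean)

# For the assignment, the analogous calls would be:
# AutoLoss <- read.csv("AutoLoss.csv", na.strings = "?")
# AutoLoss <- na.omit(AutoLoss)
```

Passing na.strings at read time also keeps numeric columns numeric, since “?” is never stored as text.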

a) Using the best subset selection method and allowing up to 15 predictors, use regsubsets() to

determine the best model with k predictors for k = 1, 2, …, 15. Use the output to answer the

following question: Which predictors are included in the best model with 11 predictors?
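As a rough sketch of the workflow in part (a), the code below runs best subset selection with regsubsets() from the leaps package. It uses R's built-in mtcars data as a stand-in (an assumption made so the example runs on its own), with mpg as the response and nvmax = 10 because mtcars has only 10 predictors. For the assignment you would use the AutoLoss data frame, its loss-payment column as the response, and nvmax = 15.

```r
library(leaps)

# Best subset selection on a stand-in data set (mtcars, not AutoLoss).
# method = "exhaustive" is best subset selection; nvmax caps the model size.
best_fit <- regsubsets(mpg ~ ., data = mtcars, nvmax = 10,
                       method = "exhaustive")
best_summary <- summary(best_fit)

# Row k of $which flags the predictors in the best k-predictor model.
print(best_summary$which[4, ])

# Names of the terms in the best 4-predictor model (intercept included).
print(names(coef(best_fit, 4)))
```

The same two lookups, with the row index changed, show which predictors appear in the best model of any given size.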

Note: I will add extra information to help you understand one subtlety about the output. Please pay

attention to the names of predictors displayed in the output – for example, the output will not show

BodyStyle, which is one of the columns in your data, as a predictor. Instead, you will notice predictor

names BodyStylehardtop, BodyStylehatchback, BodyStylesedan, BodyStylewagon. This is because

BodyStyle is a qualitative variable, which can take the values “hardtop,” “hatchback,” “sedan,”

“wagon,” and “convertible.” R is replacing the qualitative variable “BodyStyle” with four columns

(BodyStylehardtop, BodyStylehatchback, BodyStylesedan, BodyStylewagon), which are 0-1 columns.

For example, BodyStylehardtop would be 1 if the car’s BodyStyle is hardtop and 0 otherwise.

BodyStylehatchback would be 1 if the car’s BodyStyle is hatchback, and 0 otherwise. This is similar

to what we did in the Toyota Used Car example in class (replacing the Fuel Type column with CNGFuel

and DieselFuel columns, which were 0-1 columns), but R is automating it for you here when you run

regsubsets().
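To see what such 0-1 columns look like, here is a small base R sketch using model.matrix() on a toy data frame with one car per body style. The data frame is made up for illustration. The point is only that a five-level factor produces four indicator columns, with the first level (“convertible”) serving as the baseline.

```r
# Toy data: one car of each body style (made up for illustration).
styles <- c("convertible", "hardtop", "hatchback", "sedan", "wagon")
toy_cars <- data.frame(BodyStyle = factor(styles, levels = styles))

# model.matrix() expands the factor into 0-1 indicator columns.
# The first level ("convertible") is the baseline and gets no column.
dummies <- model.matrix(~ BodyStyle, data = toy_cars)
print(dummies)
```

A convertible is represented by zeros in all four indicator columns, which is why no column named for it appears in the output.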

b) Focusing on the best model with 8 predictors, answer the following True / False questions:

i Whether or not a car has four doors is a predictor in this model.

ii Whether or not a car’s body style is hard top is a predictor in this model.

iii Whether or not a car’s drive wheels is forward is a predictor in this model.

c) Repeat part (a), but this time using the forward stepwise selection method.

d) Using the forward stepwise selection method and allowing up to 15 predictors, what is the best

model according to the Cp criterion? State the predictors in the best model and their coefficients.

Comment on predictors: What types of cars tend to have higher losses? What types of cars tend to

have lower losses?

Hint: In part (c), you used regsubsets() with forward selection to determine the best model with k

predictors for k = 1, 2, …, 15. Now, you need to compare the Cp of the best model with 1 predictor

versus the Cp of the best model with 2 predictors versus … the Cp of the best model with 15

predictors, to determine the number of predictors that minimizes Cp. For guidance, check how we

accomplished the same goal in Task 6 of the Hitters – Subset Selection example.
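The comparison described in the hint can be sketched as follows. This is not the class's Hitters code. It is a minimal illustration on the built-in mtcars data (a stand-in, with mpg as the response and nvmax = 10), showing one way to pick the model size that minimizes Cp after forward stepwise selection.

```r
library(leaps)

# Forward stepwise selection on a stand-in data set (mtcars, not AutoLoss).
fwd_fit <- regsubsets(mpg ~ ., data = mtcars, nvmax = 10,
                      method = "forward")
fwd_summary <- summary(fwd_fit)

# One Cp value per model size k = 1, 2, ..., nvmax.
print(fwd_summary$cp)

# Model size with the smallest Cp, and that model's coefficients.
best_k <- which.min(fwd_summary$cp)
print(best_k)
print(coef(fwd_fit, best_k))
```

The signs and magnitudes of the coefficients returned by coef() are what you would interpret when commenting on which predictors are associated with higher or lower values of the response.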

Question 2)

Suppose we have a data set, which has p predictors (input variables), and we perform model selection

using (i) best subset selection (BSS), (ii) forward stepwise selection (FwSS), and (iii) backward stepwise

selection (BwSS). Specifically, using each of these three approaches, we determine the best model with

k predictors for all possible values of k, that is, k = 1, 2, …, p.

Answer the following True or False questions:

I. The predictors in the k-predictor model identified by FwSS are a subset of predictors in the

(k+1)-predictor model identified by FwSS.

II. The predictors in the k-predictor model identified by BwSS are a subset of predictors in the

(k+1)-predictor model identified by BwSS.

III. The predictors in the k-predictor model identified by BSS are a subset of predictors in the (k+1)-

predictor model identified by BSS.

IV. The predictors in the k-predictor model identified by BwSS are a subset of predictors in the

(k+1)-predictor model identified by FwSS.

V. The predictors in the k-predictor model identified by FwSS are a subset of predictors in the

(k+1)-predictor model identified by BwSS.
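One way to explore statements of this kind empirically is to fit all three methods on a small data set and check whether the k-predictor model from one method is contained in the (k+1)-predictor model from another. The sketch below does this on the built-in mtcars data, which is a stand-in. The helpers fit_which and is_subset_chain are hypothetical names introduced here. Keep in mind that a single data set can contradict a statement but cannot show that it holds in general.

```r
library(leaps)

# Hypothetical helper: fit one selection method and return the logical
# matrix whose row k flags the predictors in the k-predictor model.
fit_which <- function(method) {
  fit <- regsubsets(mpg ~ ., data = mtcars, nvmax = 10, method = method)
  summary(fit)$which
}

# Hypothetical helper: TRUE if, for every k, the predictors in row k of
# `small` are all present in row k + 1 of `large`.
is_subset_chain <- function(small, large) {
  steps <- seq_len(nrow(small) - 1)
  all(vapply(steps, function(k) {
    chosen <- colnames(small)[small[k, ]]
    all(large[k + 1, chosen])
  }, logical(1)))
}

which_bss <- fit_which("exhaustive")  # best subset selection
which_fwd <- fit_which("forward")     # forward stepwise selection
which_bwd <- fit_which("backward")    # backward stepwise selection

# One check per statement I to V, evaluated on this data set only.
checks <- c(
  I   = is_subset_chain(which_fwd, which_fwd),
  II  = is_subset_chain(which_bwd, which_bwd),
  III = is_subset_chain(which_bss, which_bss),
  IV  = is_subset_chain(which_bwd, which_fwd),
  V   = is_subset_chain(which_fwd, which_bwd)
)
print(checks)
```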
