Programming the statistical procedures from SAS

Trying to fit regression using indicator variables

Occasional Contributor
Posts: 18

I am trying to fit a regression using indicator variables. It's not working and I don't understand why.

 

The code I used is as follows:

 

data indicatorVariableNCBirth;
set NCBirth;
if momrace = 'white' then white=4;
else white = 0;
if momrace = 'hispanic' then hispanic=3;
else hispanic = 0;
if momrace = 'black' then black = 2;
else black = 0;
if momrace = 'other' then other=1;
else other=0;
run;
Proc print data=indicatorvariablencbirth;
run;


proc reg data=indicatorvariablencbirth;
model Birthweightoz = momrace;
run;

 

 

Here is the error message I receive when I try to run the regression model:

 

ERROR: Variable MomRace in list does not match type prescribed for this list.
NOTE: The previous statement has been deleted.
58 run;
 
WARNING: No variables specified for an SSCP matrix. Execution terminating.
NOTE: PROCEDURE REG used (Total process time):
real time 0.04 seconds
cpu time 0.04 seconds
 

Accepted Solutions
Solution
‎02-19-2016 03:12 PM
Super User
Posts: 18,589

Re: Trying to fit regression using indicator variables

See my comments inline in your code.

 

data indicatorVariableNCBirth; 
set NCBirth;

/* When creating indicator variables, it's best to use 1/0 coding, not 4/0, 3/0, or 2/0. Change these to 1/0 binary coding. */

/* Additionally, if your categorical variable has N levels, you need N-1 indicator variables to represent it. Including all N is known as overparameterization. */
if momrace = 'white' then white=4;
else white = 0;
if momrace = 'hispanic' then hispanic=3;
else hispanic = 0;
if momrace = 'black' then black = 2;
else black = 0;
if momrace = 'other' then other=1;
else other=0;


run;

 

 


proc reg data=indicatorvariablencbirth;
model Birthweightoz = momrace; /* <- Change this to include N-1 of your indicator variables, coded 0/1. Then you'll get estimates. */
run;
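
A minimal sketch of the change described in the comments above. It assumes momrace holds exactly the four lowercase values tested in the original code and has no missing values. Using white as the reference level (the one without an indicator) is an arbitrary choice for illustration, not something stated in the thread.

```sas
data indicatorVariableNCBirth;
    set NCBirth;
    /* A logical comparison in SAS evaluates to 1 (true) or 0 (false) */
    hispanic = (momrace = 'hispanic');
    black    = (momrace = 'black');
    other    = (momrace = 'other');
    /* 'white' is the assumed reference level, so it gets no indicator */
run;

proc reg data=indicatorVariableNCBirth;
    /* N = 4 levels, so N-1 = 3 indicators go into the model */
    model Birthweightoz = hispanic black other;
run;
quit;
```

With this coding the intercept estimates the mean birth weight for the reference group, and each coefficient estimates the difference between that group's mean and the reference mean. An observation with a missing or differently spelled momrace would get 0 for all three indicators and be counted silently in the reference group, so it is worth checking the levels first.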

 

Given your data structure, I would also look at boxplots of the weight by race to visualize the comparison.
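
One possible way to draw those boxplots, assuming a SAS release that includes PROC SGPLOT and using the variable names from the original code:

```sas
proc sgplot data=NCBirth;
    /* One vertical box of birth weight per level of momrace */
    vbox Birthweightoz / category=momrace;
run;
```

Because the CATEGORY= variable may be character, this works on the original data set without any indicator variables.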



All Replies
Super User
Posts: 18,589

Re: Trying to fit regression using indicator variables

Why are you modeling the variable that you've created indicators for? Should you be using the new indicator variables instead?

 

Your first piece of code is entirely separate from your second. They don't reference the same data set and aren't connected in any way.

 

 

Occasional Contributor
Posts: 18

Re: Trying to fit regression using indicator variables

I created a subset of the original data set to include the indicator variables. When I fit the regression model to the created indicator variables, it doesn't work. I don't know what I am doing wrong, but my output either gives me errors or looks odd.

I used the following code and got weird output that is wrong:

proc reg data=indicatorvariablencbirth;
model Birthweightoz = white hispanic black other;
run;

 

Super User
Posts: 18,589

Re: Trying to fit regression using indicator variables

Explain how your data is structured, ideally provide sample data.

Then show what your model should be mathematically and we can help with the code.
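
A short sketch of one way to gather that information, assuming the data set and variable names used in the original code (NCBirth, momrace, Birthweightoz):

```sas
/* Variable names, types (character vs. numeric), and lengths */
proc contents data=NCBirth;
run;

/* Every distinct value of momrace, including missing values */
proc freq data=NCBirth;
    tables momrace / missing;
run;

/* A few sample rows to post */
proc print data=NCBirth (obs=10);
    var momrace Birthweightoz;
run;
```

The PROC FREQ table also shows the exact spelling and case of each level, which matters because comparisons such as momrace = 'white' are case sensitive.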

Super User
Posts: 10,875

Re: Trying to fit regression using indicator variables

What type of output are you expecting to get?

 

 

 

 

Occasional Contributor
Posts: 18

Re: Trying to fit regression using indicator variables

That's my problem: I am not entirely sure what the final regression line is meant to look like, but the output I am getting shows straight lines. For all I know it could be correct, but I am getting vertical lines.

I'm trying to attach the write-up of the output to help explain my confusion.

 

 

Super User
Posts: 18,589

Re: Trying to fit regression using indicator variables

What's your basic model?

 

 

Birthweight = B1*white + B2*asian + B3*other

Occasional Contributor
Posts: 18

Re: Trying to fit regression using indicator variables

I honestly don't know what you mean when you say basic model. But here is a portion of the original data that might help answer my question, because I don't know what I am doing wrong.

I really do appreciate all of the help in trying to figure this out. Thank you.

 

 

Below is the code I used to import the CSV data:

 

FILENAME CSV "/folders/myfolders/3064data/NCbirths_RaceStudy.csv" TERMSTR=CRLF;


PROC IMPORT DATAFILE=CSV
OUT=NCBirth
DBMS=CSV
REPLACE;
RUN;

/** Print the results. **/

PROC PRINT DATA=NCBirth (obs=100); RUN;

Super User
Posts: 18,589

Re: Trying to fit regression using indicator variables

What question are you trying to answer? Do you have a hypothesis?

Are you familiar with linear regression?

 

Data is great, but you have to know what you want out of it as well.

Occasional Contributor
Posts: 18

Re: Trying to fit regression using indicator variables

All I want to do is produce a parameter estimates table from which I can gather more information about the data.

I want to see if I can accurately use 'momrace' to predict birth weights while using the indicator variables.

 

Super User
Posts: 10,875

Re: Trying to fit regression using indicator variables

The specific error you are receiving occurs because the variable MOMRACE is character, as evidenced by your code:

if momrace = 'white' then white=4;
else white = 0;

 

PROC REG requires the variables on the MODEL statement to be numeric.
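
As an alternative that was not suggested in this thread, PROC GLM accepts a character variable when it is listed on a CLASS statement and builds the indicator columns internally. A minimal sketch, assuming the original NCBirth data set:

```sas
proc glm data=NCBirth;
    /* CLASS tells GLM to treat momrace as categorical */
    class momrace;
    /* SOLUTION prints the parameter estimates table */
    model Birthweightoz = momrace / solution;
run;
quit;
```

By default GLM should treat the last level in sorted order as the reference, which would be 'white' if the four values in the original code are the only ones present.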

Occasional Contributor
Posts: 18

Re: Trying to fit regression using indicator variables

What do you mean when you say N-1 statement? Do you mean add an additional variable to include the number of races minus one?

I don't really know what you mean by adding that statement or how to do that.

I do understand taking the values other than 0/1 out of the equation; having a simple binary is more appropriate.

 

Super User
Posts: 18,589

Re: Trying to fit regression using indicator variables

Perhaps reading some linear regression tutorials would be helpful.

http://www.ats.ucla.edu/stat/sas/webbooks/reg/chapter3/sasreg3.htm

 

In addition, the SAS statistics e-course, which covers linear regression, is free.

Occasional Contributor
Posts: 18

Re: Trying to fit regression using indicator variables

I figured it out from your previous post! Thank you so much for the help. I was truly lost and really needed it.

 

☑ This topic is solved.
