MadQuidd
Obsidian | Level 7

I am trying to fit a regression using indicator variables. It's not working and I don't understand why.

 

The code I used is as follows:

 

data indicatorVariableNCBirth;
set NCBirth;
if momrace = 'white' then white=4;
else white = 0;
if momrace = 'hispanic' then hispanic=3;
else hispanic = 0;
if momrace = 'black' then black = 2;
else black = 0;
if momrace = 'other' then other=1;
else other=0;
run;
Proc print data=indicatorvariablencbirth;
run;


proc reg data=indicatorvariablencbirth;
model Birthweightoz = momrace;
run;

 

 

Here is the error message I receive after trying to run the regression model:

 

ERROR: Variable MomRace in list does not match type prescribed for this list.
NOTE: The previous statement has been deleted.
58 run;
 
WARNING: No variables specified for an SSCP matrix. Execution terminating.
NOTE: PROCEDURE REG used (Total process time):
real time 0.04 seconds
cpu time 0.04 seconds
 
Accepted Solutions
Reeza
Super User

See my comments inline in your code.

 

data indicatorVariableNCBirth;
set NCBirth;

/* When creating indicator variables, it's best to use 1/0, not 5/0. Change these to 1/0 binary coding. */

/* Additionally, if your categorical variable has N levels, you need N-1 indicator variables to represent it. Including all N is known as overparameterization. */
if momrace = 'white' then white=4;
else white = 0;
if momrace = 'hispanic' then hispanic=3;
else hispanic = 0;
if momrace = 'black' then black = 2;
else black = 0;
if momrace = 'other' then other=1;
else other=0;
run;

proc reg data=indicatorvariablencbirth;
model Birthweightoz = momrace; /* <- Change this to include N-1 of your indicator variables, coded 0/1. Then you'll get estimates. */
run;

 

Given your data structure, I would also look at boxplots of the weight by race to visualize the comparison.
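
A minimal sketch of the two changes suggested above, plus the boxplot, is shown here. It assumes that momrace takes exactly the four lowercase values used in the original code and that 'white' serves as the reference level; that choice is arbitrary and only for illustration. Note that a missing momrace would get 0 on every indicator and so be grouped with the reference level, which may need separate handling.

```sas
/* Sketch only: 0/1 indicators for N-1 = 3 of the 4 levels.   */
/* 'white' is left out and becomes the reference level        */
/* (an assumption made here for illustration).                */
data indicatorVariableNCBirth;
    set NCBirth;
    /* In SAS a comparison evaluates to 1 (true) or 0 (false). */
    hispanic = (momrace = 'hispanic');
    black    = (momrace = 'black');
    other    = (momrace = 'other');
run;

proc reg data=indicatorVariableNCBirth;
    /* Numeric 0/1 indicators only; the character variable is not used. */
    model Birthweightoz = hispanic black other;
run;
quit;

/* Boxplots of birth weight by race to visualize the comparison. */
proc sgplot data=NCBirth;
    vbox Birthweightoz / category=momrace;
run;
```

With this coding, the intercept estimates the mean birth weight of the reference group, and each coefficient estimates how far that group's mean is from the reference.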

14 REPLIES
Reeza
Super User

Why are you modeling the variable that you've created indicators for? Should you be using the new indicator variables instead?

 

Your first piece of code is entirely separate from your second. They don't reference the same data set and aren't connected in any way.

 

 

MadQuidd
Obsidian | Level 7

I created a subset of the original data set to include the indicator variables. When I fit the regression model to the created indicator variables, it doesn't work. I don't know what I am doing wrong, but my output either gives me errors or produces an odd-looking result.

I used the following code and got a weird output that is wrong:

proc reg data=indicatorvariablencbirth;
model Birthweightoz = white hispanic black other;
run;

 

Reeza
Super User

Explain how your data is structured; ideally, provide sample data.

Then show what your model should be mathematically and we can help with the code.

ballardw
Super User

What type of output are you expecting to get?

 

 

 

 

MadQuidd
Obsidian | Level 7

That's my problem: I am not entirely sure what the final regression line is meant to look like, but the output I am getting shows straight lines. For all I know it could be correct, but I am getting vertical lines.

I'm trying to attach the write-up of the output to help explain my confusion.

 

 

Reeza
Super User

What's your basic model?

 

 

Birthweight = B1*white + B2*asian + B3*other
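
For illustration, the same kind of model can be written with an explicit intercept, which PROC REG fits by default. The symbols $\beta_0, \dots, \beta_3$ and the error term $\varepsilon$ are notation assumed here, and the indicator names are the ones in the line above, each coded 0/1 with one level of the categorical variable left out as the reference:

```latex
\text{Birthweight} = \beta_0 + \beta_1\,\text{white} + \beta_2\,\text{asian} + \beta_3\,\text{other} + \varepsilon
```

Under that coding, $\beta_0$ is the mean birth weight of the omitted reference level, and each $\beta_k$ is the difference between that group's mean and the reference mean.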

MadQuidd
Obsidian | Level 7

I honestly don't know what you mean when you say basic model. But here is a portion of the original data that might help answer my question, because I don't know what I am doing wrong.

I really do appreciate all of the help in trying to figure this out, thank you.

 

 

Below is the code I used to import the CSV data:

 

FILENAME CSV "/folders/myfolders/3064data/NCbirths_RaceStudy.csv" TERMSTR=CRLF;


PROC IMPORT DATAFILE=CSV
OUT=NCBirth
DBMS=CSV
REPLACE;
RUN;

/** Print the results. **/

PROC PRINT DATA=NCBirth (obs=100); RUN;

Reeza
Super User

What question are you trying to answer? Do you have a hypothesis?

Are you familiar with linear regression?

 

Data is great, but you have to know what you want out of it as well.

MadQuidd
Obsidian | Level 7

All I want to do is produce a parameter estimates table from which I can gather more information about the data.

I want to see if I can accurately use 'momrace' to predict birth weights using the indicator variables.

 

ballardw
Super User

You are receiving that specific error because the variable MOMRACE is character, as evidenced by your code:

if momrace = 'white' then white=4;
else white = 0;

 

Proc Reg requires the variables on the model statement to be numeric.
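
One way to confirm the variable type is PROC CONTENTS. As an alternative sketch (not something proposed elsewhere in this thread), a procedure with a CLASS statement, such as PROC GLM, accepts a character variable directly and builds the indicator coding internally. The data set and variable names below are the ones from the original post.

```sas
/* Check variable types; given the error, MomRace is expected */
/* to be listed with Type = Char.                             */
proc contents data=NCBirth;
run;

/* Alternative sketch: CLASS lets PROC GLM handle the         */
/* character variable without hand-made indicators.           */
proc glm data=NCBirth;
    class momrace;
    model Birthweightoz = momrace / solution; /* SOLUTION prints the parameter estimates */
run;
quit;
```

By default PROC GLM treats the last level in sort order as the reference, which would be 'white' if the four values in the original code are the only ones present.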

MadQuidd
Obsidian | Level 7

What do you mean when you say N-1? Do you mean adding an additional variable that holds the number of races minus one?

I don't really know what you mean by that or how to do it.

I do understand that taking the 0/5 out of the equation and having a simple binary coding is more appropriate.

 

Reeza
Super User

Perhaps reading some linear regression tutorials would be helpful.

http://www.ats.ucla.edu/stat/sas/webbooks/reg/chapter3/sasreg3.htm

 

Also, the SAS Statistical e-course, which covers linear regression, is free.

MadQuidd
Obsidian | Level 7

I figured it out from your previous post! Thank you so much for the help. I was truly lost and really needed it.

 
