Trying to fit regression using indicator variables

02-19-2016 10:52 AM

I am trying to fit a regression using indicator variables. It's not working and I don't understand why.

The code I used is as follows:

data indicatorVariableNCBirth;

set NCBirth;

if momrace = 'white' then white=4;

else white = 0;

if momrace = 'hispanic' then hispanic=3;

else hispanic = 0;

if momrace = 'black' then black = 2;

else black = 0;

if momrace = 'other' then other=1;

else other=0;

run;

Proc print data=indicatorvariablencbirth;

run;

proc reg data=indicatorvariablencbirth;

model Birthweightoz = momrace;

run;

Here is the error message I receive after trying to run the regression model:

ERROR: Variable MomRace in list does not match type prescribed for this list.

NOTE: The previous statement has been deleted.

58 run;

WARNING: No variables specified for an SSCP matrix. Execution terminating.

NOTE: PROCEDURE REG used (Total process time):

real time 0.04 seconds

cpu time 0.04 seconds

Accepted Solutions

Solution

02-19-2016
03:12 PM

02-19-2016 02:02 PM

See my comments (in bold) on your code.

data **indicatorVariableNCBirth;**

set NCBirth;

**When creating indicator variables, it's best to use 1/0 coding rather than values such as 4/0 or 3/0. Change these to 1/0 binary coding.**

**Additionally, if your categorical variable has N levels, you need N-1 indicator variables to represent it. Including all N is known as overparameterization.**

if momrace = 'white' then white=4;

else white = 0;

if momrace = 'hispanic' then hispanic=3;

else hispanic = 0;

if momrace = 'black' then black = 2;

else black = 0;

if momrace = 'other' then other=1;

else other=0;

run;

proc reg data=indicatorvariablencbirth;

model Birthweightoz = momrace; <-**Change this to include N-1 of your indicator variables that are coded 0/1. Then you'll get estimates.**

run;

**Given your data structure, I would also look at boxplots of the weight by race to visualize the comparison.**
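The changes suggested in this reply can be sketched as follows. This is a minimal illustration, not the poster's exact code: it assumes momrace holds exactly the lowercase values 'white', 'hispanic', 'black', and 'other' used in the original DATA step, and it picks 'white' as the reference level arbitrarily. Dropping one level matters because the four 0/1 indicators sum to 1 on every row, which duplicates the intercept column.

```sas
/* Sketch: 0/1 indicators for N-1 = 3 of the 4 levels.                   */
/* Assumption: 'white' is the omitted reference level (arbitrary choice). */
data indicatorVariableNCBirth;
  set NCBirth;
  /* A comparison in SAS evaluates to 1 when true and 0 when false. */
  hispanic = (momrace = 'hispanic');
  black    = (momrace = 'black');
  other    = (momrace = 'other');
run;

/* Side-by-side boxplots of birth weight by race. */
proc sgplot data=indicatorVariableNCBirth;
  vbox Birthweightoz / category=momrace;
run;

/* Regression on the three indicators; the intercept is the mean  */
/* for the reference level, and each slope is a difference from it. */
proc reg data=indicatorVariableNCBirth;
  model Birthweightoz = hispanic black other;
run;
quit;
```

Note that string comparisons in SAS are case-sensitive, and a row with a missing or differently spelled momrace would get 0 for all three indicators and so be treated as the reference level.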

All Replies

02-19-2016 11:02 AM

Why are you modeling the variable that you've created indicators for? Shouldn't you be using the new indicator variables instead?

Your first piece of code is entirely separate from your second. They don't reference the same data set and aren't connected in any way.

02-19-2016 11:15 AM

I created a subset of the original data set that includes the indicator variables. When I fit the regression model to the created indicator variables, it doesn't work. I don't know what I am doing wrong, but my output either gives me errors or looks off.

I used the following code and got a weird output that is wrong:

proc reg data=indicatorvariablencbirth;

model Birthweightoz = white hispanic black other;

run;

02-19-2016 11:19 AM

Explain how your data is structured; ideally, provide sample data.

Then show what your model should be mathematically and we can help with the code.

02-19-2016 12:47 PM

What type of output are you expecting to get?

02-19-2016 12:56 PM

That's my problem: I am not entirely sure what the final regression line is meant to look like, but the output I am getting shows straight lines. For all I know it could be correct, but I am getting vertical lines.

I'm trying to attach the write-up of the output to help explain my confusion.

02-19-2016 01:15 PM

What's your basic model?

Birthweight = B1*white + B2*asian + B3*other

02-19-2016 01:24 PM

I honestly don't know what you mean when you say basic model. But here is a portion of the original data that might help answer my question, because I don't know what I am doing wrong.

I really do appreciate all of the help in trying to figure this out. Thank you.

Below is the code I used to import the CSV data:

FILENAME CSV "/folders/myfolders/3064data/NCbirths_RaceStudy.csv" TERMSTR=CRLF;

PROC IMPORT DATAFILE=CSV

OUT=NCBirth

DBMS=CSV

REPLACE;

RUN;

/** Print the results. **/

PROC PRINT DATA=NCBirth (obs=100); RUN;

02-19-2016 01:26 PM

What question are you trying to answer? Do you have a hypothesis?

Are you familiar with linear regression?

Data is great, but you have to know what you want out of it as well.

02-19-2016 01:41 PM

All I want to do is produce a parameter estimates table from which I can learn more about the data.

I want to see whether I can accurately use 'momrace' to predict birth weights by way of the indicator variables.

02-19-2016 11:18 AM

You are receiving that specific error because the variable MOMRACE is character, as evidenced by your code:

if momrace = 'white' then white=4;

else white = 0;

PROC REG requires the variables on the MODEL statement to be numeric.
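As a hedged aside that is not part of the original replies: one way to confirm the variable's type, and one alternative that avoids hand-built indicators, is sketched below. PROC CONTENTS lists each variable as Num or Char. PROC GLM accepts a character variable through its CLASS statement and builds the indicator columns itself, which PROC REG cannot do because it has no CLASS statement. The data set and variable names are taken from the question.

```sas
/* Check how momrace was read in: the Type column shows Num or Char. */
proc contents data=NCBirth;
run;

/* Alternative sketch: let PROC GLM create the indicators via CLASS. */
/* The SOLUTION option prints the parameter estimates table.         */
proc glm data=NCBirth;
  class momrace;
  model Birthweightoz = momrace / solution;
run;
quit;
```

With this approach one level of momrace serves as the reference, so its estimate is reported as zero and the others are differences from it.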

02-19-2016 02:14 PM

What do you mean when you say N-1 statement? Do you mean adding an additional variable that holds the number of races minus one?

I don't really know what you mean by adding that statement or how to do that.

I do understand that replacing the nonzero values with a simple binary coding is more appropriate.

02-19-2016 03:11 PM

Perhaps reading some linear regression tutorials would be helpful.

http://www.ats.ucla.edu/stat/sas/webbooks/reg/chapter3/sasreg3.htm

Also, the SAS Statistical e-course, which covers linear regression, is free.

02-19-2016 03:16 PM

I figured it out from your previous post! Thank you so much for the help. I was truly lost and really needed it.