Help using Base SAS procedures

How can i create dummy variables? How can i change my code?

Accepted Solution Solved
Reply
Contributor
Posts: 21
Accepted Solution

How can i create dummy variables? How can i change my code?

Hi. I have 4 explanatory variables and 3 of them are categorical, only one of them are numerical variable.

I know if i want to use PROC REG or PROC SGSCATTER  function, i need to change my numerical variables into categorical.

This is my code:


data forreg; set new; if Stage1 = A Then StageA=1 ; Else StageA=0; if Stage1 = B Then StageB=1 ; Else StageB=0; if Stage1 = C Then StageC=1; Else StageC=0; run;

i tried only for Stage1 as an example. But sadly, my code is not working :/

 

Here's explanation about my variables:

My response variable is 'Exam' (out of 100)

and 

'Assign' : the student’s Assignment mark (out of 20)

'Stage1': the student’s grade at Stage 1 Statistics: A, B or C

'Grade': What grade the student obtained: A, B, C or D

'Pass'Smiley Very Happyid the student pass the course or not: No or Yes

 

And here's my error:

data forreg;
37 set new;
38 if Stage1 = A Then StageA=1 ; Else StageA=0;
39 if Stage1 = B Then StageB=1 ; Else StageB=0;
40 if Stage1 = C Then StageC=1; Else StageC=0;
41 run;

NOTE: Character values have been converted to numeric values at the places given by: (Line)Smiley SadColumn).
39:6 40:6
NOTE: Variable A is uninitialized.
NOTE: Invalid numeric data, STAGE1='C' , at line 39 column 6.
NOTE: Invalid numeric data, STAGE1='C' , at line 40 column 6.
GRADE=D PASS=No EXAM=35 DEGREE=Other GENDER=Male ATTEND=No ASSIGN=8 TEST=8.2 B=4 C=1 MC=15 COLOUR=Blue STAGE1=C YEARSSIN=3
REPEAT=No A= StageA=0 StageB=0 StageC=0 _ERROR_=1 _N_=1
NOTE: Invalid numeric data, STAGE1='C' , at line 39 column 6.
NOTE: Invalid numeric data, STAGE1='C' , at line 40 column 6.
GRADE=D PASS=No EXAM=25 DEGREE=BA GENDER=Male ATTEND=No ASSIGN=8.8 TEST=10.9 B=3 C=0 MC=11 COLOUR=Yellow STAGE1=C
YEARSSIN=1.5 REPEAT=No A= StageA=0 StageB=0 StageC=0 _ERROR_=1 _N_=2
NOTE: Invalid numeric data, STAGE1='B' , at line 39 column 6.
NOTE: Invalid numeric data, STAGE1='B' , at line 40 column 6.
GRADE=C PASS=Yes EXAM=52 DEGREE=Other GENDER=Female ATTEND=No ASSIGN=12 TEST=15.5 B=8 C=14 MC=15 COLOUR=Yellow STAGE1=B
YEARSSIN=0 REPEAT=No A= StageA=0 StageB=0 StageC=0 _ERROR_=1 _N_=3
NOTE: Invalid numeric data, STAGE1='A' , at line 39 column 6.
NOTE: Invalid numeric data, STAGE1='A' , at line 40 column 6.
GRADE=A PASS=Yes EXAM=81 DEGREE=BCom GENDER=Male ATTEND=No ASSIGN=16.8 TEST=17.3 B=13 C=16 MC=26 COLOUR=Yellow STAGE1=A
YEARSSIN=2.5 REPEAT=No A= StageA=0 StageB=0 StageC=0 _ERROR_=1 _N_=4
NOTE: Invalid numeric data, STAGE1='A' , at line 39 column 6.
NOTE: Invalid numeric data, STAGE1='A' , at line 40 column 6.
GRADE=A PASS=Yes EXAM=77 DEGREE=BSc GENDER=Female ATTEND=No ASSIGN=16.4 TEST=12.7 B=15 C=18 MC=22 COLOUR=Green STAGE1=A
YEARSSIN=0.5 REPEAT=No A= StageA=0 StageB=0 StageC=0 _ERROR_=1 _N_=5
NOTE: Invalid numeric data, STAGE1='C' , at line 39 column 6.
NOTE: Invalid numeric data, STAGE1='C' , at line 40 column 6.
GRADE=D PASS=No EXAM=32 DEGREE=BSc GENDER=Female ATTEND=No ASSIGN=8 TEST=10 B=8 C=4 MC=10 COLOUR=Blue STAGE1=C YEARSSIN=1
REPEAT=No A= StageA=0 StageB=0 StageC=0 _ERROR_=1 _N_=6
NOTE: Invalid numeric data, STAGE1='C' , at line 39 column 6.
NOTE: Invalid numeric data, STAGE1='C' , at line 40 column 6.
GRADE=C PASS=Yes EXAM=46 DEGREE=BSc GENDER=Female ATTEND=No ASSIGN=10.8 TEST=12.7 B=14 C=10 MC=11 COLOUR=Blue STAGE1=C
YEARSSIN=0.5 REPEAT=No A= StageA=0 StageB=0 StageC=0 _ERROR_=1 _N_=7
NOTE: Invalid numeric data, STAGE1='A' , at line 39 column 6.
NOTE: Invalid numeric data, STAGE1='A' , at line 40 column 6.
GRADE=A PASS=Yes EXAM=89 DEGREE=BSc GENDER=Female ATTEND=No ASSIGN=20 TEST=18.2 B=16 C=15 MC=29 COLOUR=Yellow STAGE1=A
YEARSSIN=1 REPEAT=No A= StageA=0 StageB=0 StageC=0 _ERROR_=1 _N_=8
NOTE: Invalid numeric data, STAGE1='C' , at line 39 column 6.
NOTE: Invalid numeric data, STAGE1='C' , at line 40 column 6.
GRADE=D PASS=No EXAM=36 DEGREE=BSc GENDER=Female ATTEND=No ASSIGN=12.8 TEST=12.7 B=5 C=7 MC=12 COLOUR=Blue STAGE1=C
YEARSSIN=2 REPEAT=No A= StageA=0 StageB=0 StageC=0 _ERROR_=1 _N_=9
NOTE: Invalid numeric data, STAGE1='B' , at line 39 column 6.
NOTE: Invalid numeric data, STAGE1='B' , at line 40 column 6.
GRADE=B PASS=Yes EXAM=57 DEGREE=BCom GENDER=Female ATTEND=No ASSIGN=17.6 TEST=14.5 B=10 C=13 MC=17 COLOUR=Green STAGE1=B
YEARSSIN=2 REPEAT=No A= StageA=0 StageB=0 StageC=0 _ERROR_=1 _N_=10
NOTE: Invalid numeric data, STAGE1='C' , at line 39 column 6.
NOTE: Invalid numeric data, STAGE1='C' , at line 40 column 6.
GRADE=D PASS=No EXAM=35 DEGREE=BSc GENDER=Male ATTEND=No ASSIGN=14 TEST=4.5 B=3 C=10 MC=11 COLOUR=Green STAGE1=C YEARSSIN=1
REPEAT=No A= StageA=0 StageB=0 StageC=0 _ERROR_=1 _N_=11
NOTE: Invalid numeric data, STAGE1='B' , at line 39 column 6.
NOTE: Invalid numeric data, STAGE1='B' , at line 40 column 6.
GRADE=C PASS=Yes EXAM=44 DEGREE=BSc GENDER=Female ATTEND=No ASSIGN=14.8 TEST=9.1 B=8 C=10 MC=13 COLOUR=Green STAGE1=B
YEARSSIN=1.5 REPEAT=No A= StageA=0 StageB=0 StageC=0 _ERROR_=1 _N_=12
NOTE: Invalid numeric data, STAGE1='B' , at line 39 column 6.
NOTE: Invalid numeric data, STAGE1='B' , at line 40 column 6.
GRADE=C PASS=Yes EXAM=52 DEGREE=BSc GENDER=Female ATTEND=No ASSIGN=14.4 TEST=6.4 B=7 C=15 MC=15 COLOUR=Pink STAGE1=B
YEARSSIN=1.5 REPEAT=No A= StageA=0 StageB=0 StageC=0 _ERROR_=1 _N_=13
NOTE: Invalid numeric data, STAGE1='C' , at line 39 column 6.
NOTE: Invalid numeric data, STAGE1='C' , at line 40 column 6.
GRADE=D PASS=No EXAM=32 DEGREE=BA GENDER=Female ATTEND=No ASSIGN=9.6 TEST=13.6 B=4 C=12 MC=8 COLOUR=Green STAGE1=C
YEARSSIN=2.5 REPEAT=No A= StageA=0 StageB=0 StageC=0 _ERROR_=1 _N_=14
NOTE: Invalid numeric data, STAGE1='C' , at line 39 column 6.
NOTE: Invalid numeric data, STAGE1='C' , at line 40 column 6.
GRADE=C PASS=Yes EXAM=42 DEGREE=BSc GENDER=Male ATTEND=No ASSIGN=11.2 TEST=13.6 B=4 C=10 MC=14 COLOUR=Pink STAGE1=C
YEARSSIN=0.5 REPEAT=No A= StageA=0 StageB=0 StageC=0 _ERROR_=1 _N_=15
NOTE: Invalid numeric data, STAGE1='B' , at line 39 column 6.
NOTE: Invalid numeric data, STAGE1='B' , at line 40 column 6.
GRADE=C PASS=Yes EXAM=54 DEGREE=BSc GENDER=Female ATTEND=No ASSIGN=14.8 TEST=10.9 B=9 C=13 MC=16 COLOUR=Blue STAGE1=B
YEARSSIN=3 REPEAT=No A= StageA=0 StageB=0 StageC=0 _ERROR_=1 _N_=16
NOTE: Invalid numeric data, STAGE1='A' , at line 39 column 6.
NOTE: Invalid numeric data, STAGE1='A' , at line 40 column 6.
GRADE=A PASS=Yes EXAM=72 DEGREE=BCom GENDER=Male ATTEND=No ASSIGN=17.2 TEST=15.5 B=11 C=17 MC=22 COLOUR=Yellow STAGE1=A
YEARSSIN=2 REPEAT=No A= StageA=0 StageB=0 StageC=0 _ERROR_=1 _N_=17
NOTE: Invalid numeric data, STAGE1='A' , at line 39 column 6.
NOTE: Invalid numeric data, STAGE1='A' , at line 40 column 6.
GRADE=A PASS=Yes EXAM=72 DEGREE=Other GENDER=Male ATTEND=No ASSIGN=16.4 TEST=16.4 B=15 C=9 MC=24 COLOUR=Yellow STAGE1=A
YEARSSIN=1 REPEAT=No A= StageA=0 StageB=0 StageC=0 _ERROR_=1 _N_=18
NOTE: Invalid numeric data, STAGE1='C' , at line 39 column 6.
NOTE: Invalid numeric data, STAGE1='C' , at line 40 column 6.
GRADE=D PASS=No EXAM=41 DEGREE=BSc GENDER=Male ATTEND=No ASSIGN=5.2 TEST=4.5 B=9 C=4 MC=14 COLOUR=Green STAGE1=C YEARSSIN=4.5
REPEAT=No A= StageA=0 StageB=0 StageC=0 _ERROR_=1 _N_=19
NOTE: Invalid numeric data, STAGE1='A' , at line 39 column 6.
NOTE: Invalid numeric data, STAGE1='A' , at line 40 column 6.
WARNING: Limit set by ERRORS= option reached. Further errors of this type will not be printed.
GRADE=D PASS=No EXAM=24 DEGREE=BSc GENDER=Male ATTEND=No ASSIGN=2.8 TEST=14.5 B=0 C=0 MC=12 COLOUR=Yellow STAGE1=A
YEARSSIN=1.5 REPEAT=No A= StageA=0 StageB=0 StageC=0 _ERROR_=1 _N_=20
NOTE: There were 146 observations read from the data set WORK.NEW.
NOTE: The data set WORK.FORREG has 146 observations and 19 variables.
NOTE: DATA statement used (Total process time):
real time 0.20 seconds
cpu time 0.18 seconds

 

Would you help me kindly please?

 

I also attached the original file of image.캡처.JPG

 

 


Accepted Solutions
Solution
‎10-15-2017 04:14 PM
Super User
Posts: 6,785

Re: How can i create dummy variables? How can i change my code?

In terms of your syntax, you start out by incorrectly omitting quotes:

 

if Stage1=A then StageA=1; else StageA=0;

 

The right way:

 

if Stage1='A' then StageA=1; else StageA=0;

 

In terms of your plan, the other comments you have received are right on the money.  It is rare that you need to create your own dummy variables.  Most regression procedures support a CLASS statement, which creates dummy variables as needed.  That also avoids the mistake of overspecifying.  For example, if your CLASS variable can take on values of "A", "B", or "C" only, your analysis  should specify just two dummy variables, not three.

View solution in original post


All Replies
Esteemed Advisor
Posts: 5,543

Re: How can i create dummy variables? How can i change my code?

SAS Super FREQ
Posts: 503

Re: How can i create dummy variables? How can i change my code?

https://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_transreg_se...

 

proc transreg with the DESIGN option gives you control over the names, labels, and coding.  GLIMMIX has options as well.  GLMMOD was designed to give you the GLM internal coding, so it does not give you much control.

Solution
‎10-15-2017 04:14 PM
Super User
Posts: 6,785

Re: How can i create dummy variables? How can i change my code?

In terms of your syntax, you start out by incorrectly omitting quotes:

 

if Stage1=A then StageA=1; else StageA=0;

 

The right way:

 

if Stage1='A' then StageA=1; else StageA=0;

 

In terms of your plan, the other comments you have received are right on the money.  It is rare that you need to create your own dummy variables.  Most regression procedures support a CLASS statement, which creates dummy variables as needed.  That also avoids the mistake of overspecifying.  For example, if your CLASS variable can take on values of "A", "B", or "C" only, your analysis  should specify just two dummy variables, not three.

SAS Super FREQ
Posts: 503

Re: How can i create dummy variables? How can i change my code?

[ Edited ]
Posted in reply to Astounding

I must admit my bias.  I wrote transreg.  For that reason and others, I think glmmod is never the answer to questions about how to code.  It is the answer to the question: how do I get the coded variables that glm creates? glmmod was never designed for general purpose coding.  In modeling, people often want to code in different ways.  That is why I added so many options to transreg.  Later, that is why so many options were added to the CLASS and EFFECT statements in other procedures. 

 

Creating three binary variables for a CLASS variable with three values is not a mistake.  It is an option. It is the way you specify a cell means model, which has an implicit intercept  It is not the way you specify a reference cell model. It is how you specify a separate slopes and intercepts model. It is not the way to specify a model with an intercept, slope, changes in intercepts, and changes in slopes.

 

In GLM, the less than full rank model was a way that could handle different patterns of BY groups, missing data, invalid weights and freqs, and anything else that was thrown at it.  One binary variable gets created for each level, and a sequential sweep decides which ones are needed. Again, it was not a mistake. @Astounding is correct though: pick a model and a procedure.  Don't use a DATA step.  If you do use a DATA step, do it the easy way.  There is no reason to have an IF, ELSE, and two assignment statements for each level.  Variable = Boolean-expression; makes a binary variable:

 

StageA = Stage1='A';

SAS Super FREQ
Posts: 4,247

Re: How can i create dummy variables? How can i change my code?

Posted in reply to WarrenKuhfeld

For an overview of creating dummy variables in SAS, see "Four ways to create a design matrix in SAS."  As others have said, however, many regression procedures support a CLASS statement, so you might not need to create dummy variables explicitly.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 1607 views
  • 5 likes
  • 5 in conversation