Questions Regarding Dummy Variables

Occasional Contributor
Posts: 13

I was working on a problem that involved creating dummy variables, but I ran into an issue: the dummy variables have missing values for the observations in the reference category, even though the dataset has no missing values. Even if I select one of the categories as the reference category, shouldn't the dummy variable values be zero for it? I had the same issue even when I did not account for missing values. I've included my code, log, output, and the contents of the text file for context and to make my question clearer.

 

The part of the homework assignment that I'm having issues with is the following:

Fibromyalgia is a syndrome of widespread body pain that is often treated by rheumatologists. One way of measuring the impact of fibromyalgia on patients is the Fibromyalgia Impact Questionnaire (FIQ). On the FIQ, high values show greater impact of disease (bad) and low values show lesser impact of disease (good). We have data on women with fibromyalgia who attended one of two types of disease self-management classes or who received standard care (the control group).

 

Data from this study are in the file fibr03_sum18.txt on the BS 805 web site in the Assignments section for Class 6. The variables in the data file are:

 

  1. FIQ score (3.1 format) taken after the classes
  2. Group (1 = class 1, 2 = class 2, 3 = standard care)
  3. Disease Severity (on a scale of 1 to 6) before the classes
  4. Age (years)

Since the data were entered into this file, information on a new patient and a correction to the data have been found. The new patient is in the control group, has FIQ = 8.2, Disease Severity = 2, and Age = 25 years. The correction is that the second subject in class 1 was 17 rather than 18 years old.

 

A) Create a temporary SAS data set using these data. In the data set, create a set of indicator variables that code for group membership. Use PROC PRINT to list the data.

 

I read in the text file using column input, but I think it can be read in using list input as well? The text file, fibr03_sum18.txt, contained the data below.

3.1 1 6 21
1.8 1 6 18
3.3 1 5 22
2.9 1 4 15
4.3 1 3 24
4.8 1 3 22
4.9 1 2 17
6.4 1 2 18
5.7 2 5 17
6.1 2 5 25
8.5 2 3 31
7.1 2 2 17
7.7 2 1 25
9.8 2 1 22
5.1 3 4 23
7.2 3 1 15
8.3 3 1 22
6.7 3 2 20

My code for reading in the data and creating the temporary dataset with the dummy variables was:

*Part A: Reading in Data and Creating a Temporary Dataset; 
libname HW6 'C:\Users\jackz\Desktop\SAS';
filename HW6new 'C:\Users\jackz\Desktop\SAS\fibr03_sum18.txt';
proc format;
    value grpf 1='class 1' 2='class 2' 3='standard care';
run; 
data one; 
    infile HW6new;
    input @1 FIQ 3.1 @5 grp 1. @7 disev 1. @9 age 2.;
*Creating Dummy Variables;
    if grp=1 then classc1=1; else if grp=2 then classc1=0;
    if grp=2 then classc2=1; else if grp=1 then classc2=0;
    if grp=. then classc1=.;
    if grp=. then classc2=.;
    label FIQ='FIQ Score'
    grp='Group'
    disev='Disease Severity'
    age='Age';
    format grp grpf.;
    run; 
*Printout of Dataset one;
proc print data=one label; 
run; 

My log for this code was:

NOTE: Copyright (c) 2016 by SAS Institute Inc., Cary, NC, USA.
NOTE: SAS (r) Proprietary Software 9.4 (TS1M5)
      Licensed to BOSTON UNIVERSITY - SFA T&R, Site 70009029.
NOTE: This session is executing on the W32_10HOME  platform.



NOTE: Updated analytical products:

      SAS/STAT 14.3
      SAS/ETS 14.3
      SAS/OR 14.3
      SAS/IML 14.3
      SAS/QC 14.3

NOTE: Additional host information:

 W32_10HOME WIN 10.0.16299  Workstation

NOTE: SAS initialization used:
      real time           0.96 seconds
      cpu time            0.95 seconds

1    *Part A: Reading in Data and Creating a Temporary Dataset;
2    libname HW6 'C:\Users\jackz\Desktop\SAS';
NOTE: Libref HW6 was successfully assigned as follows:
      Engine:        V9
      Physical Name: C:\Users\jackz\Desktop\SAS
3    filename HW6new 'C:\Users\jackz\Desktop\SAS\fibr03_sum18.txt';
4    proc format;
5        value grpf 1='class 1' 2='class 2' 3='standard care';
NOTE: Format GRPF has been output.
6    run;

NOTE: PROCEDURE FORMAT used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds


7    data one;
8        infile HW6new;
9        input @1 FIQ 3.1 @5 grp 1. @7 disev 1. @9 age 2.;
10   *Creating Dummy Variables;
11       if grp=1 then classc1=1; else if grp=2 then classc1=0;
12       if grp=2 then classc2=1; else if grp=1 then classc2=0;
13       if grp=. then classc1=.;
14       if grp=. then classc2=.;
15       label FIQ='FIQ Score'
16       grp='Group'
17       disev='Disease Severity'
18       age='Age';
19       format grp grpf.;
20       run;

NOTE: The infile HW6NEW is:
      Filename=C:\Users\jackz\Desktop\SAS\fibr03_sum18.txt,
      RECFM=V,LRECL=32767,File Size (bytes)=214,
      Last Modified=15Jun2018:12:56:26,
      Create Time=15Jun2018:12:56:26

NOTE: 18 records were read from the infile HW6NEW.
      The minimum record length was 10.
      The maximum record length was 10.
NOTE: The data set WORK.ONE has 18 observations and 6 variables.
NOTE: DATA statement used (Total process time):
      real time           0.03 seconds
      cpu time            0.03 seconds


21   *Printout of Dataset one;
22   proc print data=one label;
NOTE: Writing HTML Body file: sashtml.htm
23   run;

NOTE: There were 18 observations read from the data set WORK.ONE.
NOTE: PROCEDURE PRINT used (Total process time):
      real time           0.27 seconds
      cpu time            0.06 seconds

I have put the output in the attached document. I am also including it here:

The SAS System

Obs  FIQ Score  Group          Disease Severity  Age  classc1  classc2
  1        3.1  class 1                       6   21        1        0
  2        1.8  class 1                       6   18        1        0
  3        3.3  class 1                       5   22        1        0
  4        2.9  class 1                       4   15        1        0
  5        4.3  class 1                       3   24        1        0
  6        4.8  class 1                       3   22        1        0
  7        4.9  class 1                       2   17        1        0
  8        6.4  class 1                       2   18        1        0
  9        5.7  class 2                       5   17        0        1
 10        6.1  class 2                       5   25        0        1
 11        8.5  class 2                       3   31        0        1
 12        7.1  class 2                       2   17        0        1
 13        7.7  class 2                       1   25        0        1
 14        9.8  class 2                       1   22        0        1
 15        5.1  standard care                 4   23        .        .
 16        7.2  standard care                 1   15        .        .
 17        8.3  standard care                 1   22        .        .
 18        6.7  standard care                 2   20        .        .

You can see that there are missing values for the dummy variables classc1 and classc2 even though there are no missing values in the original dataset. Should those values read 0, since group 3 does not fall in either grp=1 or grp=2?

 

Can anyone give me any hints as to what I have done wrong, if I have done anything wrong? Thanks for all of your help!
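
On the list-input question raised above: the values on each line are separated by single blanks and none are missing, so list input should read this file as well. (Strictly speaking, the @n pointers with informats in the INPUT statement above are formatted input rather than column input.) A minimal sketch, using in-stream data for the first three records in place of the file; the data set name one_list is made up for illustration:

```sas
data one_list;
    /* List input: blank-delimited values are read in order,
       so no column positions or informats are needed here */
    input FIQ grp disev age;
datalines;
3.1 1 6 21
1.8 1 6 18
3.3 1 5 22
;
run;

proc print data=one_list;
run;
```

With the real file, the DATALINES block would be replaced by the INFILE HW6new statement from the original code.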

 


Accepted Solutions
Solution
06-17-2018 12:28 PM
Respected Advisor
Posts: 3,018

Re: Questions Regarding Dummy Variables


@Reeza wrote:

 

 if grp=1 then classc1=1; else if grp=2 then classc1=0;
    if grp=2 then classc2=1; else if grp=1 then classc2=0;

 

These lines are your issue: if grp=3, neither condition is true, so classc1 and classc2 are never assigned and keep the missing value they start each DATA step iteration with. You can fix it, as others have suggested, by applying a generic else, but I prefer to set the variables to 0 at the top of the code unless you don't need that. If you have missing values, you need to account for those, though.

 

 

classc1=0;
classc2=0;

if grp=1 then classc1=1;
else if grp=2 then classc2=1;
else if grp=. then call missing(classc1, classc2);

 

 

Since this is homework, this is likely the approach you need to follow, but if you ever need to do this for a real analysis, you'll want a different approach. If you have 4 variables with 10 levels each, the approach above gets tedious fast.

There's an example here, as well as several links at the bottom of the post that illustrate other approaches:

https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-dummy-variables-Categorical-Var...

 


But why create dummy variables at all? Most procedures in SAS don't require you (the user) to explicitly create dummy variables. 

--
Paige Miller
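
To illustrate the point about procedures handling this themselves: a procedure with a CLASS statement builds the indicator columns internally. A minimal sketch, assuming the data set one and the format grpf. from the original post already exist; the model itself (FIQ on group alone) is only an illustration and not part of the assignment:

```sas
proc glm data=one;
    class grp;                    /* grp is expanded into indicator columns internally */
    model FIQ = grp / solution;   /* SOLUTION requests the parameter estimates */
run;
quit;
```

With PROC GLM's default ordering, the last level in sort order (here 'standard care') should end up as the reference level, which matches the coding the assignment is after.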



All Replies
Respected Advisor
Posts: 3,018

Re: Questions Regarding Dummy Variables

Your code does not assign the dummy variables a value when grp=3.

 

Most SAS analyses don't require dummy variables to be created beforehand anyway.

--
Paige Miller
Regular Contributor
Posts: 161

Re: Questions Regarding Dummy Variables

Yes, just change it to:

    if grp=1 then classc1=1; else classc1=0;
    if grp=2 then classc2=1; else classc2=0;
--------------
blog: papersandprograms.com
Super User
Posts: 8,111

Re: Questions Regarding Dummy Variables


Remember that SAS evaluates boolean expressions to either 1 (true) or 0 (false).

So to make two dummy variables that indicate GRP=1 and GRP=2, do this:

*Creating Dummy Variables;
    classc1= (grp=1);
    classc2= (grp=2);
    if missing(grp) then call missing(classc1,classc2);

 Let's test it.

data test;
  input grp @@ ;
  classc1= (grp=1);
  classc2= (grp=2);
  if missing(grp) then call missing(classc1,classc2);
cards;
1 2 3 .
;
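
The same boolean trick extends to more levels with an array, which helps with the tedious many-levels case mentioned in the accepted solution. A sketch only, assuming the levels are coded 1, 2, ..., k and the last one is the reference; the data set name test_array and the array name dummy are made up for illustration:

```sas
data test_array;
    input grp @@;
    /* One dummy per non-reference level (levels 1 and 2; level 3 is the reference) */
    array dummy[2] classc1-classc2;
    do i = 1 to dim(dummy);
        if missing(grp) then dummy[i] = .;   /* keep missing groups missing */
        else dummy[i] = (grp = i);           /* comparison evaluates to 1 or 0 */
    end;
    drop i;
datalines;
1 2 3 .
;
run;

proc print data=test_array;
run;
```

For grp values 1, 2, 3, and missing, this should give (classc1, classc2) = (1, 0), (0, 1), (0, 0), and (., .).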


Super User
Posts: 23,724

Re: Questions Regarding Dummy Variables

 

 if grp=1 then classc1=1; else if grp=2 then classc1=0;
    if grp=2 then classc2=1; else if grp=1 then classc2=0;

 

These lines are your issue: if grp=3, neither condition is true, so classc1 and classc2 are never assigned and keep the missing value they start each DATA step iteration with. You can fix it, as others have suggested, by applying a generic else, but I prefer to set the variables to 0 at the top of the code unless you don't need that. If you have missing values, you need to account for those, though.

 

 

classc1=0;
classc2=0;

if grp=1 then classc1=1;
else if grp=2 then classc2=1;
else if grp=. then call missing(classc1, classc2);

 

 

Since this is homework, this is likely the approach you need to follow, but if you ever need to do this for a real analysis, you'll want a different approach. If you have 4 variables with 10 levels each, the approach above gets tedious fast.

There's an example here, as well as several links at the bottom of the post that illustrate other approaches:

https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-dummy-variables-Categorical-Var...
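
One such alternative is to let PROC GLMMOD build the design matrix. A hedged sketch, assuming the data set one from the original post exists; the output data set name design is made up here, and the column layout described in the comment is my understanding of how the OUTDESIGN= data set is organized:

```sas
proc glmmod data=one outdesign=design noprint;
    class grp;          /* every CLASS variable is expanded into indicator columns */
    model FIQ = grp;    /* the response is carried along into the design data set */
run;

/* DESIGN should hold the intercept plus one 0/1 column per level of grp */
proc print data=design;
run;
```

Unlike the hand-coded version, this produces a column for every level, including the reference group, so one of the columns would be left out when fitting a regression model.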

 


Super User
Posts: 23,724

Re: Questions Regarding Dummy Variables

Posted in reply to PaigeMiller
Agreed, but the OP says it's a homework assignment, and usually you don't get a choice in those assignments. Also, many procs support CLASS statements but, unlike PROC PHREG and PROC LOGISTIC, don't let you choose the parameterization approach or a reference level other than first/last. Once that's been expanded, I'd feel comfortable saying don't create dummy variables.
Respected Advisor
Posts: 3,018

Re: Questions Regarding Dummy Variables


@Reeza wrote:
Also, many procs support CLASS statements but, unlike PROC PHREG and PROC LOGISTIC, don't let you choose the parameterization approach or a reference level other than first/last. Once that's been expanded, I'd feel comfortable saying don't create dummy variables.

All of this is true, but the original poster has not indicated any such concerns about dummy variables, or that he is in one of these situations.

 

COMPLETELY OFF TOPIC COMMENT

I can't stand it that professors assign students homework where the student has to do things the HARD way, by computing their own dummy variables, instead of giving homework assignments where the student learns the easy (and more error-proof) way of using dummy variables, which is to let the SAS PROC handle this internally, without the user having to first create his/her own dummy variables. The student ought to be learning the power of SAS, rather than learning how to do things the hard way.

--
Paige Miller
Super User
Posts: 23,724

Re: Questions Regarding Dummy Variables

Posted in reply to PaigeMiller

@PaigeMiller wrote:

 

COMPLETELY OFF TOPIC COMMENT

I can't stand it that professors assign students homework where the student has to do things the HARD way, by computing their own dummy variables, instead of giving homework assignments where the student learns the easy (and more error-proof) way of using dummy variables, which is to let the SAS PROC handle this internally, without the user having to first create his/her own dummy variables. The student ought to be learning the power of SAS, rather than learning how to do things the hard way.


Yeah. I've been thinking about that lately. SAS is an old language, and there's so much outdated material out there that people still use and teach. Having a more up-to-date, reliable reference is useful, but some of that seems to be users not filtering properly. I've been using the date filter on almost every Google search these days.

Respected Advisor
Posts: 3,018

Re: Questions Regarding Dummy Variables


CONTINUING OFF TOPIC

 

When Google searching, for example for the latest PROC REPORT documentation, use:

 

proc report site:documentation.sas.com

 

NOW BACK ON TOPIC

 

As far as I can tell, the question asked by @JackZ295 has been answered satisfactorily, unless we hear from him further.

--
Paige Miller
Occasional Contributor
Posts: 13

Re: Questions Regarding Dummy Variables

Posted in reply to PaigeMiller

Thanks so much everybody! @PaigeMiller @Reeza @Tom @PaulBrownPhD Sorry, I didn't get a chance to accept a solution or comment. 

☑ This topic is solved.
