Creating dummy variables

Accepted Solution Solved
Reply
Contributor
Posts: 30
Accepted Solution

Creating dummy variables

Hi guys,

I need some help creating dummy variables for a dataset, can someone help with a simple SAS code? Smiley Happy the data sample looks like this, i have given what i have and what i want as my output. I need monthly dummy's. please note that this is just a sample, i  have data with many ID's, here i have given only one.

havewant
IDyearmonthmonths_continuousbef_3monthsaft_12_monthsIDyearmonthmonths_continuousbef_3monthsaft_12_monthsmonth_-3month_-2month_-1month_+0month_+1month_+2month_+3month_+4month_+5month_+6month_+7month_+8month_+9month_+10month_+11
68200911006820091100000000000000000
68200922006820092200000000000000000
68200933006820093300000000000000000
68200944006820094400000000000000000
68200955006820095500000000000000000
68200966006820096600000000000000000
68200977006820097700000000000000000
68200988106820098810100000000000000
68200999106820099910010000000000000
682009101010682009101010001000000000000
682009111101682009111101000100000000000
6820101130168201011301000010000000000
6820102140168201021401000001000000000
6820103150168201031501000000100000000
6820104160168201041601000000010000000
6820105170168201051701000000001000000
6820106180168201061801000000000100000
6820107190168201071901000000000010000
6820108200168201082001000000000001000
6820109210168201092101000000000000100
682010102201682010102201000000000000010
682010112301682010112301000000000000001

-jessica wolfman Smiley Happy


Accepted Solutions
Solution
‎11-20-2014 07:38 AM
Super User
Posts: 9,681

Re: Creating dummy variables

Sorry. OK .try this one.

Still hard.

data have;
input ID     year     month     months_continuous     bef_3months     aft_12_months;
cards;
68     2009     1     1     0     0
68     2009     2     2     0     0
68     2009     3     3     0     0
68     2009     4     4     0     0
68     2009     5     5     0     0
68     2009     6     6     0     0
68     2009     7     7     0     0
68     2009     8     8     1     0
68     2009     9     9     1     0
68     2009     10     10     1     0
68     2009     11     11     0     1
68     2010     1     13     0     1
68     2010     2     14     0     1
68     2010     3     15     0     1
68     2010     4     16     0     1
68     2010     5     17     0     1
68     2010     6     18     0     1
68     2010     7     19     0     1
68     2010     8     20     0     1
68     2010     9     21     0     1
68     2010     10     22     0     1
68     2010     11     23     0     1
72     2009     1     1     0     0
72     2009     2     2     0     0
72     2009     3     3     1     0
72     2009     4     4     1     0
72     2009     5     5     1     0
72     2009     6     6     0     1
72     2009     7     7     0     1
72     2009     8     8     0     1
72     2009     9     9     0     1
72     2009     10     10     0     1
72     2009     11     11     0     1
72     2010     1     13     0     1
72     2010     2     14     0     1
72     2010     3     15     0     1
72     2010     4     16     0     1
72     2010     5     17     0     1
72     2010     6     18     0     1
;
run;
proc sort data=have;by  id      months_continuous;run;
data x1(keep=id month rename=(month=_month));
 set have;
 if  bef_3months=0 and lag(bef_3months)=1;
run;
data x2;
 merge have x1;
 by id;
run;
data temp;
 set x2;
 by id;
 retain flag 1;
 if bef_3months = 1 or  aft_12_months = 1 then do;n=months_continuous-_month;output;end;
 drop _month;
run;

proc transpose data=temp out=temp1(drop=_:) prefix=mon;
by   id  months_continuous ;
var flag;
id n;
idlabel n;
run;
data temp2;
 merge have temp1  ;
 by id months_continuous;
run;
proc stdize data=temp2 missing=0 out=want reponly;run;
proc sql noprint;
select translate(cats('mon',label),'_','-') into : list separated by ' '
 from dictionary.columns
  where libname='WORK' and memname='WANT' and label is not missing
   order by input(strip(label),best32.);
quit;

data want;
 retain ID     year     month     months_continuous     bef_3months     aft_12_months
 &list ;
 set want;
run;

Xia Keshan

Message was edited by: xia keshan

Message was edited by: xia keshan Still hard

View solution in original post


All Replies
Super User
Posts: 9,681

Re: Creating dummy variables

What is your logic ?

Contributor
Posts: 30

Re: Creating dummy variables

My logic is simple. I want a dummy for each month before three months(t-3) and after 12 months (0-11) of an individual s bday. I have the dummy indicating that.  But I would need one dummy variable for each month. Basically for each individual has his bday month for example individual ID 68 has his bday in the 11th month 2009. ,  I want to create three  dummy variable for three months before that (t-3,t-2,t-1) and 12 dummy  variables(t0...t+11) representing each month after bday month.

Super User
Super User
Posts: 7,404

Re: Creating dummy variables

IMO you would be better off with a normalized table, i.e. instead of variables for each month gap, have a row.  Alternatively if you must have variables, then call them all the same name, with and array statement, e.g. array months{10}.  You can label each one separately so identifying them is easy, but you then have additional functionality when using that data, i.e. you can easily loop of it, pass the array into functions etc.

E.g. I want to use the data as you have given, thus in each datastep I need to do:

data xyz;

     array values{10} month_minus_3 month_minus_2 month_minus_1 ...;

Using same identifier:

data xyz;

     array values{10} month1-month10;

Contributor
Posts: 30

Re: Creating dummy variables

i AM NEW TO SAS, I DONT HAVE A GOOD UNDERSTANDING OF ARRAYS. Smiley Sad  keshan, did you get my logic?

Super User
Super User
Posts: 7,404

Re: Creating dummy variables

No need to shout.  I would suggest reading up on arrays, they are very useful when dealing with normalized (i.e. data which goes across the table) data.

http://www2.sas.com/proceedings/sugi30/242-30.pdf

Its not particularly difficult, think of an array as a number reference to a set of variables.  Because its a numbered reference you can use the number to loop over lots of variables, or access specific ones.

Super User
Posts: 9,681

Re: Creating dummy variables

OK. Still hard to catch on you. If I understood what you mean.

data have;
input ID     year     month     months_continuous     bef_3months     aft_12_months;
cards;
68     2009     1     1     0     0
68     2009     2     2     0     0
68     2009     3     3     0     0
68     2009     4     4     0     0
68     2009     5     5     0     0
68     2009     6     6     0     0
68     2009     7     7     0     0
68     2009     8     8     1     0
68     2009     9     9     1     0
68     2009     10     10     1     0
68     2009     11     11     0     1
68     2010     1     13     0     1
68     2010     2     14     0     1
68     2010     3     15     0     1
68     2010     4     16     0     1
68     2010     5     17     0     1
68     2010     6     18     0     1
68     2010     7     19     0     1
68     2010     8     20     0     1
68     2010     9     21     0     1
68     2010     10     22     0     1
68     2010     11     23     0     1
;
run;
data temp;
 set have;
 retain flag 1 n -4;
 if bef_3months = 1 or   aft_12_months = 1 then do;n+1;output;end;
run;
proc transpose data=temp out=temp1(drop=_:) prefix=mon_;
by      months_continuous ;
var flag;
id n;
run;
data temp2;
 merge have temp1  ;
 by  months_continuous;
run;
proc stdize data=temp2 missing=0 out=want reponly;run;





Xia Keshan

Contributor
Posts: 30

Re: Creating dummy variables

hi , i ran the code.i think by looking at the code you got my logic. However  i got this error " Data set WORK.TEMP is not sorted in ascending sequence. The current BY group has months_continuous    = 23 and the next BY group has months_continuous= 3."

i tried sorting the temp data set by months_continuous, it did not help. any suggestions?

-jessica wolfman

Super User
Posts: 17,840

Re: Creating dummy variables

Show your code. You need to sort the input to the proc transpose and/or possibly your HAVE dataset if you haven't already.

Most likely this ahead of Xia's code should solve your problems:

proc sort data=have;

by id year month;

run;

Contributor
Posts: 30

Re: Creating dummy variables

Hi, here is my code, I just want to say that the name of my 'ID' is 'driver_id' , will that make any difference? coz i dont see any ref to that in the code.

proc sort data=data;

by year month_continuous;

run;

data temp;

set data;

  retain flag 1 n -4;

if before_3months = 1 or  after_12months = 1 then do;n+1;output;end;

run;

proc transpose data=temp out=temp1(drop=_Smiley Happy prefix=mon_;

by continuous ;

var flag;

id n;

run;

data temp2;

merge data temp1;

by  month_continuous;

run;

proc stdize data=temp2 missing=0 out=want reponly;run;

Super User
Super User
Posts: 7,404

Re: Creating dummy variables

I agree with Xia Keshan, not following what your trying to do.  If you just want a bunch of variables then the below code creates variables, you can name them what you want, but I would recommend that you have some logical order so that if you want to work with these, they can be set as an array again, or referred to in shorthand mnth1-mnth13 etc.

data want;

  set have;

  array mnth{13} 3.;

  label mnth1 "3 months before"

        mnth2 "2 months before"

        mnth3 "1 month before"

        mnth4 "1 month after"; /* ... */

run;

Super User
Posts: 9,681

Re: Creating dummy variables

As Reeza pointed out. You 'd better post your real data.

If I guess right the ID variable have multiple values .

Code updated.

data have;
input ID     year     month     months_continuous     bef_3months     aft_12_months;
cards;
68     2009     1     1     0     0
68     2009     2     2     0     0
68     2009     3     3     0     0
68     2009     4     4     0     0
68     2009     5     5     0     0
68     2009     6     6     0     0
68     2009     7     7     0     0
68     2009     8     8     1     0
68     2009     9     9     1     0
68     2009     10     10     1     0
68     2009     11     11     0     1
68     2010     1     13     0     1
68     2010     2     14     0     1
68     2010     3     15     0     1
68     2010     4     16     0     1
68     2010     5     17     0     1
68     2010     6     18     0     1
68     2010     7     19     0     1
68     2010     8     20     0     1
68     2010     9     21     0     1
68     2010     10     22     0     1
68     2010     11     23     0     1
69     2009     1     1     0     0
69     2009     2     2     0     0
69     2009     3     3     0     0
69     2009     4     4     0     0
69     2009     5     5     0     0
69     2009     6     6     0     0
69     2009     7     7     0     0
69     2009     8     8     1     0
69     2009     9     9     1     0
69     2009     10     10     1     0
69     2009     11     11     0     1
69     2010     1     13     0     1
69     2010     2     14     0     1
69     2010     3     15     0     1
69     2010     4     16     0     1
69     2010     5     17     0     1
69     2010     6     18     0     1
69     2010     7     19     0     1
69     2010     8     20     0     1
69     2010     9     21     0     1
69     2010     10     22     0     1
69     2010     11     23     0     1
;
run;

proc sort data=have; by id      months_continuous;run;
data temp;
 set have;
 by id;
 retain flag 1 n -4;
 if first.id then n=-4;
 if bef_3months = 1 or   aft_12_months = 1 then do;n+1;output;end;
run;

proc transpose data=temp out=temp1(drop=_:) prefix=mon_;
by   id  months_continuous ;
var flag;
id n;
run;
data temp2;
 merge have temp1  ;
 by id months_continuous;
run;
proc stdize data=temp2 missing=0 out=want reponly;run;


Xia Keshan

Super User
Posts: 17,840

Re: Creating dummy variables

At the least your proc sort needs to include ID, or whatever your ID is truly called.

proc sort data=data;

by ID year month_continuous;

run;

Super User
Posts: 9,681

Re: Creating dummy variables

OR if  bef_3months    aft_12_months  is dynamic . Try this one.

data have;
input ID     year     month     months_continuous     bef_3months     aft_12_months;
cards;
68     2009     1     1     0     0
68     2009     2     2     0     0
68     2009     3     3     0     0
68     2009     4     4     0     0
68     2009     5     5     0     0
68     2009     6     6     0     0
68     2009     7     7     0     0
68     2009     8     8     1     0
68     2009     9     9     1     0
68     2009     10     10     1     0
68     2009     11     11     0     1
68     2010     1     13     0     1
68     2010     2     14     0     1
68     2010     3     15     0     1
68     2010     4     16     0     1
68     2010     5     17     0     1
68     2010     6     18     0     1
68     2010     7     19     0     1
68     2010     8     20     0     1
68     2010     9     21     0     1
68     2010     10     22     0     1
68     2010     11     23     0     1
69     2009     1     1     0     0
69     2009     2     2     0     0
69     2009     3     3     0     0
69     2009     4     4     0     0
69     2009     5     5     0     0
69     2009     6     6     0     0
69     2009     7     7     0     0
69     2009     8     8     0     0
69     2009     9     9     1     0
69     2009     10     10     1     0
69     2009     11     11     0     1
69     2010     1     13     0     1
69     2010     2     14     0     1
69     2010     3     15     0     1
69     2010     4     16     0     1
69     2010     5     17     0     1
69     2010     6     18     0     1
69     2010     7     19     0     1
69     2010     8     20     0     1
69     2010     9     21     0     1
69     2010     10     22     0     1
69     2010     11     23     0     1
;
run;
proc sql;
create table x as
 select *,-1*(sum(bef_3months)+1) as sum
  from have
   group by id
    order by id,months_continuous;
quit;

data temp;
 set x;
 by id;
 retain flag 1 n;
 if first.id then n=sum;
 if bef_3months = 1 or   aft_12_months = 1 then do;n+1;output;end;
run;

proc transpose data=temp out=temp1(drop=_:) prefix=mon_;
by   id  months_continuous ;
var flag;
id n;
run;
data temp2;
 merge have temp1  ;
 by id months_continuous;
run;
proc stdize data=temp2 missing=0 out=want reponly;run;


Xia Keshan

Contributor
Posts: 30

Re: Creating dummy variables

Hi , thanks a lot! So far i am getting three columns with dummy's indicating bef_3months like this,

idyearmonthmonths_continuousbef_3monthsaft_12monthsmon__3mon__2mon__1
6820091100000
6820092200000
6820093300000
6820094400000
6820095500000
6820096600000
6820097700000
6820098810100
6820099910010
682009101010001
682009111101000
68201011301000
68201021401000
68201031501000
68201041601000
68201051701000
68201061801000
68201071901000
68201082001000
68201092101000
682010102201000
682010112301000
7220091100000
7220092200000
7220093310100
7220094410010
7220095510001
7220096601000
7220097701000
7220098801000
7220099901000
722009101001000
722009111101000
72201011301000
72201021401000
72201031501000
72201041601000
72201051701000
72201061801000

what i want is similar dummy's for aft_months as well (see below), here i need twelve variables similar to the three generated earlier. Like this, can i change the code to get that?

idyearmonthmonths_continuousbef_3monthsaft_12monthsmon__3mon__2mon__1mon__2mon__3mon__4mon__5mon__6mon__7mon__8mon__9mon__10mon__11mon__12mon__13
6820091100000000000000000
6820092200000000000000000
6820093300000000000000000
6820094400000000000000000
6820095500000000000000000
6820096600000000000000000
6820097700000000000000000
6820098810100000000000000
6820099910010000000000000
682009101010001000000000000
682009111101000100000000000
68201011301000010000000000
68201021401000001000000000
68201031501000000100000000
68201041601000000010000000
68201051701000000001000000
68201061801000000000100000
68201071901000000000010000
68201082001000000000001000
68201092101000000000000100
682010102201000000000000010
682010112301000000000000001
7220091100000000000000000
7220092200000000000000000
7220093310100000000000000
7220094410010000000000000
7220095510001000000000000
7220096601000100000000000
7220097701000010000000000
7220098801000001000000000
7220099901000000100000000
722009101001000000010000000
722009111101000000001000000
72201011301000000000100000
72201021401000000000010000
72201031501000000000001000
72201041601000000000000100
72201051701000000000000010
72201061801000000000000001
☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 17 replies
  • 419 views
  • 6 likes
  • 4 in conversation