BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ilikesas
Barite | Level 11

Hi,

suppose I have the following data:

Date
1/1/2010
23/6/2013
13/10/2008

What I would like to create are new dummy variables that will have a value of 1 for the corresponding year and 0 otherwise:

datedummy2010dummy2013dummy2008
1/1/2010100
23/6/2013010
13/10/2008001

Thank you

1 ACCEPTED SOLUTION

Accepted Solutions
statistician13
Quartz | Level 8

There's a much easier way to do this without much coding.  Let SAS create a design or model matrix usling proc logisitc and the resulting model matrix will have the form you are looking for:

/*Recreate your data*/

data work.temp;

input mydate DDMMYY10.;

myyear=year(mydate);

cards;

1/1/2010

23/6/2013

13/10/2008

;

run;

/*This is all you need to do.  The work.xmatrix will contain your dummy coding*/

   proc logistic data=work.temp outdesign=xmatrix outdesignonly;;

      class myyear / param=glm;

      model myyear=myyear;

      run;

View solution in original post

14 REPLIES 14
CTorres
Quartz | Level 8

Try this:

option missing=0;

data have;
  input date ddmmyy10.;
  format date ddmmyy10.;
  cards;
01/01/2010
23/06/2013
13/10/2008
;
run;
data temp;
  set have;
  year=year(date);
  N=1;
run;

proc transpose data=temp out=want(drop=_name_) prefix=dummy;
  by date notsorted;
  id year;
  var N;
run;

CTorres

ilikesas
Barite | Level 11

Hi CTorres,

Thanks a lot for replying, I ran the code and it worked fine !!!

Staying on the same note, I would like to ask a small variation of my initial question please.

Suppose that instead of dates I had some categories and wanted to get the dummies for the categories, like the following:

ID

Category

Dummy A

Dummy B

1

A

1

0

2

B

0

1

Thank you!

gergely_batho
SAS Employee

Be carefull if you set  options missing=0  .

Those zeros in the "want: table are still missing values.

data want;

  set have;

  dummy2010=(year(date)=2010);

  dummy2013=(year(date)=2013);

  dummy2008=(year(date)=2008);

run;

ilikesas
Barite | Level 11

Hi Gergely,

thank you for the reply, I ran the code and it worked good. But what if I have many categories for which I need to create corresponding dummies, and I don't know how many dummies there will be and what are they (and figuring this out and creating the dummies is tedious and time consuming) is there a way for SAS to do this for me instead of me actually creating dummy2010, dummy2013 etc?

Thank you

ilikesas
Barite | Level 11

The code of CTorres actually finds what dummies need to be created, but when I have the same date appearing more than once, the corresponding dummy variable isn't created because I get an error message saying that there are more than one columns of the same name...

gergely_batho
SAS Employee

And a similar approach on your new question:

data want;

  set have;

  dummyA=(Category='A');

  dummyB=(Category='B');

run;

CTorres
Quartz | Level 8

Hi Ilikesas,

Although irrelevant at this point, the variation of my answer would be as follows:

option missing=0;

data have;
  input id category $;
  N=1;
  cards;
01 A 
02 B
;
run;

proc transpose data=have out=want(drop=_name_) prefix=dummy;
  by ID;
  id category;
  copy category;
  var N;
run;

CTorres

ilikesas
Barite | Level 11

Hi CTorres, still thank you for replying because any additional knowledge is useful!

There is one problem when I use the transpose: if for example I have a date appearing several times then the transpose will give me an error because it is considering as if I want to create the same dummy column several times

CTorres
Quartz | Level 8

Hi Ilikesas,

Yes, maybe the PROC TRANSPOSE is not the rigth tool to solve your problem.

Anyway, you can use the LET option to "allows duplicate values of an ID variable. PROC TRANSPOSE transposes the observation that contains the last occurrence of a particular ID value within the data set or BY group.".


Check the following code witch includes two additional dates: one is duplicated and the other belongs to the same year. In this case the dates must be sorted. Maybe this is what you need:


data have;

  input date ddmmyy10.;

  format date ddmmyy10.;

  year=year(date);

  N=1;

  cards;

13/10/2008

01/01/2010

01/01/2010   <--- duplicated

23/06/2013

13/10/2013   <--- same year

;

run;

proc transpose data=have out=want(drop=_name_) let prefix=dummy;

  by date;

  id year;

  var N;

run;


SAS Log:


120  data have;
121    input date ddmmyy10.;
122    format date ddmmyy10.;
123    year=year(date);
124    N=1;
125    cards;

NOTE: The data set WORK.HAVE has 5 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


131  ;
132  run;
133  proc transpose data=have out=want(drop=_name_) let prefix=dummy;
134    by date;
135    id year;
136    var N;
137  run;

WARNING: The ID value "dummy2010" occurs twice in the same BY group.
NOTE: The above message was for the following BY group:
      date=01/01/2010
NOTE: There were 5 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT has 4 observations and 4 variables.
NOTE: PROCEDURE TRANSPOSE used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

CTorres

CTorres
Quartz | Level 8

This is the result table:

          

datedummy2008dummy2010

dummy2013

113/10/2008100
201/01/2010010
323/06/2013001
413/10/2013001

CTorres

ilikesas
Barite | Level 11

Hi CTorres,

thanks for still having an interest in the question.

In the code that you put above, the duplicated date was removed and as such the table has 4 observations instead of the initial 5, but the thing is that I still need the duplicated date because it corresponds to a different event, i.e, two events happened on that specific date.

Thank you

statistician13
Quartz | Level 8

There's a much easier way to do this without much coding.  Let SAS create a design or model matrix usling proc logisitc and the resulting model matrix will have the form you are looking for:

/*Recreate your data*/

data work.temp;

input mydate DDMMYY10.;

myyear=year(mydate);

cards;

1/1/2010

23/6/2013

13/10/2008

;

run;

/*This is all you need to do.  The work.xmatrix will contain your dummy coding*/

   proc logistic data=work.temp outdesign=xmatrix outdesignonly;;

      class myyear / param=glm;

      model myyear=myyear;

      run;

ilikesas
Barite | Level 11

Hi Statistician13,

thank you for your code, it combines what the two previous codes separately, namely it can both automatically create the dummy variables and it allows for the same date to be repeated.

In the output I get an intercept value, and I guess that this is due to the proc logistic, but I can drop it right?

statistician13
Quartz | Level 8

Sure.  You can just use the drop statement:

   proc logistic data=work.temp outdesign=xmatrix(DROP=Intercept) outdesignonly;;

      class myyear / param=glm;

      model myyear=myyear;

      run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 14 replies
  • 5294 views
  • 6 likes
  • 4 in conversation