Hi,
suppose I have the following data:
Date |
---|
1/1/2010 |
23/6/2013 |
13/10/2008 |
What I would like to create are new dummy variables that will have a value of 1 for the corresponding year and 0 otherwise:
date | dummy2010 | dummy2013 | dummy2008 |
---|---|---|---|
1/1/2010 | 1 | 0 | 0 |
23/6/2013 | 0 | 1 | 0 |
13/10/2008 | 0 | 0 | 1 |
Thank you
There's a much easier way to do this without much coding. Let SAS create a design or model matrix usling proc logisitc and the resulting model matrix will have the form you are looking for:
/*Recreate your data*/
data work.temp;
input mydate DDMMYY10.;
myyear=year(mydate);
cards;
1/1/2010
23/6/2013
13/10/2008
;
run;
/*This is all you need to do. The work.xmatrix will contain your dummy coding*/
proc logistic data=work.temp outdesign=xmatrix outdesignonly;;
class myyear / param=glm;
model myyear=myyear;
run;
Try this:
option missing=0;
data have;
input date ddmmyy10.;
format date ddmmyy10.;
cards;
01/01/2010
23/06/2013
13/10/2008
;
run;
data temp;
set have;
year=year(date);
N=1;
run;
proc transpose data=temp out=want(drop=_name_) prefix=dummy;
by date notsorted;
id year;
var N;
run;
CTorres
Hi CTorres,
Thanks a lot for replying, I ran the code and it worked fine !!!
Staying on the same note, I would like to ask a small variation of my initial question please.
Suppose that instead of dates I had some categories and wanted to get the dummies for the categories, like the following:
ID | Category | Dummy A | Dummy B |
1 | A | 1 | 0 |
2 | B | 0 | 1 |
Thank you!
Be carefull if you set options missing=0 .
Those zeros in the "want: table are still missing values.
data want;
set have;
dummy2010=(year(date)=2010);
dummy2013=(year(date)=2013);
dummy2008=(year(date)=2008);
run;
Hi Gergely,
thank you for the reply, I ran the code and it worked good. But what if I have many categories for which I need to create corresponding dummies, and I don't know how many dummies there will be and what are they (and figuring this out and creating the dummies is tedious and time consuming) is there a way for SAS to do this for me instead of me actually creating dummy2010, dummy2013 etc?
Thank you
The code of CTorres actually finds what dummies need to be created, but when I have the same date appearing more than once, the corresponding dummy variable isn't created because I get an error message saying that there are more than one columns of the same name...
And a similar approach on your new question:
data want;
set have;
dummyA=(Category='A');
dummyB=(Category='B');
run;
Hi Ilikesas,
Although irrelevant at this point, the variation of my answer would be as follows:
option missing=0;
data have;
input id category $;
N=1;
cards;
01 A
02 B
;
run;
proc transpose data=have out=want(drop=_name_) prefix=dummy;
by ID;
id category;
copy category;
var N;
run;
CTorres
Hi CTorres, still thank you for replying because any additional knowledge is useful!
There is one problem when I use the transpose: if for example I have a date appearing several times then the transpose will give me an error because it is considering as if I want to create the same dummy column several times
Hi Ilikesas,
Yes, maybe the PROC TRANSPOSE is not the rigth tool to solve your problem.
Anyway, you can use the LET option to "allows duplicate values of an ID variable. PROC TRANSPOSE transposes the observation that contains the last occurrence of a particular ID value within the data set or BY group.".
Check the following code witch includes two additional dates: one is duplicated and the other belongs to the same year. In this case the dates must be sorted. Maybe this is what you need:
data have;
input date ddmmyy10.;
format date ddmmyy10.;
year=year(date);
N=1;
cards;
13/10/2008
01/01/2010
01/01/2010 <--- duplicated
23/06/2013
13/10/2013 <--- same year
;
run;
proc transpose data=have out=want(drop=_name_) let prefix=dummy;
by date;
id year;
var N;
run;
SAS Log:
120 data have;
121 input date ddmmyy10.;
122 format date ddmmyy10.;
123 year=year(date);
124 N=1;
125 cards;
NOTE: The data set WORK.HAVE has 5 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
131 ;
132 run;
133 proc transpose data=have out=want(drop=_name_) let prefix=dummy;
134 by date;
135 id year;
136 var N;
137 run;
WARNING: The ID value "dummy2010" occurs twice in the same BY group.
NOTE: The above message was for the following BY group:
date=01/01/2010
NOTE: There were 5 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT has 4 observations and 4 variables.
NOTE: PROCEDURE TRANSPOSE used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
CTorres
This is the result table:
| date | dummy2008 | dummy2010 | dummy2013 |
---|---|---|---|---|
1 | 13/10/2008 | 1 | 0 | 0 |
2 | 01/01/2010 | 0 | 1 | 0 |
3 | 23/06/2013 | 0 | 0 | 1 |
4 | 13/10/2013 | 0 | 0 | 1 |
CTorres
Hi CTorres,
thanks for still having an interest in the question.
In the code that you put above, the duplicated date was removed and as such the table has 4 observations instead of the initial 5, but the thing is that I still need the duplicated date because it corresponds to a different event, i.e, two events happened on that specific date.
Thank you
There's a much easier way to do this without much coding. Let SAS create a design or model matrix usling proc logisitc and the resulting model matrix will have the form you are looking for:
/*Recreate your data*/
data work.temp;
input mydate DDMMYY10.;
myyear=year(mydate);
cards;
1/1/2010
23/6/2013
13/10/2008
;
run;
/*This is all you need to do. The work.xmatrix will contain your dummy coding*/
proc logistic data=work.temp outdesign=xmatrix outdesignonly;;
class myyear / param=glm;
model myyear=myyear;
run;
Hi Statistician13,
thank you for your code, it combines what the two previous codes separately, namely it can both automatically create the dummy variables and it allows for the same date to be repeated.
In the output I get an intercept value, and I guess that this is due to the proc logistic, but I can drop it right?
Sure. You can just use the drop statement:
proc logistic data=work.temp outdesign=xmatrix(DROP=Intercept) outdesignonly;;
class myyear / param=glm;
model myyear=myyear;
run;
Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.
Register today!Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.