Say I have a subsample of my data that looks as follows:
year | event_year | change | id |
2007 | 2009 | 3 | |
2008 | 2009 | 3 | |
2009 | 2009 | 1 | 3 |
2010 | 2009 | 3 | |
2011 | 2009 | 3 | |
2012 | 2009 | 3 | |
2002 | 2004 | 55 | |
2003 | 2004 | 55 | |
2004 | 2004 | 0 | 55 |
2005 | 2004 | 55 |
Whenever the variable change takes a value of 1 for year = event_ye, I want to create a new column for that id only called event_year_new where it is equal to event_year+1. If change = 0, then nothing happens and event_year = event_year_new:
year | event_year | change | id | event_year_new |
2007 | 2009 | 3 | 2010 | |
2008 | 2009 | 3 | 2010 | |
2009 | 2009 | 1 | 3 | 2010 |
2010 | 2009 | 3 | 2010 | |
2011 | 2009 | 3 | 2010 | |
2012 | 2009 | 3 | 2010 | |
2002 | 2004 | 55 | 2004 | |
2003 | 2004 | 55 | 2004 | |
2004 | 2004 | 0 | 55 | 2004 |
2005 | 2004 | 55 | 2004 |
How could I do this? Of course the below code does not work because it only changes it for the case when year = event_year, but I want it to apply to each ID.
if year = event_year and change=1 then
do;
event_year_new = event_ye+1;
end;
Assuming each id never has more than one change value, and that the data are sorted by id, then:
data have;
input year event_year @17 change 1. id;
datalines;
2007 2009 3
2008 2009 3
2009 2009 1 3
2010 2009 3
2011 2009 3
2012 2009 3
2002 2004 55
2003 2004 55
2004 2004 0 55
2005 2004 55
run;
data want;
set have (where=(change^=.) in=inchange)
have (in=inwanted);
by id;
if first.id then do;
if change=1 then event_year_new=event_year+1; else
if change=0 then event_year_new=event_year; else
event_year_new=.;
end;
retain event_year_new;
if inwanted;
run;
For each id, this reads the change record prior to reading all the records. So the event_year_new can be established at the start of each id. Leaving the event_year_new variable retained propagates it through all the records for that id.
Question: what if a given ID has more than one non-missing value for change? Do you plan to make multiple new columns in such cases?
No, the change variable only has values for when year = event_year. It is always missing for other cases.
Assuming each id never has more than one change value, and that the data are sorted by id, then:
data have;
input year event_year @17 change 1. id;
datalines;
2007 2009 3
2008 2009 3
2009 2009 1 3
2010 2009 3
2011 2009 3
2012 2009 3
2002 2004 55
2003 2004 55
2004 2004 0 55
2005 2004 55
run;
data want;
set have (where=(change^=.) in=inchange)
have (in=inwanted);
by id;
if first.id then do;
if change=1 then event_year_new=event_year+1; else
if change=0 then event_year_new=event_year; else
event_year_new=.;
end;
retain event_year_new;
if inwanted;
run;
For each id, this reads the change record prior to reading all the records. So the event_year_new can be established at the start of each id. Leaving the event_year_new variable retained propagates it through all the records for that id.
Does below work for your actual data?
data have;
infile datalines dsd truncover;
input year event_year change id;
datalines;
2007,2009, ,3
2008,2009, ,3
2009,2009,1,3
2010,2009, ,3
2011,2009, ,3
2012,2009, ,3
2002,2004, ,55
2003,2004, ,55
2004,2004,0,55
2005,2004, ,55
;
data want(drop=_:);
merge
have
have( keep=id change
where=(not missing(_change))
rename=(change=_change)
);
by id;
event_year_new=sum(event_year,_change);
run;
proc print data=want;
run;
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.