Say I have a subsample of my data that looks as follows:
year | event_year | change | id |
2007 | 2009 | 3 | |
2008 | 2009 | 3 | |
2009 | 2009 | 1 | 3 |
2010 | 2009 | 3 | |
2011 | 2009 | 3 | |
2012 | 2009 | 3 | |
2002 | 2004 | 55 | |
2003 | 2004 | 55 | |
2004 | 2004 | 0 | 55 |
2005 | 2004 | 55 |
Whenever the variable change takes a value of 1 for year = event_ye, I want to create a new column for that id only called event_year_new where it is equal to event_year+1. If change = 0, then nothing happens and event_year = event_year_new:
year | event_year | change | id | event_year_new |
2007 | 2009 | 3 | 2010 | |
2008 | 2009 | 3 | 2010 | |
2009 | 2009 | 1 | 3 | 2010 |
2010 | 2009 | 3 | 2010 | |
2011 | 2009 | 3 | 2010 | |
2012 | 2009 | 3 | 2010 | |
2002 | 2004 | 55 | 2004 | |
2003 | 2004 | 55 | 2004 | |
2004 | 2004 | 0 | 55 | 2004 |
2005 | 2004 | 55 | 2004 |
How could I do this? Of course the below code does not work because it only changes it for the case when year = event_year, but I want it to apply to each ID.
if year = event_year and change=1 then
do;
event_year_new = event_ye+1;
end;
Assuming each id never has more than one change value, and that the data are sorted by id, then:
data have;
input year event_year @17 change 1. id;
datalines;
2007 2009 3
2008 2009 3
2009 2009 1 3
2010 2009 3
2011 2009 3
2012 2009 3
2002 2004 55
2003 2004 55
2004 2004 0 55
2005 2004 55
run;
data want;
set have (where=(change^=.) in=inchange)
have (in=inwanted);
by id;
if first.id then do;
if change=1 then event_year_new=event_year+1; else
if change=0 then event_year_new=event_year; else
event_year_new=.;
end;
retain event_year_new;
if inwanted;
run;
For each id, this reads the change record prior to reading all the records. So the event_year_new can be established at the start of each id. Leaving the event_year_new variable retained propagates it through all the records for that id.
Question: what if a given ID has more than one non-missing value for change? Do you plan to make multiple new columns in such cases?
No, the change variable only has values for when year = event_year. It is always missing for other cases.
Assuming each id never has more than one change value, and that the data are sorted by id, then:
data have;
input year event_year @17 change 1. id;
datalines;
2007 2009 3
2008 2009 3
2009 2009 1 3
2010 2009 3
2011 2009 3
2012 2009 3
2002 2004 55
2003 2004 55
2004 2004 0 55
2005 2004 55
run;
data want;
set have (where=(change^=.) in=inchange)
have (in=inwanted);
by id;
if first.id then do;
if change=1 then event_year_new=event_year+1; else
if change=0 then event_year_new=event_year; else
event_year_new=.;
end;
retain event_year_new;
if inwanted;
run;
For each id, this reads the change record prior to reading all the records. So the event_year_new can be established at the start of each id. Leaving the event_year_new variable retained propagates it through all the records for that id.
Does below work for your actual data?
data have;
infile datalines dsd truncover;
input year event_year change id;
datalines;
2007,2009, ,3
2008,2009, ,3
2009,2009,1,3
2010,2009, ,3
2011,2009, ,3
2012,2009, ,3
2002,2004, ,55
2003,2004, ,55
2004,2004,0,55
2005,2004, ,55
;
data want(drop=_:);
merge
have
have( keep=id change
where=(not missing(_change))
rename=(change=_change)
);
by id;
event_year_new=sum(event_year,_change);
run;
proc print data=want;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.