Hello everyone,
I am a relatively new SAS user and working on a project for which I need to do the following:
For each company ID, I have a dummy variable "event occurrence" that is either 1 or 0, for each year from 1992 to 2017.
I want to be able to count the number of observations between every two consecutive events ( every time event occurrence =1)
Then, if the difference is less than three years then the variable (want_var) will equal 1 for those years
if the difference=3 then the variable (want_var) will equal 0 for those years.
if the difference is >3 years, then, the variable (want_var) will equal 1 for all those years up to three years before the next event.
What I am looking to achieve is the following
ID | year | Event occurrence | want_var |
1004 | 1992 | 0 | 0 |
1004 | 1993 | 0 | 0 |
1004 | 1994 | 0 | 0 |
1004 | 1995 | 1 | 1 |
1004 | 1996 | 1 | 1 |
1004 | 1997 | 0 | 0 |
1004 | 1998 | 0 | 0 |
1004 | 1999 | 0 | 0 |
1004 | 2000 | 1 | 1 |
1004 | 2001 | 1 | 1 |
1004 | 2002 | 0 | 1 |
1004 | 2003 | 1 | 1 |
1004 | 2004 | 0 | 1 |
1004 | 2005 | 0 | 1 |
1004 | 2006 | 0 | 1 |
1004 | 2007 | 0 | 1 |
1004 | 2008 | 0 | 1 |
1004 | 2009 | 0 | 1 |
1004 | 2010 | 0 | 1 |
1004 | 2011 | 0 | 0 |
1004 | 2012 | 0 | 0 |
1004 | 2013 | 0 | 0 |
1004 | 2014 | 1 | 1 |
1004 | 2015 | 0 | 1 |
1004 | 2016 | 0 | 1 |
Any advice is welcome!
Thank you
Okay @user95 Again as you review your data thoroughly it appears the missings goes as missings and your sample contains a lot of missing values. However all these changes are very minor IF-THENs if you follow the code to make changes. While you asked for one step solution which is fine, I really wonder whether you are able to follow and understand the code to edit/update should you consider maintaining in a production environment. Of course I hope you do. Nonetheless, it's easier to have it split into steps to modify/edit/update if you do not understand the code I am afraid. I leave that up to you.
data want;
do until(last.id);
set have;
by id Event_occurrence notsorted;
array t(999999) _temporary_;
if first.Event_occurrence then _n=0;
if Event_occurrence then do; _f=1;y=year;end;
if _f then _n+(Event_occurrence=0);
if last.Event_occurrence and Event_occurrence=0 and _f then do;
_i+1;
t(_i)=_n;
end;
end;
call missing(of _:);
if not missing(y) then _y=year ne y;
do until(last.id);
set have;
by id Event_occurrence notsorted;
want_var=Event_occurrence;
if Event_occurrence then _f=1;
if first.Event_occurrence then _n=0;
if first.Event_occurrence and Event_occurrence=0 and _f then _i+1;
if _f and Event_occurrence>. then do; _n+(Event_occurrence=0);want_var=1;end;
if Event_occurrence=0 and _f then do;
if t(_i)<3 then want_var=1;
else if t(_i)=3 then want_var=0;
else if t(_i)>3 then want_var=_n<=t(_i)-3;
end;
if _y and Event_occurrence>. then want_var=ifn(year>=y,1,want_var);
output;
end;
call missing(of t(*),of _:);
drop y _:;
run;
if the difference is >3 years, then, the variable (want_var) will equal 1 for all those years up to three years before the next event.
Is there an example of this in the data? If so, which year(s) show what you mean by this?
Greetings,
Overall, looking at the table I attacked, the logic is as follow:
Before the first event for each firm, (before 1995) the variable want_var would equal 0
The first event is in 1995, the next event is in 1996, so since the difference between those two events is only one year, the variable want_var will equal 1 for both those years (1995, and 1996)
The second event and the third event are three years apart ( 3 non event years apart) (1996 to 2000), so the variable want_var will equal 1 for the event year 1996 and 2000 and zero during the those three non event years.
Once again, the next event is only one year apart (in 2001) so want_var=1,
Between the 2001 event and 2003 event there is only one non event year, so want_var=1 for all those years. (2001,2002,2003)
Between the 2003 event year and 2014 event year , there are 10 non event years,
so what I want to achieve in the case of difference>3 of non event years years is having the new variable want_var equal 1 from 2004 until 3 years before the next event which is 2014, so it would equal 1 on 2003 and 2014 , from 2004 until 2010, and then equal 0 in 2011,2012,2013.
Then lastly, if the last event year (2014) is not the last year recorded for that firm (2016), want_var =1 for all consecutive years until the end of data of that specific firm.
i hope what I want to achieve is more clear
Thank you
This is a good explanation of your problem. Thank you.
I spent about 15 minutes on trying to solve this last night, and didn't succeed. I suspect this might be a good example where the hash objects in SAS data steps might help. So I hereby summon our hash experts @novinosrin and @hashman to see if they can come up with a solution, if they have the time and interest.
Thank you Sir @PaigeMiller for the mention, however I am just Expert @hashman wannabe . Hopefully I become like him one day sooner than later. He knows I am striving hard for it. 🙂
Hi @user95 Since your year is continuous, the requirement appears pretty straight forward(unless I missed something very important that I am known for in lacking ATD aka Attention to detail) with a simple Double DOW i.e 1. to Flag and Count 2. To look up. So here you go-
data have;
infile cards expandtabs truncover;
input ID year Event_occurrence;* want_var;
cards;
1004 1992 0 0
1004 1993 0 0
1004 1994 0 0
1004 1995 1 1
1004 1996 1 1
1004 1997 0 0
1004 1998 0 0
1004 1999 0 0
1004 2000 1 1
1004 2001 1 1
1004 2002 0 1
1004 2003 1 1
1004 2004 0 1
1004 2005 0 1
1004 2006 0 1
1004 2007 0 1
1004 2008 0 1
1004 2009 0 1
1004 2010 0 1
1004 2011 0 0
1004 2012 0 0
1004 2013 0 0
1004 2014 1 1
1004 2015 0 1
1004 2016 0 1
;
/*RULES*/
/*Then, if the difference is less than three years then the variable (want_var) will equal 1 for those years*/
/**/
/*if the difference=3 then the variable (want_var) will equal 0 for those years.*/
/**/
/*if the difference is >3 years, then, the variable (want_var) will equal 1 for all those years up to three years before the next event.*/
data want;
do _n=1 by 1 until(last.Event_occurrence);
set have;
by id Event_occurrence notsorted;
retain _f want_var;
if first.id then do;_f=0;want_var=0;end;
if Event_occurrence then _f=1;
end;
do _n_=1 by 1 until(last.Event_occurrence);
set have;
by id Event_occurrence notsorted;
if _f then want_var=1;
if Event_occurrence=0 and _f then do;
if _n<3 then want_var=1;
else if _n=3 then want_var=0;
else if _n>3 then want_var=_n_<=_n-3;
end;
output;
end;
drop _:;
run;
HTH & Best Regards!
Nice solution, @novinosrin. That method did not occur to me.
A wit into performance by not having another "by" in the 2nd pass as to read the records from cache
data want;
do _n=1 by 1 until(last.Event_occurrence);
set have;
by id Event_occurrence notsorted;
retain _f want_var;
if first.id then do;_f=0;want_var=0;end;
if Event_occurrence then _f=1;
end;
do _n_=1 to _n;
set have;
if _f then want_var=1;
if Event_occurrence=0 and _f then do;
if _n<3 then want_var=1;
else if _n=3 then want_var=0;
else if _n>3 then want_var=_n_<=_n-3;
end;
output;
end;
drop _:;
run;
Greetings,
Thank you for taking the time to answer my question,
There is one detail not incorporated into your code and it is as I mentioned before as follow :
"Then lastly, if the last event year (2014) is not the last year recorded for that firm (2016), want_var =1 for all consecutive years until the end of data of that specific firm."
But following your code, it gives me want_var=1 for all remaining year except the last 3 years of the data for that firm.
if the event_occurrence=1 is in 2014 but the last year recorded for that firm is 2020, then want_var for all consecutive years should be=1 until 2020, but in your code want_var=1 from 2014 until 3 years before the end of the data in 2020.
Is there any way to fix it?
Thank you once again!
Hi @user95 In your expected output (you wrote-"what i am looking to achieve") doesn't seem to comply with the below condition right?
"Then lastly, if the last event year (2014) is not the last year recorded for that firm (2016), want_var =1 for all consecutive years until the end of data of that specific firm."
from - until? I am not quite understanding FROM
Greetings,
Yes, I want to make the want_var=1 from the last event occurrence until the end of the data for that firm. So in my example it would be from 2014 until 2016
How can I achieve that?
Thank you once more
Hello @user95 To Make it syntactically less verbose and easier to follow logic, I am doing the final in another step(proc sql). Once you deem we are correct, we shall jump to one step solution if you want.
Test and let me know
data have;
infile cards expandtabs truncover;
input ID year Event_occurrence;* want_var;
cards;
1004 1992 0 0
1004 1993 0 0
1004 1994 0 0
1004 1995 1 1
1004 1996 1 1
1004 1997 0 0
1004 1998 0 0
1004 1999 0 0
1004 2000 1 1
1004 2001 1 1
1004 2002 0 1
1004 2003 1 1
1004 2004 0 1
1004 2005 0 1
1004 2006 0 1
1004 2007 0 1
1004 2008 0 1
1004 2009 0 1
1004 2010 0 1
1004 2011 0 0
1004 2012 0 0
1004 2013 0 0
1004 2014 1 1
1004 2015 0 1
1004 2016 0 1
;
data temp;
do _n=1 by 1 until(last.Event_occurrence);
set have;
by id Event_occurrence notsorted;
retain _f want_var;
if first.id then do;_f=0;want_var=0;end;
if Event_occurrence then _f=1;
end;
do _n_=1 to _n;
set have;
if _f then want_var=1;
if Event_occurrence=0 and _f then do;
if _n<3 then want_var=1;
else if _n=3 then want_var=0;
else if _n>3 then want_var=_n_<=_n-3;
end;
output;
end;
drop _:;
run;
proc sql;
create table want(drop=t:) as
select ID,year,Event_occurrence,max((Event_occurrence=1)*year) as t, max(year) as t1,
ifn(calculated t ne calculated t1 and year>=calculated t,1,want_var) as want_var
from temp
group by id
order by id, year;
quit;
Greetings,
Yes, the last step works as intended, as you mentioned a one step solution would be better, so that I dont end up with a separate dataset that I would need to merge back.
How would it be possible to do it all in one step?
Thank you
Yes sure, just a slight adjustment to not have nested DOW, instead park the count in an array on the first and then look up for the reason max of the year and year of last event_occurrence=1 is fetched at group ID level
data have;
infile cards expandtabs truncover;
input ID year Event_occurrence;* want_var;
cards;
1004 1992 0 0
1004 1993 0 0
1004 1994 0 0
1004 1995 1 1
1004 1996 1 1
1004 1997 0 0
1004 1998 0 0
1004 1999 0 0
1004 2000 1 1
1004 2001 1 1
1004 2002 0 1
1004 2003 1 1
1004 2004 0 1
1004 2005 0 1
1004 2006 0 1
1004 2007 0 1
1004 2008 0 1
1004 2009 0 1
1004 2010 0 1
1004 2011 0 0
1004 2012 0 0
1004 2013 0 0
1004 2014 1 1
1004 2015 0 1
1004 2016 0 1
;
data want;
if 0 then set have;
want_var=0;
do until(last.id);
set have;
by id Event_occurrence notsorted;
array t(999999) _temporary_;
if first.Event_occurrence then _n=0;
if Event_occurrence then do; _f=1;y=year;end;
if _f then _n+(Event_occurrence=0);
if last.Event_occurrence and Event_occurrence=0 and _f then do;
_i+1;
t(_i)=_n;
end;
end;
call missing(of _:);
_y=year ne y;
do until(last.id);
set have;
by id Event_occurrence notsorted;
if Event_occurrence then _f=1;
if first.Event_occurrence then _n=0;
if first.Event_occurrence and Event_occurrence=0 and _f then _i+1;
if _f then do; _n+(Event_occurrence=0);want_var=1;end;
if Event_occurrence=0 and _f then do;
if t(_i)<3 then want_var=1;
else if t(_i)=3 then want_var=0;
else if t(_i)>3 then want_var=_n<=t(_i)-3;
end;
if _y then want_var=ifn(year>=y,1,want_var);
output;
end;
call missing(of t(*),of _:);
drop y _:;
run;
Greetings
Thank you for your reply,
Looking more carefully at my data , I realized that sometimes I had missing values for the event_occurrence variable throughout the years.
Meaning that for the some firm A from 1992 until 2016, there might be a few years of missing values for the event_occurence variable, which your code doesnt account for, and sometimes arbitrarily results in either 0 or 1 for the want_var.
How can I make sure that missing values for event_occurence result in missing values in the want_var variable?
I thought about deleting the missing values before running your code , but somehow it doesnt seem like a good idea.
How do you think I should handle it?
Thank you!
can you plz post a more representative sample of your HAVE and WANT?
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.