I have the following dataset:
data have;
input year firm id type;
cards;
2008 28013 1003 1
2008 28013 1004 .
2008 28013 1005 1
2008 28013 1007 .
2008 28013 1009 0
2008 28013 1010 1
2008 28013 1013 1
2008 28013 1053 0
2008 28013 1074 1
2008 28013 1075 .
2009 28332 1003 0
2009 28332 1010 1
2009 28332 1053 .
2009 28332 1074 .
2009 28332 1075 .
2009 28400 1003 0
2009 28400 1085 0
2009 28400 1743 .
;
run;
I also have two datasets: list1 and list2:
data list1;
input id id_adj;
cards;
1001 2323
1001 2325
1003 1009
1003 1010
1003 1085
1004 1005
1004 3456
1005 1004
1005 3888
1007 1823
1009 1003
1010 1003
1013 1053
1013 1074
1053 1013
1074 1013
1075 1743
1075 1744
1085 1003
1743 1075
1744 1075
1823 1007
2323 1001
2325 1001
3456 1004
3888 1005
;
run;
data list2;
input event_year id shock;
cards;
2007 1001 0
2007 1003 0
2007 1004 1
2007 1005 1
2007 1007 0
2007 1009 1
2007 1010 1
2007 1013 0
2007 1053 1
2007 1074 1
2007 1075 0
2007 1085 0
2007 1743 1
2007 1744 1
2007 1823 1
2007 2323 0
2007 2325 0
2007 3456 1
2007 3888 1
2008 1001 1
2008 1003 0
2008 1004 0
2008 1005 1
2008 1007 1
2008 1009 0
2008 1010 1
2008 1013 1
2008 1053 0
2008 1074 0
2008 1075 1
2008 1085 1
2008 1743 1
2008 1744 1
2008 1823 1
2008 2323 1
2008 2325 0
2008 3456 0
2008 3888 0
2009 1001 1
2009 1003 0
2009 1004 0
2009 1005 0
2009 1007 1
2009 1009 1
2009 1010 1
2009 1013 0
2009 1053 0
2009 1074 0
2009 1075 0
2009 1085 1
2009 1743 1
2009 1744 1
2009 1823 1
2009 2323 1
2009 2325 1
2009 3456 0
2009 3888 0
;
run;
I wish to create a variable nd to produce the following dataset:
data want;
input year firm id type nd;
cards;
2008 28013 1003 1 1
2008 28013 1004 . .
2008 28013 1005 1 0
2008 28013 1007 . .
2008 28013 1009 0 0
2008 28013 1010 1 0
2008 28013 1013 1 0
2008 28013 1053 0 1
2008 28013 1074 1 1
2008 28013 1075 . .
2009 28332 1003 0 1
2009 28332 1010 1 0
2009 28332 1053 . .
2009 28332 1074 . .
2009 28332 1075 . .
2009 28400 1003 0 1
2009 28400 1085 0 0
2009 28400 1743 . .
;
run;
where nd is created as follows.
For each nonmissing value of type, start with the id. Then, in “list1”, consider all of the id_adj that corresponds to the id. Out of these id_adj, only keep the ones that appear as an id in “have” for that particular year and firm. For this set of id_adjs, find their corresponding value of shock in “list2” for that particular year. If there is at least one shock = 1, then nd = 1. If every shock=0 then nd = 0. I will provide a few examples:
Example 1: Consider year 2008 and firm 28013. Start with id = 1003. In “list1”, 1003 corresponds to 1009, 1010, and 1085. However, only 1009 and 1010 appear as an id in “have” for the year/firm 2008/28013. Now, in “list2”, in the year 2008, 1009 has shock = 0 and 1010 has shock = 1. Thus, nd = 1.
Example 2: Consider year 2008 and firm 28013. Start with id = 1005. In “list1”, 1005 corresponds to 1004 and 3888. However, only 1004 appears as an id in “have” for the year/firm 2008/28013. Now, in “list2”, in the year 2008, 1004 has shock = 0. Thus, nd = 0.
Example 3: Consider year 2008 and firm 28013. Start with id 1013. In “list1”, 1013 corresponds to 1053 and 1074, both of which appears as an id in “have” for the year/firm 2008/28013. Now, in “list2”, in the year 2008, 1053 has shock = 0 and 1074 has shock = 0. Thus, nd = 0.
@TrueTears you will find your answer here.
it has the same data as you are providing while you may need to make a few small adjustments to fullfill you examples.
Thanks! Yes I had tried to adapt that and spent quite a while doing so but I couldn't quite figure it out because I have two datasets now (list1 and list2) rather than just one list.
Sorry, I am still learning SAS haha
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.