Re: Creating a variable based on two other datasets

TrueTears · Posted 12-06-2019 02:04 PM

I have the following dataset:

data have;
input year firm id type;
cards;
2008	28013	1003	1
2008	28013	1004	.
2008	28013	1005	1
2008	28013	1007	.
2008	28013	1009	0
2008	28013	1010	1
2008	28013	1013	1
2008	28013	1053	0
2008	28013	1074	1
2008	28013	1075	.
2009	28332	1003	0
2009	28332	1010	1
2009	28332	1053	.
2009	28332	1074	.
2009	28332	1075	.
2009	28400	1003	0
2009	28400	1085	0
2009	28400	1743	.
;
run;

I also have two datasets: list1 and list2:

data list1;
input id id_adj;
cards;
1001	2323
1001	2325
1003	1009
1003	1010
1003	1085
1004	1005
1004	3456
1005	1004
1005	3888
1007	1823
1009	1003
1010	1003
1013	1053
1013	1074
1053	1013
1074	1013
1075	1743
1075	1744
1085	1003
1743	1075
1744	1075
1823	1007
2323	1001
2325	1001
3456	1004
3888	1005

;
run;

data list2;
input event_year id shock;
cards;
2007	1001	0
2007	1003	0
2007	1004	1
2007	1005	1
2007	1007	0
2007	1009	1
2007	1010	1
2007	1013	0
2007	1053	1
2007	1074	1
2007	1075	0
2007	1085	0
2007	1743	1
2007	1744	1
2007	1823	1
2007	2323	0
2007	2325	0
2007	3456	1
2007	3888	1
2008	1001	1
2008	1003	0
2008	1004	0
2008	1005	1
2008	1007	1
2008	1009	0
2008	1010	1
2008	1013	1
2008	1053	0
2008	1074	0
2008	1075	1
2008	1085	1
2008	1743	1
2008	1744	1
2008	1823	1
2008	2323	1
2008	2325	0
2008	3456	0
2008	3888	0
2009	1001	1
2009	1003	0
2009	1004	0
2009	1005	0
2009	1007	1
2009	1009	1
2009	1010	1
2009	1013	0
2009	1053	0
2009	1074	0
2009	1075	0
2009	1085	1
2009	1743	1
2009	1744	1
2009	1823	1
2009	2323	1
2009	2325	1
2009	3456	0
2009	3888	0
;
run;

I wish to create a variable nd to produce the following dataset:

data want;
input year firm id type nd;
cards;
2008	28013	1003	1	1
2008	28013	1004	.	.
2008	28013	1005	1	0
2008	28013	1007	.	.
2008	28013	1009	0	0
2008	28013	1010	1	0
2008	28013	1013	1	0
2008	28013	1053	0	1
2008	28013	1074	1	1
2008	28013	1075	.	.
2009	28332	1003	0	1
2009	28332	1010	1	0
2009	28332	1053	.	.
2009	28332	1074	.	.
2009	28332	1075	.	.
2009	28400	1003	0	1
2009	28400	1085	0	0
2009	28400	1743	.	.
;
run;

where nd is created as follows.

For each nonmissing value of type, start with the id. Then, in “list1”, consider all of the id_adj that corresponds to the id. Out of these id_adj, only keep the ones that appear as an id in “have” for that particular year and firm. For this set of id_adjs, find their corresponding value of shock in “list2” for that particular year. If there is at least one shock = 1, then nd = 1. If every shock=0 then nd = 0. I will provide a few examples:

Example 1: Consider year 2008 and firm 28013. Start with id = 1003. In “list1”, 1003 corresponds to 1009, 1010, and 1085. However, only 1009 and 1010 appear as an id in “have” for the year/firm 2008/28013. Now, in “list2”, in the year 2008, 1009 has shock = 0 and 1010 has shock = 1. Thus, nd = 1.

Example 2: Consider year 2008 and firm 28013. Start with id = 1005. In “list1”, 1005 corresponds to 1004 and 3888. However, only 1004 appears as an id in “have” for the year/firm 2008/28013. Now, in “list2”, in the year 2008, 1004 has shock = 0. Thus, nd = 0.

Example 3: Consider year 2008 and firm 28013. Start with id 1013. In “list1”, 1013 corresponds to 1053 and 1074, both of which appears as an id in “have” for the year/firm 2008/28013. Now, in “list2”, in the year 2008, 1053 has shock = 0 and 1074 has shock = 0. Thus, nd = 0.

VDD · Posted 12-06-2019 02:44 PM

@TrueTears you will find your answer here.

https://communities.sas.com/t5/SAS-Programming/Creating-a-variable-based-on-another-dataset/m-p/6089...

it has the same data as you are providing while you may need to make a few small adjustments to fullfill you examples.

TrueTears · Posted 12-06-2019 02:46 PM

Thanks! Yes I had tried to adapt that and spent quite a while doing so but I couldn't quite figure it out because I have two datasets now (list1 and list2) rather than just one list.

Sorry, I am still learning SAS haha

Reeza · Posted 12-06-2019 02:56 PM

Post what you have so far then and we can help you out with it from there.

Creating a variable based on two other datasets