I have the following dataset (there are more years, but I am just showing the first two years):
data have; input year ID shock; cards; 2000 1001 1 2000 1002 0 2000 1004 0 2000 1006 1 2000 1008 0 2000 1010 0 2000 1011 0 2001 1001 0 2001 1002 0 2001 1004 1 2001 1006 1 2001 1008 1 2001 1010 0 2001 1011 0 ; run;
I have another dataset that lists "adjacent" IDs as follows:
data list; input ID ID_adj; cards; 1001 1002 1001 1006 1002 1001 1004 1008 1004 1011 1006 1010 1008 1004 1008 1011 1010 1006 1010 1011 1011 1004 1011 1008 1011 1010 ; run;
What I wish to do is create the following dataset:
data want; input year ID shock treat $; cards; 2000 1001 1 T 2000 1002 0 A 2000 1004 0 N 2000 1006 1 T 2000 1008 0 N 2000 1010 0 A 2000 1011 0 N 2001 1001 0 A 2001 1002 0 N 2001 1004 1 T 2001 1006 1 T 2001 1008 1 T 2001 1010 0 A 2001 1011 0 A ; run;
The variable treat is defined as follows:
1) If shock takes a value of 1, then treat = T.
2) If shock takes a value of 0, then we look across all of the ID_adj in the list dataset that corresponds to the ID. If the value of shock is 1 for at least one ID_adj in the same year, then treat = A. It's best to illustrate this with an example.
In the original dataset 'have', look at ID 1002 in year 2000. In the dataset 'list', ID 1002 corresponds to 1001 and in year 2000, ID 1001 has shock = 1. Thus, we set treat = A.
Similarly, in the original dataset 'have', look at ID 1010 in year 2000. In the dataset 'list', ID 1010 corresponds to 1006 and 1011. Although ID 1011 has shock = 0 in year 2000, ID 1006 has shock = 1 in year 2000, so treat for ID 1010 = A.
3) If shock takes a value of 0, then we look across all of the ID_adj in the list dataset that corresponds to the ID. If the value of shock is 0 for all ID_adj in the same year, then treat = N. Again, it's best to illustrate with an example:
In the original dataset 'have', look at ID 1008 in year 2000. In the dataset 'list', ID 1008 corresponds to 1004 and 1011. Both ID 1004 and 1011 have shock = 0 in year 2000, so treat = N for ID 1008 in year 2000.
Since the value of shock changes across different years, we need to do the above across all years, thereby producing the dataset 'want'.
Proc SQL is well suited for this operation:
proc sql;
create table want as
select
a.*,
case when a.shock then "T" when max(c.shock) then "A" else "N" end as treat
from
have as a left join
list as b on a.id=b.id left join
have as c on b.id_adj=c.id and a.year=c.year
group by a.year, a.id, a.shock;
quit;
Proc SQL is well suited for this operation:
proc sql;
create table want as
select
a.*,
case when a.shock then "T" when max(c.shock) then "A" else "N" end as treat
from
have as a left join
list as b on a.id=b.id left join
have as c on b.id_adj=c.id and a.year=c.year
group by a.year, a.id, a.shock;
quit;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.