Hello everyone, I have an excerpt of my data as the following:
DATA have ;
input id year action shock;
DATALINES;
1055 1981 0 .
1055 1982 0 .
1055 1983 0 .
1055 1984 0 .
1055 1985 1 1
1055 1986 1 .
1055 1987 1 .
1055 1988 1 .
1055 1989 0 .
1055 1990 0 .
1055 1991 0 .
1055 1992 0 .
1055 1993 0 .
1085 1981 0 .
1085 1982 1 1
1085 1983 0 .
1085 1984 0 .
1085 1985 0 .
1085 1986 0 .
1085 1987 0 .
1085 1988 0 .
1085 1989 0 .
1085 1990 0 .
1085 1991 1 1
1085 1992 1 .
1085 1993 1 .
1212 1981 0 .
1212 1982 0 .
1212 1983 1 1
1212 1984 1 .
1212 1985 1 .
1212 1986 0 .
1212 1987 0 .
1212 1988 1 1
1212 1989 1 .
1212 1990 1 .
1212 1991 0 .
1212 1992 0 .
1212 1993 0 .
1842 1981 0 .
1842 1982 0 .
1842 1983 0 .
1842 1984 0 .
1842 1985 0 .
1842 1986 0 .
1842 1987 0 .
1842 1988 0 .
1842 1989 0 .
1842 1990 0 .
1842 1991 0 .
1842 1992 0 .
1842 1993 0 .
2913 1981 0 .
2913 1982 0 .
2913 1983 0 .
2913 1984 0 .
2913 1985 1 1
2913 1986 1 .
2913 1987 1 .
2913 1988 0 .
2913 1989 0 .
2913 1990 0 .
2913 1991 1 1
2913 1992 1 .
2913 1993 1 .
;
RUN;
I want to create the following dataset:
DATA want ;
input id event_year treat;
DATALINES;
1055 1985 1
1085 1991 1
1212 1983 1
1212 1988 1
2913 1985 1
2913 1991 1
1085 1985 0
1842 1985 0
1055 1991 0
1842 1991 0
1842 1983 0
1085 1988 0
1842 1988 0
;
RUN;
The want dataset is created as follows:
For each id, consider only shock = 1. If the corresponding value of action equals 1 for the corresponding year and the following two years and equals 0 for the previous two years, then in the want dataset, enter an observation for the id, the corresponding year (named as event_year) and set treat = 1. As an example, consider id = 1055, year = 1985 in have. Here shock = 1, and action = 0 for years 1983 and 1984 (the previous two years), and equals to 1 for years 1985 (the current year corresponding to shock = 1), 1986 and 1987 (the following two years), so in the want dataset, there is an entry for id = 1055, event_year = 1985 with treat = 1.
Once all the rows with treat = 1 are created, then to create the rows with treat = 0, we do the following:
For each treat = 1, we look at all the id's in the have dataset with year equal to the event_year but with action = 0 in that given year. If action also equals to 0 for the two years before and two years after event_year, then enter an observation for the id, the corresponding event_year and set treat = 0 in want. As an example, consider id = 1055, event_year = 1985, treat = 1 in want. In the have dataset, id = 1085 has action = 0 for year = 1985. Action also takes the value of 0 two years before and two years after 1985, so create the observation id = 1085, event_year = 1985 and treat = 0 in want. Similarly, in the have dataset, id = 1842 also has action = 0 for year = 1985 as well as for the two years before and two years after. So create the observation id = 1842, event_year = 1985 and treat = 0 in want.
There is no need to output duplicates in want, so for example, for id = 1055, event_year = 1985 and treat = 1, we already have rows for id = 1085, event_year = 1985, treat = 0, and id = 1842, event_year = 1985, treat = 0. So when we look at id = 2913, event_year = 1985 and treat = 1, this corresponds to exactly the same two rows, so there is no need to output this again in want.
Also, can the code be made flexible so that in the above rule, instead of being two years before and two years after, it can be any arbitrary window around the event_year? For example, one year before to two years after, or 3 years before to 3 years after.
Complicated group wise computations and rule application can be performed in SQL. Although a single very complex SQL statement can perform the work, it is better to separate the major distinct pieces into separate queries that can be debugged and studied independently for testing and performance.
In Proc SQL the groupwise SUM function of a logical expression is very helpful for counting the number of cases affirming or contradicting some rule condition.
Example:
Data (hidden as spoiler)
DATA have ; input id year action shock; DATALINES; 1055 1981 0 . 1055 1982 0 . 1055 1983 0 . 1055 1984 0 . 1055 1985 1 1 1055 1986 1 . 1055 1987 1 . 1055 1988 1 . 1055 1989 0 . 1055 1990 0 . 1055 1991 0 . 1055 1992 0 . 1055 1993 0 . 1085 1981 0 . 1085 1982 1 1 1085 1983 0 . 1085 1984 0 . 1085 1985 0 . 1085 1986 0 . 1085 1987 0 . 1085 1988 0 . 1085 1989 0 . 1085 1990 0 . 1085 1991 1 1 1085 1992 1 . 1085 1993 1 . 1212 1981 0 . 1212 1982 0 . 1212 1983 1 1 1212 1984 1 . 1212 1985 1 . 1212 1986 0 . 1212 1987 0 . 1212 1988 1 1 1212 1989 1 . 1212 1990 1 . 1212 1991 0 . 1212 1992 0 . 1212 1993 0 . 1842 1981 0 . 1842 1982 0 . 1842 1983 0 . 1842 1984 0 . 1842 1985 0 . 1842 1986 0 . 1842 1987 0 . 1842 1988 0 . 1842 1989 0 . 1842 1990 0 . 1842 1991 0 . 1842 1992 0 . 1842 1993 0 . 2913 1981 0 . 2913 1982 0 . 2913 1983 0 . 2913 1984 0 . 2913 1985 1 1 2913 1986 1 . 2913 1987 1 . 2913 1988 0 . 2913 1989 0 . 2913 1990 0 . 2913 1991 1 1 2913 1992 1 . 2913 1993 1 . ;
Code:
%let N_BEFORE = 2; %let N_AFTER = 2; proc sql; create table treated as select shocked.id , shocked.year as event_year , 1 as treat from (select * from have where shock) as shocked join have as range on shocked.id = range.id where range.year between shocked.year - &N_BEFORE and shocked.year + &N_AFTER group by shocked.id, event_year having sum (range.year < shocked.year and range.action = 0) = &N_BEFORE /* count # number rule affirming cases */ & sum (range.year >= shocked.year and range.action = 1) > &N_AFTER ; create table untreated as select unactioned.id , unactioned.year as event_year , 0 as treat from ( select distinct have.* /* distinct very important here */ from have join treated on have.year = treated.event_year where action = 0 ) as unactioned join have as range on unactioned.id = range.id where range.year between unactioned.year - &N_BEFORE and unactioned.year + &N_AFTER and range.action = 0 /* rule is all zeroes */ group by unactioned.id, event_year having count(*) = &N_BEFORE + &N_AFTER + 1 /* count number of zeroes */ ; create table wanted as select * from treated union select * from untreated ;
Complicated group wise computations and rule application can be performed in SQL. Although a single very complex SQL statement can perform the work, it is better to separate the major distinct pieces into separate queries that can be debugged and studied independently for testing and performance.
In Proc SQL the groupwise SUM function of a logical expression is very helpful for counting the number of cases affirming or contradicting some rule condition.
Example:
Data (hidden as spoiler)
DATA have ; input id year action shock; DATALINES; 1055 1981 0 . 1055 1982 0 . 1055 1983 0 . 1055 1984 0 . 1055 1985 1 1 1055 1986 1 . 1055 1987 1 . 1055 1988 1 . 1055 1989 0 . 1055 1990 0 . 1055 1991 0 . 1055 1992 0 . 1055 1993 0 . 1085 1981 0 . 1085 1982 1 1 1085 1983 0 . 1085 1984 0 . 1085 1985 0 . 1085 1986 0 . 1085 1987 0 . 1085 1988 0 . 1085 1989 0 . 1085 1990 0 . 1085 1991 1 1 1085 1992 1 . 1085 1993 1 . 1212 1981 0 . 1212 1982 0 . 1212 1983 1 1 1212 1984 1 . 1212 1985 1 . 1212 1986 0 . 1212 1987 0 . 1212 1988 1 1 1212 1989 1 . 1212 1990 1 . 1212 1991 0 . 1212 1992 0 . 1212 1993 0 . 1842 1981 0 . 1842 1982 0 . 1842 1983 0 . 1842 1984 0 . 1842 1985 0 . 1842 1986 0 . 1842 1987 0 . 1842 1988 0 . 1842 1989 0 . 1842 1990 0 . 1842 1991 0 . 1842 1992 0 . 1842 1993 0 . 2913 1981 0 . 2913 1982 0 . 2913 1983 0 . 2913 1984 0 . 2913 1985 1 1 2913 1986 1 . 2913 1987 1 . 2913 1988 0 . 2913 1989 0 . 2913 1990 0 . 2913 1991 1 1 2913 1992 1 . 2913 1993 1 . ;
Code:
%let N_BEFORE = 2; %let N_AFTER = 2; proc sql; create table treated as select shocked.id , shocked.year as event_year , 1 as treat from (select * from have where shock) as shocked join have as range on shocked.id = range.id where range.year between shocked.year - &N_BEFORE and shocked.year + &N_AFTER group by shocked.id, event_year having sum (range.year < shocked.year and range.action = 0) = &N_BEFORE /* count # number rule affirming cases */ & sum (range.year >= shocked.year and range.action = 1) > &N_AFTER ; create table untreated as select unactioned.id , unactioned.year as event_year , 0 as treat from ( select distinct have.* /* distinct very important here */ from have join treated on have.year = treated.event_year where action = 0 ) as unactioned join have as range on unactioned.id = range.id where range.year between unactioned.year - &N_BEFORE and unactioned.year + &N_AFTER and range.action = 0 /* rule is all zeroes */ group by unactioned.id, event_year having count(*) = &N_BEFORE + &N_AFTER + 1 /* count number of zeroes */ ; create table wanted as select * from treated union select * from untreated ;
Like this?
%let n_before = 2;
%let n_after = 2;
proc sql;
create table MATCH1 as
select unique a.*
from HAVE a
,HAVE b
where a.SHOCK = 1
and a.ACTION = 1
and a.ID = b.ID
and a.YEAR between b.YEAR-&n_before. and b.YEAR+&n_after.
group by a.ID, a.YEAR
having sum(b.ACTION)=3
and sum(b.ACTION*(a.YEAR <= b.YEAR))=3;
proc sql;
create table MATCH2 as
select unique b.ID, a.YEAR as YEAR
from MATCH1 a
,HAVE b
where a.ID ^= b.ID
and b.ACTION = 0
and a.YEAR between b.YEAR-&n_before. and b.YEAR+&n_after.
group by b.ID, a.ID, a.YEAR
having sum(b.ACTION)=0 & count(b.ACTION)=5 order by a.ID , b.id, b.YEAR;
proc sql;
select ID, YEAR, 1 as TREAT from MATCH1
union
select unique *,0 from MATCH2
order by TREAT, ID;
ID | YEAR | TREAT |
---|---|---|
1055 | 1991 | 0 |
1085 | 1985 | 0 |
1085 | 1988 | 0 |
1842 | 1983 | 0 |
1842 | 1985 | 0 |
1842 | 1988 | 0 |
1842 | 1991 | 0 |
1055 | 1985 | 1 |
1085 | 1991 | 1 |
1212 | 1983 | 1 |
1212 | 1988 | 1 |
2913 | 1985 | 1 |
2913 | 1991 | 1 |
SAS seems to have an issue with the first select unique.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.