Hello,
I have a dataset that contains multiple observations for individuals with a date variable that increments by month. It has variables for contraceptive method and birth outcomes. There may be up to 60 observations for each individual (January 2012 - December 2016). A simplified version of my data is structured as below.
UniqueID Date Method Birth
1 01JAN2012 0 0
1 01FEB2012 0 0
1 01MAR2012 0 0
1 01APR2012 0 0
1 01MAY2012 0 1
1 01JUN2012 0 0
1 01JUL2012 0 0
1 01AUG2012 0 0
1 01SEP2012 0 0
1 01OCT2012 0 0
1 01NOV2012 0 0
1 01DEC2012 0 0
1 01JAN2013 0 0
For each date in the original dataset, I want to know if a birth (live or non-live) occurred within 6-10 months later. I created a copy of the original dataset with UniqueID, date, and birth flag to merge back with the original dataset for corresponding months. The original dataset was modified to create five new date variables as shown in the code below.
data test (keep=UniqueID Date: Birth);
set original (obs=500);
array newdate Date_6-Date_10;
do i = 1 to dim(newdate);
k = i + 5;
newdate{i} = intnx('month',date,k);
end;
format date_: date9.;
run;
I was able to accomplish this using a data step with sort procedures within a macro that executed an iterative do loop on a sample dataset with 500 observations. It worked fine, but my real dataset has over 61,000,000 observations. I would like to use PROC SQL to avoid using SORT procedures to reduce processing time, but I am not savvy enough with SQL to accomplish this. Could someone please help? Below is a copy of my code using data steps.
%macro lookforward;
%do i = 6 %to 10;
proc sort data=test;
by orsid date_&i;
run;
data test;
merge test (in=a)
copy (rename=(date=date_&i birth=event_&i));
by UniqueID date_&i;
if a = 1;
event_flag = sum(of event_:);
run;
%end;
%mend;
%lookforward;
The modified dataset would look something like this.
Any assistance in converting this code to SQL is much appreciated.
Thanks,
Ted
@LEINAARE wrote:
Hi @ballardw,
above, with the edition of the event flags shown below (starting from Date_9 to save space).
..... Date_9 Date_10 Event_6 Event_7 Event_8 Event_9 Event_10 Event_Flag
01OCT2012 01NOV2012 0 0 0 0 0 0
01NOV2012 01DEC2012 0 0 0 0 0 0
01DEC2012 01JAN2013 0 0 0 1 0 1
I am trying to use the SQL procedure to produce a table structured similarly to this, but without the redundant sort procedures and data steps. I apologize for not including better information. Thank you for the code you provided.
Ted
Why do you want a table in that format? What is magical about that specific layout.
The code I provided before can get the month difference of the "event" if that is supposed be the month of the pregnancy.
proc sql; create table temp as select a.uniqueid, a.date, a.method,a.birth, b.date as datewithbirth, intck('month',a.date,b.date) as monthdiff from have as a left join (select * from have where birth=1) as b on a.uniqueid=b.uniqueid where 6 le intck('month',a.date,b.date) le 10 order by a.uniqueid, a.date ; quit;
Summaries on the reduced set from above can be merged back with the original. If the monthdiff is missing then "no event".
The above can be transposed to create an across with the "event" value. But since your "events are basically a diagonal of 0/ 1 values with at max a single 1 per date then you really aren't getting any additional information. If the "event_flag" is what you really want:
proc sql; create table want as select a.*, (b.monthdiff>0) as event_flag from have as a left join temp as b on a.uniqueid=b.uniqueid and a.date=b.date ; quit;
Hi @Reeza,
Thanks for responding. I didn't include part of the code, because I wanted to keep it simple, but I use an array to reset values GE one back to one. I am using it as a dichotomous variable for regression analysis. The main thing I am looking for is using SQL to create a table that matches flag values from the copied dataset with the incremented date variables for each observation. I am looking for efficiency, but do not have much experience using SQL.
Thanks,
Ted
Since you didn't even present a birth within that 6 to 10 month interval or what your intermediate steps might have been I'm not really sure what you need.
This is one way to find records where the birth occurs within that time frame though. Note that I edited your example to show a birth on the 01SEP2012 record so that it would be within the time frame.
data have; input UniqueID Date :date9. Method Birth ; format date date9.; datalines; 1 01JAN2012 0 0 1 01FEB2012 0 0 1 01MAR2012 0 0 1 01APR2012 0 0 1 01MAY2012 0 0 1 01JUN2012 0 0 1 01JUL2012 0 0 1 01AUG2012 0 0 1 01SEP2012 0 1 1 01OCT2012 0 0 1 01NOV2012 0 0 1 01DEC2012 0 0 1 01JAN2013 0 0 ; proc sql; create table temp as select a.uniqueid, a.date, a.method,a.birth, b.date as datewithbirth from have as a left join (select * from have where birth=1) as b on a.uniqueid=b.uniqueid where 6 le intck('month',a.date,b.date) le 10 order by a.uniqueid, a.date ; quit;
From the way you present the "newdates" those might be obtainable by transposing the result from above but I'm not quite sure what your process is following. But the above does find all of the months for the same individual where the condition is met as I understand it.
Hi @ballardw,
Thank you for your response. In the data step where I created the dataset named "test", I used the array to create five new date variables (Date_6 - Date_10) as shown below.
UniqueID Date Method Birth Date_6 Date_7 Date_8 Date_9 Date_10
1 01JAN2012 0 0 01JUL2012 01AUG2012 01SEP2012 01OCT2012 01NOV2012
1 01FEB2012 0 0 01AUG2012 01SEP2012 01OCT2012 01NOV2012 01DEC2012
1 01MAR2012 0 0 01SEP2012 01OCT2012 01NOV2012 01DEC2012 01JAN2013
.
.
.
.
1 01DEC2012 0 0 01JUN2013 01JUL2013 01AUG2013 01SEP2013 01OCT2013
The macro program containing a SORT procedure with a data step that merged the 'test' and 'copy' datasets created five new flag variables (event_6 - event_10) that were summed to create one variable indicating whether or not a birth occurred on at least one of the future dates specified for each observation. The dataset produced by the macro program included all of the variables shown above, with the edition of the event flags shown below (starting from Date_9 to save space).
..... Date_9 Date_10 Event_6 Event_7 Event_8 Event_9 Event_10 Event_Flag
01OCT2012 01NOV2012 0 0 0 0 0 0
01NOV2012 01DEC2012 0 0 0 0 0 0
01DEC2012 01JAN2013 0 0 0 1 0 1
I am trying to use the SQL procedure to produce a table structured similarly to this, but without the redundant sort procedures and data steps. I apologize for not including better information. Thank you for the code you provided.
Ted
@LEINAARE wrote:
Hi @ballardw,
above, with the edition of the event flags shown below (starting from Date_9 to save space).
..... Date_9 Date_10 Event_6 Event_7 Event_8 Event_9 Event_10 Event_Flag
01OCT2012 01NOV2012 0 0 0 0 0 0
01NOV2012 01DEC2012 0 0 0 0 0 0
01DEC2012 01JAN2013 0 0 0 1 0 1
I am trying to use the SQL procedure to produce a table structured similarly to this, but without the redundant sort procedures and data steps. I apologize for not including better information. Thank you for the code you provided.
Ted
Why do you want a table in that format? What is magical about that specific layout.
The code I provided before can get the month difference of the "event" if that is supposed be the month of the pregnancy.
proc sql; create table temp as select a.uniqueid, a.date, a.method,a.birth, b.date as datewithbirth, intck('month',a.date,b.date) as monthdiff from have as a left join (select * from have where birth=1) as b on a.uniqueid=b.uniqueid where 6 le intck('month',a.date,b.date) le 10 order by a.uniqueid, a.date ; quit;
Summaries on the reduced set from above can be merged back with the original. If the monthdiff is missing then "no event".
The above can be transposed to create an across with the "event" value. But since your "events are basically a diagonal of 0/ 1 values with at max a single 1 per date then you really aren't getting any additional information. If the "event_flag" is what you really want:
proc sql; create table want as select a.*, (b.monthdiff>0) as event_flag from have as a left join temp as b on a.uniqueid=b.uniqueid and a.date=b.date ; quit;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.