I am trying to grab records from a data set that fit a particular criteria. The Output shown below contains the dates in which I am trying to compare to see if they are within a certain time frame from one another. What I am trying to do is to only output the records in which the date is > 30 days from previously output record (for each group). So in the case below, the final data set would only contain APP's 100,101,102,104,105. Also, the first record per group should always be in the final data set, and there could be 25 records in between 2 final dates so a simple lag or lag2 is out of the picture.
Any suggestions?
Data:
DATA lags ;
INPUT GROUP APP DATE MMDDYY10. ;
FORMAT DATE MMDDYY10.;
DATALINES ;
1 100 04/19/2010
1 101 07/26/2010
1 102 11/08/2010
1 103 11/20/2010
1 104 12/10/2010
2 105 12/06/2012
2 106 12/10/2012
;
RUN ;
Output:
GROUP | APP | DATE |
1 | 100 | 04/19/2010 |
1 | 101 | 07/26/2010 |
1 | 102 | 11/08/2010 |
1 | 103 | 11/20/2010 |
1 | 104 | 12/10/2010 |
2 | 105 | 12/06/2012 |
2 | 106 | 12/10/2012 |
You should retain the previously output date into another variable. Then you can compare the current records DATE value to see if it is more than 30 days after the previously output date.
data want;
set lags;
by group ;
retain prevdate 0;
format prevdate yymmdd10.;
if first.group or (date - prevdate > 30) then do;
output;
prevdate=date;
end;
run;
Those look identical to me, is that sample data or sample output?
I'm sorry for not being clear....
The output I am trying to get should look like this:
GROUP | APP | DATE |
1 | 100 | 04/19/2010 |
1 | 101 | 07/26/2010 |
1 | 102 | 11/08/2010 |
1 | 104 | 12/10/2010 |
2 | 105 | 12/06/2012 |
How does this look?
DATA lags;
INPUT GROUP APP DATE MMDDYY10.;
FORMAT DATE MMDDYY10.;
DATALINES;
1 100 04/19/2010
1 101 07/26/2010
1 102 11/08/2010
1 103 11/20/2010
1 104 12/10/2010
2 105 12/06/2012
2 106 12/10/2012
;
RUN;
proc sort data = lags;
by GROUP APP DATE;
run;
data lags_retain;
set lags;
by GROUP APP DATE;
retain tmp_DATE;
if first.GROUP then
do;
tmp_DATE = DATE;
output;
end;
else
do;
if DATE - tmp_DATE > 30 then
do;
tmp_DATE = DATE;
output;
end;
end;
run;
data lags_retain_out;
set lags_retain (drop = tmp_:);
run;
Matches your output, but you probably haven't put in all the cases you wanted to cover in your sample, so it's hard to check.
Using a DO UNTIL allows for a compact solution to this problem. I didn't sort by APP, as it could disrupt the DATE order within a GROUP.
DATA lags;
INPUT GROUP APP DATE MMDDYY10.;
FORMAT DATE MMDDYY10.;
DATALINES;
1 100 04/19/2010
1 101 07/26/2010
1 102 11/08/2010
1 103 11/20/2010
1 104 12/10/2010
2 105 12/06/2012
2 106 12/10/2012
;
proc sort data = lags;
by GROUP DATE;
run;
data want;
do until (last.GROUP);
set lags; by GROUP;
if missing(previousDATE) or intck("DAY", previousDATE, DATE) > 30 then do;
output;
previousDATE = DATE;
end;
end;
drop previousDATE;
run;
PG
You should retain the previously output date into another variable. Then you can compare the current records DATE value to see if it is more than 30 days after the previously output date.
data want;
set lags;
by group ;
retain prevdate 0;
format prevdate yymmdd10.;
if first.group or (date - prevdate > 30) then do;
output;
prevdate=date;
end;
run;
All the solutions were helpful and i appreciate the input!
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.