I am trying to grab records from a data set that fit a particular criteria. The Output shown below contains the dates in which I am trying to compare to see if they are within a certain time frame from one another. What I am trying to do is to only output the records in which the date is > 30 days from previously output record (for each group). So in the case below, the final data set would only contain APP's 100,101,102,104,105. Also, the first record per group should always be in the final data set, and there could be 25 records in between 2 final dates so a simple lag or lag2 is out of the picture.
Any suggestions?
Data:
DATA lags ;
INPUT GROUP APP DATE MMDDYY10. ;
FORMAT DATE MMDDYY10.;
DATALINES ;
1 100 04/19/2010
1 101 07/26/2010
1 102 11/08/2010
1 103 11/20/2010
1 104 12/10/2010
2 105 12/06/2012
2 106 12/10/2012
;
RUN ;
Output:
GROUP | APP | DATE |
1 | 100 | 04/19/2010 |
1 | 101 | 07/26/2010 |
1 | 102 | 11/08/2010 |
1 | 103 | 11/20/2010 |
1 | 104 | 12/10/2010 |
2 | 105 | 12/06/2012 |
2 | 106 | 12/10/2012 |
You should retain the previously output date into another variable. Then you can compare the current records DATE value to see if it is more than 30 days after the previously output date.
data want;
set lags;
by group ;
retain prevdate 0;
format prevdate yymmdd10.;
if first.group or (date - prevdate > 30) then do;
output;
prevdate=date;
end;
run;
Those look identical to me, is that sample data or sample output?
I'm sorry for not being clear....
The output I am trying to get should look like this:
GROUP | APP | DATE |
1 | 100 | 04/19/2010 |
1 | 101 | 07/26/2010 |
1 | 102 | 11/08/2010 |
1 | 104 | 12/10/2010 |
2 | 105 | 12/06/2012 |
How does this look?
DATA lags;
INPUT GROUP APP DATE MMDDYY10.;
FORMAT DATE MMDDYY10.;
DATALINES;
1 100 04/19/2010
1 101 07/26/2010
1 102 11/08/2010
1 103 11/20/2010
1 104 12/10/2010
2 105 12/06/2012
2 106 12/10/2012
;
RUN;
proc sort data = lags;
by GROUP APP DATE;
run;
data lags_retain;
set lags;
by GROUP APP DATE;
retain tmp_DATE;
if first.GROUP then
do;
tmp_DATE = DATE;
output;
end;
else
do;
if DATE - tmp_DATE > 30 then
do;
tmp_DATE = DATE;
output;
end;
end;
run;
data lags_retain_out;
set lags_retain (drop = tmp_:);
run;
Matches your output, but you probably haven't put in all the cases you wanted to cover in your sample, so it's hard to check.
Using a DO UNTIL allows for a compact solution to this problem. I didn't sort by APP, as it could disrupt the DATE order within a GROUP.
DATA lags;
INPUT GROUP APP DATE MMDDYY10.;
FORMAT DATE MMDDYY10.;
DATALINES;
1 100 04/19/2010
1 101 07/26/2010
1 102 11/08/2010
1 103 11/20/2010
1 104 12/10/2010
2 105 12/06/2012
2 106 12/10/2012
;
proc sort data = lags;
by GROUP DATE;
run;
data want;
do until (last.GROUP);
set lags; by GROUP;
if missing(previousDATE) or intck("DAY", previousDATE, DATE) > 30 then do;
output;
previousDATE = DATE;
end;
end;
drop previousDATE;
run;
PG
You should retain the previously output date into another variable. Then you can compare the current records DATE value to see if it is more than 30 days after the previously output date.
data want;
set lags;
by group ;
retain prevdate 0;
format prevdate yymmdd10.;
if first.group or (date - prevdate > 30) then do;
output;
prevdate=date;
end;
run;
All the solutions were helpful and i appreciate the input!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.