I have a dataset with three variables: id, startdate and enddate. The dates are numeric SAS date values.
I would like to pull out the records for an id where the difference between two successive START dates is more than 28 days. There will be a few missing dates too.
Example:
ID STARTDATE ENDDATE
001 19583 19604
001 19589 19609
001 19600 19610
001 19628 19638
001 19480 19520
002 . .
002 19620 19624
002 19630 19634
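Hi, try this:
data test;
  input ID $3. STARTDATE ENDDATE;
  /* subsetting IF: keep a row only when its start date is at least 28 days after the previous row's */
  if startdate-lag(startdate)>=28;
  cards;
001 19583 19604
001 19589 19609
001 19600 19610
001 19628 19638
001 19480 19520
002 . .
002 19620 19624
002 19630 19634
;
run;
Output:
001 19628 19638
Use the SUM function if you also want rows that follow a missing start date to be kept. SUM ignores missing arguments, so a row whose previous start date is missing (including the very first row) passes the test:
if sum(startdate,-lag(startdate))>=28;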
Thanks for your reply. It answered my query partially, but I would like to calculate the difference between dates within each ID. The logic you gave works, but it runs across all the IDs, so it also outputs a record when the difference of 28 days or more is between dates that belong to two different IDs.
So, using the LAG function, test whether the ID of the current observation matches the ID of the previous observation, and also use the LAG function to test whether the difference between the two start dates is at least 28.
Change the IF condition to:
if id=lag(id) and startdate-lag(startdate)>=28;
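One way to get the "group by ID" behaviour in a DATA step is BY-group processing. The following is a minimal sketch rather than the posters' code. It assumes the data are in a data set named have, sorts by ID and STARTDATE so that "successive" means chronological within an ID, and stores the lagged value in a variable (prev_start, a name made up here) so that LAG is called exactly once per observation.

```sas
/* Sketch under assumptions: input data set HAVE with ID, STARTDATE, ENDDATE */
proc sort data=have out=have_sorted;
  by id startdate;
run;

data gaps;
  set have_sorted;
  by id;
  /* call LAG unconditionally so its queue is updated on every observation */
  prev_start = lag(startdate);
  /* the first record of an ID has no predecessor within that ID */
  if first.id then prev_start = .;
  /* skip missing dates, then keep rows at least 28 days after the previous start */
  if nmiss(startdate, prev_start) = 0 and startdate - prev_start >= 28;
  /* hypothetical helper variable: size of the gap in days */
  gap = startdate - prev_start;
run;
```

Use > 28 instead of >= 28 if a gap of exactly 28 days should not be reported.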
Thanks, it's working. Is there any other option, something like GROUP BY ID as we do in SQL, combined with the LAG function? My query is mainly about finding the IDs with discrepancies, where the dates are more than 28 days apart. Any query that groups by ID and uses the LAG function would be a perfect fit for me.
I don't understand what you are asking for ... or how it is different than what we have already explained ... an example would definitely help. It also sounds like (although you don't specifically state this) that you are asking for a PROC SQL solution. Are you asking for a PROC SQL solution? If so, why does it have to be PROC SQL when a data step works perfectly well?
Hi,
I have a data set with ID and multiple VISITDATES. My query should output the records where the difference between two successive visit dates is greater than 28 within an ID. Hope it helps.
But that's what the code above already provides.
This is similar to a 30 day readmission problem in medical studies. Search on here and you'll find SQL solutions to that effect.
It's a common request/issue.
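A PROC SQL version of the idea can be sketched with a self-join. This is only an illustration under assumptions: the data set name have, the aliases a and b, and the column prior_start are invented here. For each record it finds the latest earlier start date within the same ID and keeps the record when the gap is at least 28 days.

```sas
proc sql;
  create table gaps_sql as
  select a.id,
         a.startdate,
         a.enddate,
         max(b.startdate) as prior_start  /* latest earlier start date within the same ID */
  from have as a
       inner join have as b
       on  a.id = b.id
       and b.startdate < a.startdate
       and b.startdate is not missing
  group by a.id, a.startdate, a.enddate
  having a.startdate - max(b.startdate) >= 28;
quit;
```

Records that are identical in all three columns collapse into one row because of the GROUP BY, and the join compares each record with every earlier record of the same ID, so it may be slower than a DATA step on large data.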
If you are saying that you only want one record per ID, just to identify the offending IDs, this would be a way. It assumes your SAS data set is sorted by ID STARTDATE:
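data want;
  do until (last.id);          /* read all records of one ID in a single pass */
    prior_start = startdate;   /* save the previous record's start date before reading the next record */
    set have;
    by id;
    /* flag the ID when a later record starts at least 28 days after the one before it */
    if first.id=0 and startdate - prior_start >= 28 then wanted='Y';
  end;
  if wanted='Y';               /* output one row per flagged ID */
  keep id;
run;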
SQL would likely not be an option to accomplish this, since it doesn't guarantee the order of the incoming records.
What do you want your output data to look like?