BookmarkSubscribeRSS Feed
MScEpi
Calcite | Level 5
I have a SAS data set with some 2 million records.

The records span a period of 5 years.

Many records are representative of the same person.

Each record has a variable for a service date.

I need to find out how many people have records with more than one service date in a seven day period, as well as how many people have multiple records in a one year period.

How can I do this? Would I use a Do loop command?

thanks!
9 REPLIES 9
art297
Opal | Level 21
Presuming you have already cleaned your data to ensure that you can accurately identify individuals who have multiple records, then you could get what you indicate with something like the following:
[pre]
data have;
input id date date9.;
cards;
1 10jan2005
1 17jan2005
2 10jan2005
2 16jan2005
2 9jan2005
3 5feb2006
3 5feb2007
4 5mar2008
4 4mar2009
5 7apr2010
;
proc sort data=have;
by id date;
run;
data want;
set have;
retain last met_condition;
by id;
last=lag(date);
if first.id then call missing(met_condition);
if not (first.id and last.id) then do;
if date-lag(date) le 7 and not first.date then met_condition=1;
if missing(met_condition) and
date-lag(date) le 365 and
not first.date then met_condition=2;
end;
if last.id then output;
run;
proc freq data=want;
tables met_condition;
run;
[/pre]
Using something like the above those who end up with a value of 1 for met_condition meet both of your conditions, those who have a value of 2 for met_condition meet your second condition, and those who have a missing value for met_condition didn't meet either condition.

HTH,
Art

> I have a SAS data set with some 2 million records.
>
> The records span a period of 5 years.
>
> Many records are representative of the same person.
>
> Each record has a variable for a service date.
>
> I need to find out how many people have records with
> more than one service date in a seven day period, as
> well as how many people have multiple records in a
> one year period.
>
> How can I do this? Would I use a Do loop command?
>
> thanks!
MScEpi
Calcite | Level 5
Thanks, I will try the code that you have suggested.

This will take into account that it only matters if the repeated events in those time frames are for the same unique person identifier?

Also, I am not quite clear how you came up with the dates that you listed in the code. The time frames that I am looking at start April 1, 2004 - 2008, with each year ending March 31, 2005-2009. How would I alter the code to accommodate those dates?

Thanks again!
art297
Opal | Level 21
Since you didn't provide any sample data I just invented some.

You didn't indicate anything about having to account for time frames (at least that is how I read your request), simply that you:

1. had data that covered a five year span and were interested in finding out
2. how many people (ids in my example) had multiple records within any 7 day period and
3. how many peopld had multiple records within any 365 day period.

That is what the suggested code was intended to do.

Art
---------
> Thanks, I will try the code that you have suggested.
>
>
> This will take into account that it only matters if
> the repeated events in those time frames are for the
> same unique person identifier?
>
> Also, I am not quite clear how you came up with the
> dates that you listed in the code. The time frames
> that I am looking at start April 1, 2004 - 2008, with
> each year ending March 31, 2005-2009. How would I
> alter the code to accommodate those dates?
>
> Thanks again!
Ksharp
Super User
Just as Art.T said.You need to post some dummy data and what you want your output looks like and explain what is your logic detailly. That will be helpful to get your answer.


Ksharp
ballardw
Super User
Also, are your seven day periods any arbitrary seven consecutive days or with in a calendar week such as a Sunday to Saturday period?
MScEpi
Calcite | Level 5
The seven days need do not represent a week, or a standard Sunday-Monday, rather just if the two service dates are within 7 days of one another; or if the service dates are within the same 'year' from April 1st (2004-2008) and March 31 (2005-2009).

The important thing to find out is if/how many individual people (based on their personal id) ansd occurences there are where there is more than one service in a seven day period, or in those 'years'.

So it doesn't matter if you and I both had service dates within 7 days from one another, rather if I had more thatn one visit.

I have done a first.id and last.id to see how many ids show up multiple times, now I just have to figure out how to check for the occurences in a week, or within the 'year'.
art297
Opal | Level 21
You added some extra info, and I noticed some errors in my original suggestion, but I'm still not sure if this is what you want:
[pre]
data have;
input id date date9.;
cards;
1 10jan2005
1 17jan2005
2 10jan2005
2 16jan2005
2 9jan2005
3 5feb2006
3 5feb2007
4 5apr2008
4 4mar2009
5 7apr2010
;

proc sort data=have;
by id date;
run;

data want;
set have;
retain last met_condition_7day met_condition_year;
by id;
last=lag(date);
if first.id then do;
call missing(met_condition_7day);
call missing(met_condition_year);
end;
else do;
if date-last le 6 then met_condition_7day=1;
if date-last le 364 and
date le mdy(3,31,year(last)+1)
then met_condition_year=1;
end;
if last.id then output;
run;

proc freq data=want;
tables met:;
run;
[/pre]
HTH,
Art
> The seven days need do not represent a week, or a
> standard Sunday-Monday, rather just if the two
> service dates are within 7 days of one another; or if
> the service dates are within the same 'year' from
> April 1st (2004-2008) and March 31 (2005-2009).
>
> The important thing to find out is if/how many
> individual people (based on their personal id) ansd
> occurences there are where there is more than one
> service in a seven day period, or in those 'years'.
>
> So it doesn't matter if you and I both had service
> dates within 7 days from one another, rather if I had
> more thatn one visit.
>
> I have done a first.id and last.id to see how many
> ids show up multiple times, now I just have to figure
> out how to check for the occurences in a week, or
> within the 'year'.
Ksharp
Super User
[pre]
data have;
input id date date9.;
format date date9.;
cards;
1 10jan2005
1 17jan2005
2 10jan2005
2 16jan2005
2 9jan2005
3 5feb2006
3 5feb2007
4 5mar2008
4 4mar2009
5 7apr2010
;
run;
proc sql ;
select sum(HowMany) as want_sum
from(
select count(distinct a.id) as HowMany
from have as a,have as b
where a.id=b.id and a.date le b.date and b.date le a.date+7
group by a.id,a.date
having count(distinct b.date) ge 2)
;
quit;
[/pre]

Ksharp Message was edited by: Ksharp
Peter_C
Rhodochrosite | Level 12
here is a step to check and flag those ID where multiple events occur within 7 days or the "frame year";[pre]data have;
input id date date9.;
frameY = intnx( 'year.4', date, 0) ;
format date framey date9. id 5. ;
cards;
1 10jan2005
1 17jan2005
2 10jan2005
2 16jan2005
2 9jan2005
3 5feb2006
3 5feb2007
4 5apr2008
4 4mar2009
5 7apr2010
;
proc sort out= work ;
by id date ;
run ;

data specials( keep= id frameY flag7day flagNyear) ;
frame = 0 ;
do until( last.id ) ;
set work ;
by id frameY ;
if not frame then frame = frameY ;
if first.frameY then do;
cases = 0 ;
framed = framey ;
end ;
cases+1 ;
if cases > 1 and not flag7day then do ;
if date - framed > 7 then framed = date ;
else flag7day = 1 ;
end ;

if last.frameY and cases > 1 then
flagNyear=1 ;
end ;
run ;[/pre]looks a bit similar to Art's suggestion

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 1545 views
  • 0 likes
  • 5 in conversation