Count number of observations in 10-day periods

Reply
New Contributor
Posts: 4

Count number of observations in 10-day periods

[ Edited ]

Hello, I have a dataset with more than 20 million observations covering 40 years. Dataset is in below form

date              id1          var1       var2    date1             date2

01012000     14780   3             5        06202000     06292000

07251977     25489    0.8          1.9    09241977     10031977

 

I would like to count the number of observations within each 10-day interval and get the average for all intervals each year except for the dates between date1 and date2. The output should give the average number of observations in all 10-day intervals for each year excluding the ones between date1 and date2. I would be very glad if you could help. Thanks!

Esteemed Advisor
Posts: 5,624

Re: Randomly select 1000 10-day periods

Why? You might as well calculate the number of distinct IDs for every 10-day period. That's fewer than 1000/year.

PG
New Contributor
Posts: 4

Re: Randomly select 1000 10-day periods

Thanks for your reply. You are right, I edited the question.

Esteemed Advisor
Posts: 5,624

Re: Randomly select 1000 10-day periods

A reasonably efficient way to get the distinct id counts for every 10-day period would be:

 


proc sort data=have; by id date; run;

proc sql noprint;
select min(date) - 9, max(date) into :beg, :end
from test;
quit;

data want;
array pr{&beg:&end} _temporary_;
array cnt{&beg:&end} _temporary_;
call missing(of pr{*});
do until(last.id);
    set have end=done; by id;
    do i = date-9 to date;
        pr{i} = 1;
        end;
    end;
do i = &beg to &end;
    if pr{i} then cnt{i} = sum(cnt{i}, 1);
    end;
if done then do;
    do date = &beg + 9 to &end - 9;
        distinctId = coalesce(cnt{date}, 0);
        output;
        end;
    end;
label date="10-day period start date";
keep date distinctId;
run;
PG
New Contributor
Posts: 4

Re: Randomly select 1000 10-day periods

[ Edited ]

Thanks for your help. Your code did not produce what I want. I am not sure why but the output has lots of zeros which is not expected for nearly any of the 10-day period. What I need is the number of rows in each 10-day period each year excluding the days between date1 and date2. I also need to know which count belongs to which 10-day period. So the output would seem like this for year 1977:

 

10-day period                     count

01.01.1977-01.10.1977      125

01.02.1977-01.11.1977       230

01.03.1977-01.12.1977       195

.

.

.

09.14.1977-09.23.1977      214 (here we exclude date1 09.24.1977- date2 10.03.1977 and jump to 10.04.1977)

10.04.1977-10.13.1977      98

.

.

.

12.22.1977-12.31.1977     144

 

I need this for all years and then I will calculate means and other stuff from here. I appreciate your help.

Esteemed Advisor
Posts: 5,624

Re: Randomly select 1000 10-day periods

Post what you did.

PG
New Contributor
Posts: 4

Re: Randomly select 1000 10-day periods

I don't have anything useful that would help right now.
Ask a Question
Discussion stats
  • 6 replies
  • 161 views
  • 0 likes
  • 2 in conversation