Hello, I have a dataset with more than 20 million observations covering 40 years. Dataset is in below form
date id1 var1 var2 date1 date2
01012000 14780 3 5 06202000 06292000
07251977 25489 0.8 1.9 09241977 10031977
I would like to count the number of observations within each 10-day interval and get the average for all intervals each year except for the dates between date1 and date2. The output should give the average number of observations in all 10-day intervals for each year excluding the ones between date1 and date2. I would be very glad if you could help. Thanks!
Why? You might as well calculate the number of distinct IDs for every 10-day period. That's fewer than 1000/year.
Thanks for your reply. You are right, I edited the question.
A reasonably efficient way to get the distinct id counts for every 10-day period would be:
proc sort data=have; by id date; run;
proc sql noprint;
select min(date) - 9, max(date) into :beg, :end
from test;
quit;
data want;
array pr{&beg:&end} _temporary_;
array cnt{&beg:&end} _temporary_;
call missing(of pr{*});
do until(last.id);
set have end=done; by id;
do i = date-9 to date;
pr{i} = 1;
end;
end;
do i = &beg to &end;
if pr{i} then cnt{i} = sum(cnt{i}, 1);
end;
if done then do;
do date = &beg + 9 to &end - 9;
distinctId = coalesce(cnt{date}, 0);
output;
end;
end;
label date="10-day period start date";
keep date distinctId;
run;
Thanks for your help. Your code did not produce what I want. I am not sure why but the output has lots of zeros which is not expected for nearly any of the 10-day period. What I need is the number of rows in each 10-day period each year excluding the days between date1 and date2. I also need to know which count belongs to which 10-day period. So the output would seem like this for year 1977:
10-day period count
01.01.1977-01.10.1977 125
01.02.1977-01.11.1977 230
01.03.1977-01.12.1977 195
.
.
.
09.14.1977-09.23.1977 214 (here we exclude date1 09.24.1977- date2 10.03.1977 and jump to 10.04.1977)
10.04.1977-10.13.1977 98
.
.
.
12.22.1977-12.31.1977 144
I need this for all years and then I will calculate means and other stuff from here. I appreciate your help.
Post what you did.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.