BookmarkSubscribeRSS Feed
aword
Calcite | Level 5

Hello, I have a dataset with more than 20 million observations covering 40 years. Dataset is in below form

date              id1          var1       var2    date1             date2

01012000     14780   3             5        06202000     06292000

07251977     25489    0.8          1.9    09241977     10031977

 

I would like to count the number of observations within each 10-day interval and get the average for all intervals each year except for the dates between date1 and date2. The output should give the average number of observations in all 10-day intervals for each year excluding the ones between date1 and date2. I would be very glad if you could help. Thanks!

6 REPLIES 6
PGStats
Opal | Level 21

Why? You might as well calculate the number of distinct IDs for every 10-day period. That's fewer than 1000/year.

PG
aword
Calcite | Level 5

Thanks for your reply. You are right, I edited the question.

PGStats
Opal | Level 21

A reasonably efficient way to get the distinct id counts for every 10-day period would be:

 


proc sort data=have; by id date; run;

proc sql noprint;
select min(date) - 9, max(date) into :beg, :end
from test;
quit;

data want;
array pr{&beg:&end} _temporary_;
array cnt{&beg:&end} _temporary_;
call missing(of pr{*});
do until(last.id);
    set have end=done; by id;
    do i = date-9 to date;
        pr{i} = 1;
        end;
    end;
do i = &beg to &end;
    if pr{i} then cnt{i} = sum(cnt{i}, 1);
    end;
if done then do;
    do date = &beg + 9 to &end - 9;
        distinctId = coalesce(cnt{date}, 0);
        output;
        end;
    end;
label date="10-day period start date";
keep date distinctId;
run;
PG
aword
Calcite | Level 5

Thanks for your help. Your code did not produce what I want. I am not sure why but the output has lots of zeros which is not expected for nearly any of the 10-day period. What I need is the number of rows in each 10-day period each year excluding the days between date1 and date2. I also need to know which count belongs to which 10-day period. So the output would seem like this for year 1977:

 

10-day period                     count

01.01.1977-01.10.1977      125

01.02.1977-01.11.1977       230

01.03.1977-01.12.1977       195

.

.

.

09.14.1977-09.23.1977      214 (here we exclude date1 09.24.1977- date2 10.03.1977 and jump to 10.04.1977)

10.04.1977-10.13.1977      98

.

.

.

12.22.1977-12.31.1977     144

 

I need this for all years and then I will calculate means and other stuff from here. I appreciate your help.

PGStats
Opal | Level 21

Post what you did.

PG
aword
Calcite | Level 5
I don't have anything useful that would help right now.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 963 views
  • 0 likes
  • 2 in conversation