BookmarkSubscribeRSS Feed
aword
Calcite | Level 5

Hello, I have a dataset with more than 20 million observations covering 40 years. Dataset is in below form

date              id1          var1       var2    date1             date2

01012000     14780   3             5        06202000     06292000

07251977     25489    0.8          1.9    09241977     10031977

 

I would like to count the number of observations within each 10-day interval and get the average for all intervals each year except for the dates between date1 and date2. The output should give the average number of observations in all 10-day intervals for each year excluding the ones between date1 and date2. I would be very glad if you could help. Thanks!

6 REPLIES 6
PGStats
Opal | Level 21

Why? You might as well calculate the number of distinct IDs for every 10-day period. That's fewer than 1000/year.

PG
aword
Calcite | Level 5

Thanks for your reply. You are right, I edited the question.

PGStats
Opal | Level 21

A reasonably efficient way to get the distinct id counts for every 10-day period would be:

 


proc sort data=have; by id date; run;

proc sql noprint;
select min(date) - 9, max(date) into :beg, :end
from test;
quit;

data want;
array pr{&beg:&end} _temporary_;
array cnt{&beg:&end} _temporary_;
call missing(of pr{*});
do until(last.id);
    set have end=done; by id;
    do i = date-9 to date;
        pr{i} = 1;
        end;
    end;
do i = &beg to &end;
    if pr{i} then cnt{i} = sum(cnt{i}, 1);
    end;
if done then do;
    do date = &beg + 9 to &end - 9;
        distinctId = coalesce(cnt{date}, 0);
        output;
        end;
    end;
label date="10-day period start date";
keep date distinctId;
run;
PG
aword
Calcite | Level 5

Thanks for your help. Your code did not produce what I want. I am not sure why but the output has lots of zeros which is not expected for nearly any of the 10-day period. What I need is the number of rows in each 10-day period each year excluding the days between date1 and date2. I also need to know which count belongs to which 10-day period. So the output would seem like this for year 1977:

 

10-day period                     count

01.01.1977-01.10.1977      125

01.02.1977-01.11.1977       230

01.03.1977-01.12.1977       195

.

.

.

09.14.1977-09.23.1977      214 (here we exclude date1 09.24.1977- date2 10.03.1977 and jump to 10.04.1977)

10.04.1977-10.13.1977      98

.

.

.

12.22.1977-12.31.1977     144

 

I need this for all years and then I will calculate means and other stuff from here. I appreciate your help.

PGStats
Opal | Level 21

Post what you did.

PG
aword
Calcite | Level 5
I don't have anything useful that would help right now.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 604 views
  • 0 likes
  • 2 in conversation