BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
twildone
Pyrite | Level 9

Hi...I have 2 datasets. The first has a Start and End Date for each ID. The second dataset has a list of dates. What I an trying to do is count the number of Dates listed in dataset have2 for each ID in dataset have that is between the Start and End dates.Any suggestions. Thanks

 

 

data have;
input ID $ (Start End) (:yymmdd10.);
format Start End yymmdd10.;
cards;
1 20190519 20190601
2 20190306 20190324
3 20190822 20190830
;

data have2;
input Date :yymmdd10.;
format Date yymmdd10.;
cards;
20190522
20190528
20190615
20190820
20190823
20190827
;

data want;
input ID $ (Start End) (:yymmdd10.) NumberDays;
format Start End yymmdd10.;
cards;
1 20190519 20190601 2
2 20190306 20190324 0
3 20190822 20190830 2
;
1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

So, because of the size of the cartesian join, we need to forego the obvious SQL solution and use some data step trickery to do it in a sequential pass, by using an array indexed by dates:

data have;
input ID $ (Start End) (:yymmdd10.);
format Start End yymmdd10.;
cards;
1 20190519 20190601
2 20190306 20190324
3 20190822 20190830
;

data have2;
input Date :yymmdd10.;
format Date yymmdd10.;
cards;
20190522
20190528
20190615
20190820
20190823
20190827
;

%let array_start = %sysfunc(inputn(19000101,yymmdd8.));
%let array_end = %sysfunc(inputn(21001231,yymmdd8.));

data want;
set have;
if _n_ = 1
then do;
  array dates {&array_start.:&array_end.} _temporary_;
  do while (not done);
    set have2 end=done;
    dates{date} = 1;
  end;
end;
count = 0;
do _n_ = start to end;
  count = sum(count,dates{_n_});
end;
drop date;
run;

View solution in original post

20 REPLIES 20
PaigeMiller
Diamond | Level 26
proc transpose data=have2 out=have2_t;
run;

data want;
    if _n_=1 then set have2_t;
    set have;
    array c col:;
    count=0;
    do i=1 to dim(c);
        if start<c(i)<end then count=count+1;
    end;
    keep start end id count;
run;
--
Paige Miller
jimbarbour
Meteorite | Level 14

@twildone wrote:

Hi...I have 2 datasets. The first has a Start and End Date for each ID. The second dataset has a list of dates. What I an trying to do is count the number of Dates listed in dataset have2 for each ID in dataset have that is between the Start and End dates.Any suggestions. Thanks

 

 

data have;
input ID $ (Start End) (:yymmdd10.);
format Start End yymmdd10.;
cards;
1 20190519 20190601
2 20190306 20190324
3 20190822 20190830
;

data have2;
input Date :yymmdd10.;
format Date yymmdd10.;
cards;
20190522
20190528
20190615
20190820
20190823
20190827
;

data want;
input ID $ (Start End) (:yymmdd10.) NumberDays;
format Start End yymmdd10.;
cards;
1 20190519 20190601 2
2 20190306 20190324 0
3 20190822 20190830 2
;

Since you are reading in these dates as SAS dates, they will be a number (the count of days since January 1, 1960), and you can just use a regular mathematical expression.    I think you'll want two SET statements, one inside the other.

The mathematical expression would be something fairly simple like:

IF Start <= Date <= End THEN
    Count + 1;

Why don't you experiment with that code and two SET statements and see if you can get it working?

 

Jim

singhsahab
Lapis Lazuli | Level 10

@twildone Here is an alternate solution with the help PROC SQL and Data step.

 


proc sql noprint;
create table have_3 as
select a.*,b.date from have as a, have2 as b where date between start and end order by ID;
quit;

data have_4;
set have_3;
by Id;
if first.id then NumberDays=1;
else NumberDays+1;
if last.id;
drop date;
run;

proc sql;
create table want as 
select id,start,end,NumberDays from have_4
union all
select id,start,end ,0 as NumberDays from Have where id not in (select distinct(Id) from have_3)
order by id;
quit;

Thanks:

 

 

Kurt_Bremser
Super User
proc sql;
create table want as
  select
    a.id,
    a.start,
    a.end,
    count(b.date) as count
  from have a left join have2 b
  on a.start le b.date le a.end
  group by a.id, a.start, a.end;
;
quit;

Untested, posted from my tablet.

twildone
Pyrite | Level 9

Hi....I was trying to use a macro to loop through the list but I get an error message. Thanks.

%macro Student();
Proc sql;
    Select distinct Date into :Dates separated by ' ' from have2;
Quit;

%do i=1 %to %sysfunc(countw(&Dates));
    %let Dates=%scan(&Dates,&i,%str( ));
data want;
    set have;
	by ID;
		if Start <= &Dates <= End then
    		Count + 1;
	output;
	run;

%end;
%mend;
%Student
PaigeMiller
Diamond | Level 26

@twildone wrote:

Hi....I was trying to use a macro to loop through the list but I get an error message.


What error? Show us the LOG.

 

Why add in macro complications, when working solutions have been posted above?

--
Paige Miller
twildone
Pyrite | Level 9

Hi.....I have included the log below with the error message. Thanks

 

27         
28         GOPTIONS ACCESSIBLE;
29         data have;
30         input ID $ (Start End) (:yymmdd10.);
31         format Start End yymmdd10.;
32         cards;

NOTE: The data set WORK.HAVE has 3 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds
      
36         ;

37         
38         data have2;
39         input Date :yymmdd10.;
40         format Date yymmdd10.;
41         cards;

NOTE: The data set WORK.HAVE2 has 6 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.01 seconds
      
48         ;

49         
50         %macro Student();
51         Proc sql;
52             Select distinct Date into :Dates separated by ' ' from have2;
2                                                          The SAS System                                 11:33 Sunday, June 6, 2021

53         Quit;
54         
55         %do i=1 %to %sysfunc(countw(&Dates));
56             %let Dates=%scan(&Dates,&i,%str( ));
57         data want;
58             set have;
59         	by ID;
60         		if Start le &Dates le End then
61         			Count+1;
62         	output;
63         	run;
64         
65         %end;
66         %mend;
67         %Student
68         
69         GOPTIONS NOACCESSIBLE;
NOTE: PROCEDURE SQL used (Total process time):
      real time           0.03 seconds
      cpu time            0.03 seconds
      


NOTE: There were 3 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT has 3 observations and 4 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.01 seconds
      

NOTE: Line generated by the invoked macro "STUDENT".
69          data want;     set have;  by ID;   if Start le &Dates le End then     Count+1;  output;  run;
                                                                     ___
                                                                     388
                                                                     76

ERROR 388-185: Expecting an arithmetic operator.

ERROR 76-322: Syntax error, statement will be ignored.

NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.WANT may be incomplete.  When this step was stopped there were 0 observations and 3 variables.
WARNING: Data set WORK.WANT was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds
      

NOTE: Line generated by the invoked macro "STUDENT".
PaigeMiller
Diamond | Level 26

@twildone wrote:

Hi.....I have included the log below with the error message. Thanks

 

   
50         %macro Student();
51         Proc sql;
52             Select distinct Date into :Dates separated by ' ' from have2;
2                                                          The SAS System                                 11:33 Sunday, June 6, 2021

53         Quit;
54         
55         %do i=1 %to %sysfunc(countw(&Dates));
56             %let Dates=%scan(&Dates,&i,%str( ));
57         data want;
58             set have;
59         	by ID;
60         		if Start le &Dates le End then
61         			Count+1;
62         	output;
63         	run;
64         
65         %end;
66         %mend;
67         %Student
68         
69         GOPTIONS NOACCESSIBLE;
NOTE: PROCEDURE SQL used (Total process time):
      real time           0.03 seconds
      cpu time            0.03 seconds
      


NOTE: There were 3 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT has 3 observations and 4 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.01 seconds
      

NOTE: Line generated by the invoked macro "STUDENT".
69          data want;     set have;  by ID;   if Start le &Dates le End then     Count+1;  output;  run;
                                                                     ___
                                                                     388
                                                                     76

ERROR 388-185: Expecting an arithmetic operator.

ERROR 76-322: Syntax error, statement will be ignored.

NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.WANT may be incomplete.  When this step was stopped there were 0 observations and 3 variables.
WARNING: Data set WORK.WANT was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds
      

NOTE: Line generated by the invoked macro "STUDENT".

When you are trying to debug a macro, you need to place the command

 

options mprint;

before the running of the macro, and then run it again. This writes a lot of useful information to the LOG, and shows you the actual SAS code the macro is generating. This will enable you to see if the macro is doing the right thing, or not. So I suggest you try it.

 

 

When I try this on your code, I see

 

MPRINT(STUDENT):   if Start le 2019-05-22 le End then Count+1;

which doesn't do what you want. This line looks to see if the value 2019 minus 05 minus 22 (that's what it says) is between the values of START and END. Can you figure out how to change this?

 

Then, later, more errors coming from this line:

 MPRINT(STUDENT):   if Start le le End then Count+1;

Can you figure out how to change this to make it work?

 

As I said earlier, why introduce the complications of macros when working code has already been provided? (my code works on your data sets HAVE and HAVE2)

--
Paige Miller
twildone
Pyrite | Level 9

Hi Kurt....I did try your suggestion and from the log below it looks like I ran out of disk space. Thanks.

 

NOTE: The execution of this query involves performing one or more Cartesian product joins that can not be optimized.
NOTE: The query requires remerging summary statistics back with the original data.
ERROR: Insufficient space in file WORK.'SASTMP-000007194'n.UTILITY.
ERROR: File WORK.'SASTMP-000007194'n.UTILITY is damaged. I/O processing did not complete.
NOTE: Error was encountered during utility-file processing. You may be able to execute the SQL statement successfully if you 
      allocate more space to the WORK library.
ERROR: There is not enough WORK disk space to store the results of an internal sorting phase.
ERROR: An error has occurred.

NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.
18848      quit;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE SQL used (Total process time):
      real time           10:31.69
      cpu time            6:35.82
      
18849  
jimbarbour
Meteorite | Level 14
NOTE: The execution of this query involves performing one or more Cartesian product joins that can not be optimized.

I see in the first NOTE that a Cartesian product is occurring with the way your SQL is coded.  A Cartesian product basically involves joining every possible combination of rows between two tables -- which if the tables are large will require inordinate amounts of disk space.  Really, I think this is an issue more with how the query is coded than it is about not enough WORK space.  

 

Jim

Kurt_Bremser
Super User

@jimbarbour the cartesian join was the result of me trying a brute force, simple approach first, and to show that no macro coding is needed at all.

With the data prohibiting that, I then switched to a data step that will handle anything that can even be stored in the target environment.

Kurt_Bremser
Super User

So, because of the size of the cartesian join, we need to forego the obvious SQL solution and use some data step trickery to do it in a sequential pass, by using an array indexed by dates:

data have;
input ID $ (Start End) (:yymmdd10.);
format Start End yymmdd10.;
cards;
1 20190519 20190601
2 20190306 20190324
3 20190822 20190830
;

data have2;
input Date :yymmdd10.;
format Date yymmdd10.;
cards;
20190522
20190528
20190615
20190820
20190823
20190827
;

%let array_start = %sysfunc(inputn(19000101,yymmdd8.));
%let array_end = %sysfunc(inputn(21001231,yymmdd8.));

data want;
set have;
if _n_ = 1
then do;
  array dates {&array_start.:&array_end.} _temporary_;
  do while (not done);
    set have2 end=done;
    dates{date} = 1;
  end;
end;
count = 0;
do _n_ = start to end;
  count = sum(count,dates{_n_});
end;
drop date;
run;

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 20 replies
  • 2879 views
  • 1 like
  • 6 in conversation