Hi...I have 2 datasets. The first has a Start and End Date for each ID. The second dataset has a list of dates. What I an trying to do is count the number of Dates listed in dataset have2 for each ID in dataset have that is between the Start and End dates.Any suggestions. Thanks
data have; input ID $ (Start End) (:yymmdd10.); format Start End yymmdd10.; cards; 1 20190519 20190601 2 20190306 20190324 3 20190822 20190830 ; data have2; input Date :yymmdd10.; format Date yymmdd10.; cards; 20190522 20190528 20190615 20190820 20190823 20190827 ; data want; input ID $ (Start End) (:yymmdd10.) NumberDays; format Start End yymmdd10.; cards; 1 20190519 20190601 2 2 20190306 20190324 0 3 20190822 20190830 2 ;
So, because of the size of the cartesian join, we need to forego the obvious SQL solution and use some data step trickery to do it in a sequential pass, by using an array indexed by dates:
data have;
input ID $ (Start End) (:yymmdd10.);
format Start End yymmdd10.;
cards;
1 20190519 20190601
2 20190306 20190324
3 20190822 20190830
;
data have2;
input Date :yymmdd10.;
format Date yymmdd10.;
cards;
20190522
20190528
20190615
20190820
20190823
20190827
;
%let array_start = %sysfunc(inputn(19000101,yymmdd8.));
%let array_end = %sysfunc(inputn(21001231,yymmdd8.));
data want;
set have;
if _n_ = 1
then do;
array dates {&array_start.:&array_end.} _temporary_;
do while (not done);
set have2 end=done;
dates{date} = 1;
end;
end;
count = 0;
do _n_ = start to end;
count = sum(count,dates{_n_});
end;
drop date;
run;
proc transpose data=have2 out=have2_t;
run;
data want;
if _n_=1 then set have2_t;
set have;
array c col:;
count=0;
do i=1 to dim(c);
if start<c(i)<end then count=count+1;
end;
keep start end id count;
run;
@twildone wrote:
Hi...I have 2 datasets. The first has a Start and End Date for each ID. The second dataset has a list of dates. What I an trying to do is count the number of Dates listed in dataset have2 for each ID in dataset have that is between the Start and End dates.Any suggestions. Thanks
data have; input ID $ (Start End) (:yymmdd10.); format Start End yymmdd10.; cards; 1 20190519 20190601 2 20190306 20190324 3 20190822 20190830 ; data have2; input Date :yymmdd10.; format Date yymmdd10.; cards; 20190522 20190528 20190615 20190820 20190823 20190827 ; data want; input ID $ (Start End) (:yymmdd10.) NumberDays; format Start End yymmdd10.; cards; 1 20190519 20190601 2 2 20190306 20190324 0 3 20190822 20190830 2 ;
Since you are reading in these dates as SAS dates, they will be a number (the count of days since January 1, 1960), and you can just use a regular mathematical expression. I think you'll want two SET statements, one inside the other.
The mathematical expression would be something fairly simple like:
IF Start <= Date <= End THEN
Count + 1;
Why don't you experiment with that code and two SET statements and see if you can get it working?
Jim
@twildone Here is an alternate solution with the help PROC SQL and Data step.
proc sql noprint;
create table have_3 as
select a.*,b.date from have as a, have2 as b where date between start and end order by ID;
quit;
data have_4;
set have_3;
by Id;
if first.id then NumberDays=1;
else NumberDays+1;
if last.id;
drop date;
run;
proc sql;
create table want as
select id,start,end,NumberDays from have_4
union all
select id,start,end ,0 as NumberDays from Have where id not in (select distinct(Id) from have_3)
order by id;
quit;
Thanks:
proc sql;
create table want as
select
a.id,
a.start,
a.end,
count(b.date) as count
from have a left join have2 b
on a.start le b.date le a.end
group by a.id, a.start, a.end;
;
quit;
Untested, posted from my tablet.
Hi....I was trying to use a macro to loop through the list but I get an error message. Thanks.
%macro Student(); Proc sql; Select distinct Date into :Dates separated by ' ' from have2; Quit; %do i=1 %to %sysfunc(countw(&Dates)); %let Dates=%scan(&Dates,&i,%str( )); data want; set have; by ID; if Start <= &Dates <= End then Count + 1; output; run; %end; %mend; %Student
@twildone wrote:
Hi....I was trying to use a macro to loop through the list but I get an error message.
What error? Show us the LOG.
Why add in macro complications, when working solutions have been posted above?
Hi.....I have included the log below with the error message. Thanks
27 28 GOPTIONS ACCESSIBLE; 29 data have; 30 input ID $ (Start End) (:yymmdd10.); 31 format Start End yymmdd10.; 32 cards; NOTE: The data set WORK.HAVE has 3 observations and 3 variables. NOTE: DATA statement used (Total process time): real time 0.01 seconds cpu time 0.01 seconds 36 ; 37 38 data have2; 39 input Date :yymmdd10.; 40 format Date yymmdd10.; 41 cards; NOTE: The data set WORK.HAVE2 has 6 observations and 1 variables. NOTE: DATA statement used (Total process time): real time 0.00 seconds cpu time 0.01 seconds 48 ; 49 50 %macro Student(); 51 Proc sql; 52 Select distinct Date into :Dates separated by ' ' from have2; 2 The SAS System 11:33 Sunday, June 6, 2021 53 Quit; 54 55 %do i=1 %to %sysfunc(countw(&Dates)); 56 %let Dates=%scan(&Dates,&i,%str( )); 57 data want; 58 set have; 59 by ID; 60 if Start le &Dates le End then 61 Count+1; 62 output; 63 run; 64 65 %end; 66 %mend; 67 %Student 68 69 GOPTIONS NOACCESSIBLE; NOTE: PROCEDURE SQL used (Total process time): real time 0.03 seconds cpu time 0.03 seconds NOTE: There were 3 observations read from the data set WORK.HAVE. NOTE: The data set WORK.WANT has 3 observations and 4 variables. NOTE: DATA statement used (Total process time): real time 0.00 seconds cpu time 0.01 seconds NOTE: Line generated by the invoked macro "STUDENT". 69 data want; set have; by ID; if Start le &Dates le End then Count+1; output; run; ___ 388 76 ERROR 388-185: Expecting an arithmetic operator. ERROR 76-322: Syntax error, statement will be ignored. NOTE: The SAS System stopped processing this step because of errors. WARNING: The data set WORK.WANT may be incomplete. When this step was stopped there were 0 observations and 3 variables. WARNING: Data set WORK.WANT was not replaced because this step was stopped. NOTE: DATA statement used (Total process time): real time 0.00 seconds cpu time 0.00 seconds NOTE: Line generated by the invoked macro "STUDENT".
@twildone wrote:
Hi.....I have included the log below with the error message. Thanks
50 %macro Student(); 51 Proc sql; 52 Select distinct Date into :Dates separated by ' ' from have2; 2 The SAS System 11:33 Sunday, June 6, 2021 53 Quit; 54 55 %do i=1 %to %sysfunc(countw(&Dates)); 56 %let Dates=%scan(&Dates,&i,%str( )); 57 data want; 58 set have; 59 by ID; 60 if Start le &Dates le End then 61 Count+1; 62 output; 63 run; 64 65 %end; 66 %mend; 67 %Student 68 69 GOPTIONS NOACCESSIBLE; NOTE: PROCEDURE SQL used (Total process time): real time 0.03 seconds cpu time 0.03 seconds NOTE: There were 3 observations read from the data set WORK.HAVE. NOTE: The data set WORK.WANT has 3 observations and 4 variables. NOTE: DATA statement used (Total process time): real time 0.00 seconds cpu time 0.01 seconds NOTE: Line generated by the invoked macro "STUDENT". 69 data want; set have; by ID; if Start le &Dates le End then Count+1; output; run; ___ 388 76 ERROR 388-185: Expecting an arithmetic operator. ERROR 76-322: Syntax error, statement will be ignored. NOTE: The SAS System stopped processing this step because of errors. WARNING: The data set WORK.WANT may be incomplete. When this step was stopped there were 0 observations and 3 variables. WARNING: Data set WORK.WANT was not replaced because this step was stopped. NOTE: DATA statement used (Total process time): real time 0.00 seconds cpu time 0.00 seconds NOTE: Line generated by the invoked macro "STUDENT".
When you are trying to debug a macro, you need to place the command
options mprint;
before the running of the macro, and then run it again. This writes a lot of useful information to the LOG, and shows you the actual SAS code the macro is generating. This will enable you to see if the macro is doing the right thing, or not. So I suggest you try it.
When I try this on your code, I see
MPRINT(STUDENT): if Start le 2019-05-22 le End then Count+1;
which doesn't do what you want. This line looks to see if the value 2019 minus 05 minus 22 (that's what it says) is between the values of START and END. Can you figure out how to change this?
Then, later, more errors coming from this line:
MPRINT(STUDENT): if Start le le End then Count+1;
Can you figure out how to change this to make it work?
As I said earlier, why introduce the complications of macros when working code has already been provided? (my code works on your data sets HAVE and HAVE2)
That macro has got absolutely nothing to do with my code. Try that as posted and look if it does the job.
Hi Kurt....I did try your suggestion and from the log below it looks like I ran out of disk space. Thanks.
NOTE: The execution of this query involves performing one or more Cartesian product joins that can not be optimized. NOTE: The query requires remerging summary statistics back with the original data. ERROR: Insufficient space in file WORK.'SASTMP-000007194'n.UTILITY. ERROR: File WORK.'SASTMP-000007194'n.UTILITY is damaged. I/O processing did not complete. NOTE: Error was encountered during utility-file processing. You may be able to execute the SQL statement successfully if you allocate more space to the WORK library. ERROR: There is not enough WORK disk space to store the results of an internal sorting phase. ERROR: An error has occurred. NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements. 18848 quit; NOTE: The SAS System stopped processing this step because of errors. NOTE: PROCEDURE SQL used (Total process time): real time 10:31.69 cpu time 6:35.82 18849
NOTE: The execution of this query involves performing one or more Cartesian product joins that can not be optimized.
I see in the first NOTE that a Cartesian product is occurring with the way your SQL is coded. A Cartesian product basically involves joining every possible combination of rows between two tables -- which if the tables are large will require inordinate amounts of disk space. Really, I think this is an issue more with how the query is coded than it is about not enough WORK space.
Jim
@jimbarbour the cartesian join was the result of me trying a brute force, simple approach first, and to show that no macro coding is needed at all.
With the data prohibiting that, I then switched to a data step that will handle anything that can even be stored in the target environment.
So, because of the size of the cartesian join, we need to forego the obvious SQL solution and use some data step trickery to do it in a sequential pass, by using an array indexed by dates:
data have;
input ID $ (Start End) (:yymmdd10.);
format Start End yymmdd10.;
cards;
1 20190519 20190601
2 20190306 20190324
3 20190822 20190830
;
data have2;
input Date :yymmdd10.;
format Date yymmdd10.;
cards;
20190522
20190528
20190615
20190820
20190823
20190827
;
%let array_start = %sysfunc(inputn(19000101,yymmdd8.));
%let array_end = %sysfunc(inputn(21001231,yymmdd8.));
data want;
set have;
if _n_ = 1
then do;
array dates {&array_start.:&array_end.} _temporary_;
do while (not done);
set have2 end=done;
dates{date} = 1;
end;
end;
count = 0;
do _n_ = start to end;
count = sum(count,dates{_n_});
end;
drop date;
run;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.