DATA Step, Macro, Functions and more

Sql type counts with joins using Data step.

Reply
Regular Contributor
Regular Contributor
Posts: 166

Sql type counts with joins using Data step.

Guys,
I have 2 data sets. one contains ID , Indate , outdate
The other contains Leave_dates

I wish to find for each id in DATA1 the count of leave dates in data2 which are greater than or equal to indate and less than or equal to outdate.I could do this in PROC SQL bu the DATA size is very large and it takes hours to process hence i was hoping if this could be done in the data step to help reduce the processing time.
Please help

data1
ID INDATE OUTDATE
1 01oct2009 06oct2009
2 01oct2009 03oct2009

DATA2
LEAVE_DATES
01oct2009
02oct2009
04oct2009
05oct2009

Message was edited by: NN Message was edited by: NN
N/A
Posts: 0

Re: Sql type counts with joins using Data step.

proc sort data=data2; by leavedate;

data counted;
set data1;

count = 0;
do i=1 to N;
set data2 nobs=N;
if indate <= leavedate <= outdate then count+1;
end;
output;
run;

Or something like that.
Regular Contributor
Regular Contributor
Posts: 166

Re: Sql type counts with joins using Data step.

Posted in reply to deleted_user
Thanks for the reply chuck
but is there something missing in the Above code bcause
The Output that i was looking for was

ID COUNT_OF_LEAVEDATES
1 4
2 2
Super Contributor
Posts: 474

Re: Sql type counts with joins using Data step.

Hello NN.

Being DATA1 the large table and DATA2 a very small one, you could force an index iteration between the the two and perform the count, like this:

* create "phoney" index on DATA2;
data data3 (index=(IDX));
set data2;
IDX=0;
run;

* do counts;
data counts;
set data1;
keep ID COUNT;
IDX=0; COUNT=0; * init index and counter;
do until (IDX eq 2); * do until no more matches;
set data3 key=IDX; * match;
if not _ERROR_ and LEAVE_DATES le OUTDATE then do;
if INDATE le LEAVE_DATES then COUNT+1; * indate/outdate match;
end;
else IDX+1; * reset phoney index;
end;
output; * output total;
_ERROR_=0; * reset _ERROR_ flag;
run;

This is a technique that I've been using for some years (before hashing appeared into SAS) with great success for a big table X very small table match scenario with N to N matching relationship. Do not try to do this with a DATA2 with more then 1000 rows, performance will be disastrous.

Another way of doing the same, would be to load the smaller table (DATA2) into a hash iterator object and iterate through it for each observation of the large dataset (DATA1).
Here you could have a DATA2 table up to 1.5M without compromising performance.

Check the online doc for the hash iterator object here:
http://support.sas.com/documentation/cdl/en/lrdict/62618/HTML/default/a002576871.htm

Cheers from Portugal.

Daniel Santos @ www.cgd.pt
N/A
Posts: 0

Re: Sql type counts with joins using Data step.

I didn't wrap the code in '\[pre\] ... \[/pre\]' so it got cut off in the posting.

[pre]
count = 0;
do i=1 to N;
set data2 nobs=N;
if indate LE leave_dates LE outdate then count+1;
end;
[/pre]

I was using angle brackets and equal signs, and they weren't making it through the html stuff, so I replaced them with 'LE'.

But this isn't perfect because you won't necessarily run all the way through data2 each time. But, this should point you in the right direction. You may need to use "FETCH" or observation pointers with data2 to insure stepping through it each time. REad through the documentation for base SAS statements. If you have PC SAS, then you can get there through "help" otherwise, you can get the documentation via the SAS support site. That's how I learn what I can do with SAS, and I've been programming in SAS since 1996, and am still picking up new stuff fairly regularly.

Message was edited by: Chuck Message was edited by: Chuck
Ask a Question
Discussion stats
  • 4 replies
  • 140 views
  • 0 likes
  • 3 in conversation