09-20-2013 12:43 PM
Can someone explain to me the meaning of these few lines of code??
dataset x has 3 records per ID and dataset yy has more than 3 records per ID
Whats the purpsose of 2 set statements??
what is point=i and noobs=noobs doing with the YY dataset ???
data w(drop=_: dept);
do i=1 to nobs;
set yy(rename=(id=_id in=_in out=_out)) nobs=nobs point=i;
if id=_id and (in ge _in and in le _out) then dept_in=dept;
if id=_id and (out ge _in and out le _out) then dept_out=dept;
09-20-2013 02:59 PM
From a non-technical standpoint, the data step reads a record from x, then reads all records in the yy, conditionally assigning values to dept_in and dept_out, and then outputs a record. It then reads the next record in x, and so on until all of x is read and output.
The nobs=nobs set option sets up a vaiable called nobs which is known to SAS but not in the output dataset. This variable is assigned a value before any daata is read.This information is stored in the header information for a sas data set. So at compile step, SAS reads that value from data set header and then can use for controlling looping. The do i = 1 to nobs; causes the variables i and nobs to be declared. The nobs= in set statement causes actual number of observations to be saved as the variable nobs.
Point=i allows direct access to specific records in a data set and only reads that record.
There may be more efficient ways of obtaining the same results and I would consider revising if either of the data sets is large.
09-20-2013 03:11 PM
thanks for the response..
what are the other (datasteps )ways of doing it.?
other than the hashing and the SQL??
And also where in the documentation we find information about using 2 set statements and the purposes explained with point and obs= options???
09-20-2013 03:43 PM
In terms of documentation, I keep pdf copies of several SAS manuals on my PC. So that is the first place i go. In this case, SAS 9.2 Language Reference: Dictionary, 4th Ediition, pp 1764 to 1774. Online you can get to this from http://support.sas.com/documentation/ The default format there is HTML but you can switch to PDF if you want and then download. Or you can use this search string in gooble or bing, "set statement option nobs site:support.sas.com"
I initially thought of sql or hashing but some other things might also work;
a. Using set option key=. This also allows direct access but does by value rather than record number. You would not have to read every record in the second data set for each iteration of the datastep
b data step merge with by statement on id in and out. This miight reduce the comparisons being made greatly. I can't say for sure but the original code seems to be getting from latest in value and first out value. This would require both data sets to be sorted or indexed appropriately. But then I think one pass through both data sets is all that is needed.
c. Make a single pass through second dataset to construct unique records for each ID with appropriate in, out, in_dept and out_dept. Seems that can be done independently of the information in first data set. Then merge constructed dataset with first dataset.
09-20-2013 03:56 PM
SAS is built handle rows one at a time through the data vector. Thus, if you need to do vertical processing or many-to-many merging you have to go painfully around to achieve what hash or sql has to offer.
The specific example you've used above which is inneficient because it runs through the entire YY dataset even those records where ID is different can be solved, besides hash and sql, via array _temporary_ and direct addressing.
It still requires 2 set statements but one is enclosed in a do block to read the entire lookup table in an array before any real processing occurs making it more clear than using pointer on direct addresses. The issue is that if your lookup table is big, since array (as do hash tables) are all stored in memory rather than on disk, you will likely run out of memory to complete your task. It also requires some knowledge of your search key to implement.
To do many-to-many merges, you pretty much have to use an approach similar to the above with very brutal control over both of your datasets pointers or you have to do some transposition on either dataset to convert it into a one-to-many and then transpose it back into a many-to-many. Either or will feel just as tedious as the above code.
09-20-2013 04:24 PM
Were there no comments in the program to tell you what it is trying to do?
It seems to want for each record in X find the DEPT value for the last record in YY where:
1) the value of IN in X is between IN and OUT in YY.
2) the value of OUT in X is between IN and OUT in YY.
Looks like perhaps X is people with entry and exit dates and YY is department assignments for people with start and stop dates. So the program is trying to find the departement they started in and the one they exited from.
So perhaps X is hospital stays and YY are hospital rooms or wards?
09-20-2013 05:05 PM
At the time i posted this question i am not very confident about the question i posted.
Do u visit the SAS community on weekend???
DO u have an email contact where i could post you a question in person?