Hi,
Please see SAS(R) 9.3 Statements: Reference, examples #7 and #8.
Sample Code:
data dups (index=(x / nomiss));
do x=1 to 10;
y=x*2;
z=x**2;
output;
output;
output;
end;
run;
data nodups (index=(x));
do x=2 to 10 by 2;
output;
end;
run;
data final1;
set nodups;
set dups key=x;
run;
data final2;
set nodups;
set dups key=x / unique;
run;
data final3;
set dups;
set nodups key=x;
foo=0+y;
run;
data final4;
set dups;
set nodups key=x / unique;
foo=0+y;
run;
I quote Example 8:
This example uses the KEY= option to perform a table lookup. The DATA step reads a primary data set that is named INVTORY, which is indexed on PARTNO, and a lookup data set named PARTCODE. PARTCODE contains quantities of new stock (variable NEW_STK). The UNIQUE option ensures that, if there are any duplicate observations in INVTORY, values of NEW_STK are added only to the first observation of the group.
I'm probably just misunderstanding the documentation, but shouldn't I expect different results for final3 and final4? If these results are as you expect, can you further explain the underlined bit above?
Thanks,
Scott
If you add a variable to NODUPS you will see a difference.
Thanks Data.
I changed my test code as below and now see the difference. From reading the example in the doc, I would have expected to see the difference in Final3 and Final4, not Final1 and Final2. Am I just mis-reading the doc, or is the example wrong?
I was pleased to see that repeated calls to set dups key=x retrieved the next record matching the key, rather than continually retrieving the first record. This, coupled with _iorc_, would allow building a Cartesian product if you know or expect the lookup dataset to have non-unique keys.
With my "real" data, I've got a small transaction dataset, but millions of records in SCD2 dimension tables. I also need this lookup to be fast, as it's for an online (stored process) application. This approach was blindingly quick - much faster than hash object join or SQL (even with indexes). Like 0.5 seconds vs. 45 seconds using hash objects.
I've been programming SAS a while, and this technique has been in the doc even longer - too bad I never connected the dots until now . As you say in your profile, I'm (still) learning SAS.
I think this approach will be very useful for our ETL/Data Warehouse processing.
Thanks again Data.
data dups (index=(x / nomiss));
do x=1 to 10;
y=x*2;
z+1;output;
z+1;output;
z+1;output;
end;
run;
data nodups (index=(x));
do x=2 to 10 by 2;
a=x**4;
output;
end;
run;
data final1;
set nodups;
do i=1 to 4;
set dups key=x;
rc=_iorc_;
output;
end;
run;
data final2;
set nodups;
do i=1 to 4;
set dups key=x / unique;
rc=_iorc_;
output;
end;
run;
data final3;
set dups;
set nodups key=x;
foo=0+y;
bar=0+a;
run;
data final4;
set dups;
set nodups key=x / unique;
foo=0+y;
bar=0+a;
run;
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.