Hi,
Please see SAS(R) 9.3 Statements: Reference, examples #7 and #8.
Sample Code:
data dups (index=(x / nomiss));
do x=1 to 10;
y=x*2;
z=x**2;
output;
output;
output;
end;
run;
data nodups (index=(x));
do x=2 to 10 by 2;
output;
end;
run;
data final1;
set nodups;
set dups key=x;
run;
data final2;
set nodups;
set dups key=x / unique;
run;
data final3;
set dups;
set nodups key=x;
foo=0+y;
run;
data final4;
set dups;
set nodups key=x / unique;
foo=0+y;
run;
I quote Example 8:
This example uses the KEY= option to perform a table lookup. The DATA step reads a primary data set that is named INVTORY, which is indexed on PARTNO, and a lookup data set named PARTCODE. PARTCODE contains quantities of new stock (variable NEW_STK). The UNIQUE option ensures that, if there are any duplicate observations in INVTORY, values of NEW_STK are added only to the first observation of the group.
I'm probably just misunderstanding the documentation, but shouldn't I expect different results for final3 and final4? If these results are as you expect, can you further explain the underlined bit above?
Thanks,
Scott
If you add a variable to NODUPS you will see a difference.
Thanks Data.
I changed my test code as below and now see the difference. From reading the example in the doc, I would have expected to see the difference in Final3 and Final4, not Final1 and Final2. Am I just mis-reading the doc, or is the example wrong?
I was pleased to see that repeated calls to set dups key=x retrieved the next record matching the key, rather than continually retrieving the first record. This, coupled with _iorc_, would allow building a Cartesian product if you know or expect the lookup dataset to have non-unique keys.
With my "real" data, I've got a small transaction dataset, but millions of records in SCD2 dimension tables. I also need this lookup to be fast, as it's for an online (stored process) application. This approach was blindingly quick - much faster than hash object join or SQL (even with indexes). Like 0.5 seconds vs. 45 seconds using hash objects.
I've been programming SAS a while, and this technique has been in the doc even longer - too bad I never connected the dots until now . As you say in your profile, I'm (still) learning SAS.
I think this approach will be very useful for our ETL/Data Warehouse processing.
Thanks again Data.
data dups (index=(x / nomiss));
do x=1 to 10;
y=x*2;
z+1;output;
z+1;output;
z+1;output;
end;
run;
data nodups (index=(x));
do x=2 to 10 by 2;
a=x**4;
output;
end;
run;
data final1;
set nodups;
do i=1 to 4;
set dups key=x;
rc=_iorc_;
output;
end;
run;
data final2;
set nodups;
do i=1 to 4;
set dups key=x / unique;
rc=_iorc_;
output;
end;
run;
data final3;
set dups;
set nodups key=x;
foo=0+y;
bar=0+a;
run;
data final4;
set dups;
set nodups key=x / unique;
foo=0+y;
bar=0+a;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.