DATA Step, Macro, Functions and more

Index key lookup

Reply
Super Contributor
Posts: 387

Index key lookup

Hi,

Please see SAS(R) 9.3 Statements: Reference, examples #7 and #8.

Sample Code:

data dups (index=(x / nomiss));

  do x=1 to 10;

    y=x*2;

    z=x**2;

    output;

    output;

    output;

  end;

run;

data nodups (index=(x));

  do x=2 to 10 by 2;

    output;

  end;

run;

data final1;

  set nodups;

  set dups key=x;

run;

data final2;

  set nodups;

  set dups key=x / unique;

run;

data final3;

  set dups;

  set nodups key=x;

foo=0+y;

run;

data final4;

  set dups;

  set nodups key=x / unique;

foo=0+y;

run;

I quote Example 8:

This example uses the KEY= option to perform a table lookup. The DATA step reads a primary data set that is named INVTORY, which is indexed on PARTNO, and a lookup data set named PARTCODE. PARTCODE contains quantities of new stock (variable NEW_STK). The UNIQUE option ensures that, if there are any duplicate observations in INVTORY, values of NEW_STK are added only to the first observation of the group.


I'm probably just misunderstanding the documentation, but shouldn't I expect different results for final3 and final4?  If these results are as you expect, can you further explain the underlined bit above?


Thanks,

Scott

Respected Advisor
Posts: 3,799

Re: Index key lookup

Posted in reply to ScottBass

If you add a variable to NODUPS you will see a difference.

Super Contributor
Posts: 387

Re: Index key lookup

Posted in reply to data_null__

Thanks Data.

I changed my test code as below and now see the difference.  From reading the example in the doc, I would have expected to see the difference in Final3 and Final4, not Final1 and Final2.  Am I just mis-reading the doc, or is the example wrong?

I was pleased to see that repeated calls to set dups key=x retrieved the next record matching the key, rather than continually retrieving the first record.  This, coupled with _iorc_, would allow building a Cartesian product if you know or expect the lookup dataset to have non-unique keys.

With my "real" data, I've got a small transaction dataset, but millions of records in SCD2 dimension tables.  I also need this lookup to be fast, as it's for an online (stored process) application.  This approach was blindingly quick - much faster than hash object join or SQL (even with indexes).  Like 0.5 seconds vs. 45 seconds using hash objects.

I've been programming SAS a while, and this technique has been in the doc even longer - too bad I never connected the dots until now Smiley Happy.  As you say in your profile, I'm (still) learning SAS.

I think this approach will be very useful for our ETL/Data Warehouse processing.

Thanks again Data.

data dups (index=(x / nomiss));

  do x=1 to 10;

    y=x*2;

    z+1;output;

    z+1;output;

    z+1;output;

  end;

run;

data nodups (index=(x));

  do x=2 to 10 by 2;

    a=x**4;

    output;

  end;

run;

data final1;

  set nodups;

  do i=1 to 4;

  set dups key=x;

rc=_iorc_;

  output;

  end;

run;

data final2;

  set nodups;

  do i=1 to 4;

  set dups key=x / unique;

rc=_iorc_;

  output;

  end;

run;

data final3;

  set dups;

  set nodups key=x;

  foo=0+y;

  bar=0+a;

run;

data final4;

  set dups;

  set nodups key=x / unique;

  foo=0+y;

  bar=0+a;

run;

Ask a Question
Discussion stats
  • 2 replies
  • 227 views
  • 0 likes
  • 2 in conversation