BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
AZFXL
Calcite | Level 5

Hi all, I would like to lookup value from table1 into table2 as shown below based on multiple condition (i.e. to lookup the price given that the length and height is within the respective range). I have tried using proc sql with update statement to do so but the runtime is too long that it is not practical to use this approach. Therefore, would like to seek for some advice from the community if there is any other better approach.

 

Table 1

PRODUCTPRICEMIN_LENGTHMAX_LENGTHMIN_HEIGHTMAX_HEIGHT
PROD A10025025
PROD A2026402640
PROD A3041504150
PROD B15025025
PROD B2526402640
PROD B3541504150

 

Table 2

PRODUCTLENGTHHEIGHT
PROD A1020
PROD B3040

 

Expected output

PRODUCTLENGTHHEIGHTPRICE
PROD A102010
PROD B304025

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

What about a basic proc sql without update?

 

proc sql;
create table want as 
select t1.*, lu.price
from table2 as t1
left join table1 as lu
on t1.length between lu.min_length and lu.max_length
and t2.height between lu.min_height and lu.max_height
order by 1, 2, 3;
quit;

What are the sizes of your tables (# of rows)?

A hash or temporary arrays may be another option or loading a table to memory via sasfile would probably save time. 

View solution in original post

7 REPLIES 7
Reeza
Super User

What about a basic proc sql without update?

 

proc sql;
create table want as 
select t1.*, lu.price
from table2 as t1
left join table1 as lu
on t1.length between lu.min_length and lu.max_length
and t2.height between lu.min_height and lu.max_height
order by 1, 2, 3;
quit;

What are the sizes of your tables (# of rows)?

A hash or temporary arrays may be another option or loading a table to memory via sasfile would probably save time. 

AZFXL
Calcite | Level 5

The lookup table (table1) has around 5000 rows but the master table (table2) has around 2million rows. I would prefer to use hash but I do not know how to add the 'within range' condition for length and height in the hash statement.

Reeza
Super User

@AZFXL wrote:

The lookup table (table1) has around 5000 rows but the master table (table2) has around 2million rows. I would prefer to use hash but I do not know how to add the 'within range' condition for length and height in the hash statement.


With those sizes, a SQL join would be fine for sure. I think sasfile works best when the dataset is processed multiple times so not sure this is more efficient or not. 

sasfile work.table1 open;

proc sql;
create table want as 
select t1.*, lu.price
from table2 as t1
left join table1 as lu
on t1.product = lu.product and
t1.length between lu.min_length and lu.max_length
and t2.height between lu.min_height and lu.max_height
order by 1, 2, 3;
quit;

sasfile work.table1 close;
mkeintz
PROC Star

There is no "contains" or "between" method in the sas hash object.  But if all your lengths and heights are integers, then you could build a lookup table having every length/height combination in the rectangle defined by min_length/max_length and min_height/max_height for each row in table 1.

 

Code below is untested in the absence of sample data in the form of working data steps:

 

 

data want (keep=product length height price);
  set table1 (in=int1)
      table2 (in=int2);

  if _n_=1 then do;
    declare hash exact_lh (hashexp:10);
      exact_lh.definekey('product','length','height');
      exact_lh.definedata('price');
      exact_lh.definedone();
  end;

  if int1 then do length=min_length to max_length by 1;
    do height=min_height to max_height by 1;
      exact_lh.add();
    end;
  end;

  if int2;
  if exact_lh.find()^=0 then price=.;
run;

 

 

Notes:

Even if your lengths and heights are not exact integers but (for example) exact halves, quarters, eights, etc, you can use this technique.  Just change the "by 1" to "by 0.5", "by 0.25", "by 0.125" etc.

 

Also, I used the "hashexp:10" option (instead of default hashexp:8) to tell hash to establish 2**10=1024 buckets.  If each of your table1 entries had a length range of 15 and height range of 20, that would be 300 hash dataitems per table1 row.  For 5,000 table1 rows, that's about 1,500,000 data items - i.e. about 1,500 data items per bucket.  Probably that will perform better than the 256 buckets you would otherwise have from the default hashexp.

 

 

 

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
FreelanceReinh
Jade | Level 19

Hi @AZFXL,

 

I would consider creating user-defined formats or informats from table 1 as these are suitable for categorizing numeric values. Your example (if it's not oversimplified) suggests that several products and even length and height might share the same (in)format to turn measurements into category numbers -- where the product-dependent (in)format can be applied dynamically (see PUTN and related functions). The assignment of prices to products and discrete category numbers (rather than continuous measurements) should then perform much better.

Reeza
Super User

Not sure formats would work well here where you have two different variables to consider at the same time, height/length. If it was a single variable I think it would work well for each product. 

FreelanceReinh
Jade | Level 19

@Reeza wrote:

Not sure formats would work well here where you have two different variables to consider at the same time, height/length.


Not for a direct mapping (length, height) → price, that's right, but for a preliminary categorization of the two continuous variables (I thought). The relatively few combinations of discrete length and height categories would be amenable to a wider range of methods for obtaining the price, e.g., a hash object.

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

Creating Custom Steps in SAS Studio

Check out this tutorial series to learn how to build your own steps in SAS Studio.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 4312 views
  • 1 like
  • 4 in conversation