SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Multiple Columns to Lookup

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 89
Accepted Solution

Multiple Columns to Lookup

Good Morning, All.

I've got a table with five ID columns that I need to lookup their value of on another table.

Say the columns are:

ID_1

ID_2

ID_3

ID_4

ID_5

I was thinking that I could use a loop transformation. First, set a control table with records 1-5, then iterate through a lookup job five times. The thing is, in that lookup job, how would I go about setting it up to find the value of ID_1 on the first iteration, then the value of ID_2 on the second, etc? As the lookup transformation requires mappings of columns.

Is there a way to do this? Am I on the right path or is there a better solution?

I'd rather not repeat the same lookup step five times, which is why I was hoping to just loop.

Thanks!

-Jeff


Accepted Solutions
Solution
‎01-17-2015 03:03 PM
Super User
Posts: 5,430

Re: Multiple Columns to Lookup

Without seeing what you have done, I think have all look-up in the same transformation is the most efficient. Each look-up tansformation will add another table scan of your source data. If your memory is being exhausted by a single look-up, is the only reason that I could see for going with another solution.

But, to see the hard facts for your situation, try them out and compare. Just try to avoid user written code if possible.

Data never sleeps

View solution in original post


All Replies
Super Contributor
Posts: 578

Re: Multiple Columns to Lookup

without knowing more about the table layouts...  You could also create a view.  Depending on the size of the lookup table and if it was a sas dataset or an external dbms table, a hash lookup might be faster but comes with more complexity.

proc sql;

create table want as

select

b.id_1,

t1.lookup as id_1_lookup,

b.id_2,

t2.lookup as id_2_lookup,

b.id_3,

t3.lookup as id_3_lookup

b.id_4,

t4.lookup as id_4_lookup,

b.id_5,

t5.lookup as id_5_lookup

from

     base b

     left outer join lookup_table t1

          on b.id_1=t1.id

     left outer join lookup_table t2

          on b.id_2=t2.id

     left outer join lookup_table t3

          on b.id_3=t3.id

     left outer join lookup_table t4

          on b.id_4=t4.id

     left outer join lookup_table t5

          on b.id_5=t5.id

;

quit;

Super User
Posts: 5,430

Re: Multiple Columns to Lookup

DBailey, this is a DI Studio problem, not a SAS programming one.

JWhite, perhaps I don't see the whole picture, but building an iterations sounds unnecessary complicated, and ineffective. Not even sure if it could work without user written code.

You could do five look-up in the same transformation, just attach the look-up table five times.

The only (?) thing that could be a problem is that you will have the look-up tabled loaded in five copies in memory, so I hope it's not too large.

Data never sleeps
Frequent Contributor
Posts: 89

Re: Multiple Columns to Lookup

Linus,

I ended up using five lookup transformations over user-written code, as it appears to be easier to read for any follow-up developers.

However, I just tried your trick of using five tables to one lookup transformation. That'll work as well.

Any idea which method is better/more efficient? The both have redundancies.

Thanks!

-Jeff

Solution
‎01-17-2015 03:03 PM
Super User
Posts: 5,430

Re: Multiple Columns to Lookup

Without seeing what you have done, I think have all look-up in the same transformation is the most efficient. Each look-up tansformation will add another table scan of your source data. If your memory is being exhausted by a single look-up, is the only reason that I could see for going with another solution.

But, to see the hard facts for your situation, try them out and compare. Just try to avoid user written code if possible.

Data never sleeps
Respected Advisor
Posts: 4,173

Re: Multiple Columns to Lookup

If I understand your data correctly then your look-up table consists of simple key/value pairs. Your base table has 5 columns with keys for which you need to look-up the value always using the same look-up table with the same key/value pairs. If this is correct then doing this job with a single look-up transformations (and 5 look-ups) is more efficient than 5 passes through the base table with a single look-up in each pass (in the end you still need to read the look-up table 5 times).

From a perspective of performance: You could in a user written node (may be implemented as a user written transformation) create a SAS format out of your look-up table and then simply apply this format to your ID variables. This would only require one pass through the look-up table when creating the format.

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 542 views
  • 0 likes
  • 4 in conversation