DATA Step, Macro, Functions and more

Lookup on multiple columns

Reply
N/A
Posts: 0

Lookup on multiple columns

Hi,

I found a paper about table look ups and tried to apply it. Here is the code.


DATA TEST;
LENGTH NAME $ 7;
INFILE DATALINES;
INPUT ID NAME $;
DATALINES;
11 WILLIE
1 MAUS
129 JER
113 JOE
2 POOCH
;
RUN;

DATA MATCH;
INFILE DATALINES;
INPUT CITY $ ID BLAHBLA $;
DATALINES;
MARIKINA 1 1111SJD9U
MAKATI 2 1119IU9
MANILA 3 1129I3KCQ
SMART 6 1113IIDJ8A
;
RUN;


PROC SQL NOPRINT;
SELECT DISTINCT ID INTO :ID_LIST SEPERATED BY ','
FROM TEST;
QUIT;

%LET ID_LIST=&ID_LIST;

DATA LOOKUP;
SET MATCH;
IF SUBSTR(BLAHBLA,1,1) = '1' THEN DO;
STRING1=SUBSTR(BLAHBLA,2,1);
IF STRING1 IN (&ID_LIST) THEN TAG = 'MATCH';
ELSE DO;
STRING2=SUBSTR(BLAHBLA,2,2);
IF STRING2 IN (&ID_LIST) THEN TAG = 'MATCH';
ELSE DO;
STRING3=SUBSTR(BLAHBLA,2,3);
IF STRING3 IN (&ID_LIST) THEN TAG = 'MATCH';
ELSE 'NON-MATCH';
END;
END;
ELSE 'NOT POSSIBLE';
END;
RUN;

Is there another way in order to simplify this? Also the lookup tables is updating which holds the problem of restriction since, macro variables can hold up to 32k characters only. The idea is to perform lookup on multiple calculated columns.

Thank You Very Much,
Wil
SAS Employee
Posts: 27

Re: Lookup on multiple columns

Posted in reply to deleted_user
Matching on a partial string seems like a potentially error-prone approach. Is there no way to delimit the ID in BLAHBLA ?
N/A
Posts: 0

Re: Lookup on multiple columns

Sorry there's no way to delimit the values for Blahbla. However, the values for blahblah and id are all numeric and also it holds that after the prefix the next 3 digits are looked up. The 1st digit first, the 1st and 2nd digit next, and lastly, the 1st to 3rd digits.


DATA TEST;
LENGTH NAME $ 7;
INFILE DATALINES;
INPUT ID NAME $;
DATALINES;
11 WIL
1 MAU
129 JEROME
113 JOEY
2 POOCHIE
;
RUN;

DATA MATCH;
INFILE DATALINES;
INPUT CITY $ BLAHBLA $;
DATALINES;
MARIKINA 111123456
MAKATI 111925788
MANILA 112912468
SMART 111312356
;
RUN;


PROC SQL NOPRINT;
SELECT DISTINCT ID INTO :ID_LIST SEPERATED BY ','
FROM TEST;
QUIT;

%LET ID_LIST = &ID_LIST

DATA LOOKUP;
SET MATCH;
IF SUBSTR(BLAHBLA,1,1) = '1' THEN DO;
STRING1=SUBSTR(BLAHBLA,2,1);
IF STRING1 IN (&ID_LIST) THEN TAG = 'MATCH';
ELSE DO;
STRING2=SUBSTR(BLAHBLA,2,2);
IF STRING2 IN (&ID_LIST) THEN TAG = 'MATCH';
ELSE DO;
STRING3=SUBSTR(BLAHBLA,2,3);
IF STRING3 IN (&ID_LIST) THEN TAG = 'MATCH';
ELSE 'NON-MATCH';
END;
END;
ELSE 'NOT POSSIBLE';
END;
RUN;
SAS Employee
Posts: 27

Re: Lookup on multiple columns

Posted in reply to deleted_user
The approach seems like it still may generate false positives.
For example, does 11 WIL match to MARIKINA, MAKATI, or SMART, or potentially none of these? Do you have any more facts about BLAHBLA that you could share?
Super Contributor
Super Contributor
Posts: 365

Re: Lookup on multiple columns

Posted in reply to deleted_user
Hello Axe,

It seems to me that using an array of macro variables could help to solve the problem of "32k" but the program becomes more complicated. I mean using
[pre]
proc SQL;
select count(Distinct ID) :into N from test;
%let N=%TRIM(&N);
select distinct ID into :ID1-:ID&N from TEST;
quit;
[/pre]
with subsequent cycle on ID1 to ID&N.

Sincerely,
SPR
N/A
Posts: 0

Re: Lookup on multiple columns

Hi SPR,

Can you provide me an example on how to use it? All I think of is by using it in this way...


Data Target;
Set Source;
if column = "&var1" or column = "&var2" or column = "&var3" ... column = "&varn" then do:
...
run;


Thank You Very Much,
Axe
Super User
Posts: 10,020

Re: Lookup on multiple columns

Posted in reply to deleted_user
Hi.
Not test. And this code is not suited to the dataset with large obs ,because the length of Macro variable is limited.

[pre]
proc sql;
select distinct quote( sex )
into :num separated by ' '
from sashelp.class;
quit;
Data Target;
Set Source;
if column in (&num) then do;
[/pre]



Ksharp
Super Contributor
Super Contributor
Posts: 365

Re: Lookup on multiple columns

Posted in reply to deleted_user
Hello Axe,

A do loop can be used to iterate through or clauses:
[pre]
%macro a;
data Target;
Set Source;
if column = "&var1"
%do i=1 %to &N;
or column = "&&var&i"
%end;
then do;
...
end;
run;
%mend a;

%a
[/pre]
Sincerely,
SPR
Super Contributor
Super Contributor
Posts: 3,174

Re: Lookup on multiple columns

Also, consider two other options (dependent on input data amount/size to influence performance), a hash table and also PROC FORMAT.

Scott Barry
SBBWorks, Inc.
N/A
Posts: 0

Re: Lookup on multiple columns

Hi Guys,

Actually the idea is...BLAHBLAH is an ID. The first digit is nothing but a prefix. However, the 2nd to 4th number may be an id for your firstname. As long any of the the 2nd, 2nd to 3rd and 2nd to 4th matches the numbers in the look up table of names, then the tag will be equal to 1.
Ask a Question
Discussion stats
  • 9 replies
  • 352 views
  • 0 likes
  • 5 in conversation