BookmarkSubscribeRSS Feed
deleted_user
Not applicable
Hi,

I found a paper about table look ups and tried to apply it. Here is the code.


DATA TEST;
LENGTH NAME $ 7;
INFILE DATALINES;
INPUT ID NAME $;
DATALINES;
11 WILLIE
1 MAUS
129 JER
113 JOE
2 POOCH
;
RUN;

DATA MATCH;
INFILE DATALINES;
INPUT CITY $ ID BLAHBLA $;
DATALINES;
MARIKINA 1 1111SJD9U
MAKATI 2 1119IU9
MANILA 3 1129I3KCQ
SMART 6 1113IIDJ8A
;
RUN;


PROC SQL NOPRINT;
SELECT DISTINCT ID INTO :ID_LIST SEPERATED BY ','
FROM TEST;
QUIT;

%LET ID_LIST=&ID_LIST;

DATA LOOKUP;
SET MATCH;
IF SUBSTR(BLAHBLA,1,1) = '1' THEN DO;
STRING1=SUBSTR(BLAHBLA,2,1);
IF STRING1 IN (&ID_LIST) THEN TAG = 'MATCH';
ELSE DO;
STRING2=SUBSTR(BLAHBLA,2,2);
IF STRING2 IN (&ID_LIST) THEN TAG = 'MATCH';
ELSE DO;
STRING3=SUBSTR(BLAHBLA,2,3);
IF STRING3 IN (&ID_LIST) THEN TAG = 'MATCH';
ELSE 'NON-MATCH';
END;
END;
ELSE 'NOT POSSIBLE';
END;
RUN;

Is there another way in order to simplify this? Also the lookup tables is updating which holds the problem of restriction since, macro variables can hold up to 32k characters only. The idea is to perform lookup on multiple calculated columns.

Thank You Very Much,
Wil
9 REPLIES 9
Daryl
SAS Employee
Matching on a partial string seems like a potentially error-prone approach. Is there no way to delimit the ID in BLAHBLA ?
deleted_user
Not applicable
Sorry there's no way to delimit the values for Blahbla. However, the values for blahblah and id are all numeric and also it holds that after the prefix the next 3 digits are looked up. The 1st digit first, the 1st and 2nd digit next, and lastly, the 1st to 3rd digits.


DATA TEST;
LENGTH NAME $ 7;
INFILE DATALINES;
INPUT ID NAME $;
DATALINES;
11 WIL
1 MAU
129 JEROME
113 JOEY
2 POOCHIE
;
RUN;

DATA MATCH;
INFILE DATALINES;
INPUT CITY $ BLAHBLA $;
DATALINES;
MARIKINA 111123456
MAKATI 111925788
MANILA 112912468
SMART 111312356
;
RUN;


PROC SQL NOPRINT;
SELECT DISTINCT ID INTO :ID_LIST SEPERATED BY ','
FROM TEST;
QUIT;

%LET ID_LIST = &ID_LIST

DATA LOOKUP;
SET MATCH;
IF SUBSTR(BLAHBLA,1,1) = '1' THEN DO;
STRING1=SUBSTR(BLAHBLA,2,1);
IF STRING1 IN (&ID_LIST) THEN TAG = 'MATCH';
ELSE DO;
STRING2=SUBSTR(BLAHBLA,2,2);
IF STRING2 IN (&ID_LIST) THEN TAG = 'MATCH';
ELSE DO;
STRING3=SUBSTR(BLAHBLA,2,3);
IF STRING3 IN (&ID_LIST) THEN TAG = 'MATCH';
ELSE 'NON-MATCH';
END;
END;
ELSE 'NOT POSSIBLE';
END;
RUN;
Daryl
SAS Employee
The approach seems like it still may generate false positives.
For example, does 11 WIL match to MARIKINA, MAKATI, or SMART, or potentially none of these? Do you have any more facts about BLAHBLA that you could share?
SPR
Quartz | Level 8 SPR
Quartz | Level 8
Hello Axe,

It seems to me that using an array of macro variables could help to solve the problem of "32k" but the program becomes more complicated. I mean using
[pre]
proc SQL;
select count(Distinct ID) :into N from test;
%let N=%TRIM(&N);
select distinct ID into :ID1-:ID&N from TEST;
quit;
[/pre]
with subsequent cycle on ID1 to ID&N.

Sincerely,
SPR
deleted_user
Not applicable
Hi SPR,

Can you provide me an example on how to use it? All I think of is by using it in this way...


Data Target;
Set Source;
if column = "&var1" or column = "&var2" or column = "&var3" ... column = "&varn" then do:
...
run;


Thank You Very Much,
Axe
Ksharp
Super User
Hi.
Not test. And this code is not suited to the dataset with large obs ,because the length of Macro variable is limited.

[pre]
proc sql;
select distinct quote( sex )
into :num separated by ' '
from sashelp.class;
quit;
Data Target;
Set Source;
if column in (&num) then do;
[/pre]



Ksharp
SPR
Quartz | Level 8 SPR
Quartz | Level 8
Hello Axe,

A do loop can be used to iterate through or clauses:
[pre]
%macro a;
data Target;
Set Source;
if column = "&var1"
%do i=1 %to &N;
or column = "&&var&i"
%end;
then do;
...
end;
run;
%mend a;

%a
[/pre]
Sincerely,
SPR
sbb
Lapis Lazuli | Level 10 sbb
Lapis Lazuli | Level 10
Also, consider two other options (dependent on input data amount/size to influence performance), a hash table and also PROC FORMAT.

Scott Barry
SBBWorks, Inc.
deleted_user
Not applicable
Hi Guys,

Actually the idea is...BLAHBLAH is an ID. The first digit is nothing but a prefix. However, the 2nd to 4th number may be an id for your firstname. As long any of the the 2nd, 2nd to 3rd and 2nd to 4th matches the numbers in the look up table of names, then the tag will be equal to 1.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 3426 views
  • 0 likes
  • 5 in conversation