BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
cheema11
Calcite | Level 5

Hi All,

 

New to SAS, need some help in string matching. I have Table A which have millions of records. Table B has less than 5K records.

I want to match Address field of table A with Keywords field of table B. Currently doing Proc SQL with Like operation which is very slow and query takes ages. There is a cross join between both the tables. Both tables are in SAS.

Wanted to know the efficient way of doing this. Using SAS EG V 8.2.

 

It will be a great help. Thanks in advance.

 

 

SAS Data.PNG

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

Thanks @PeterClemmensen for asking all the questions.

@cheema11 Please provide next time sample data via a working SAS data step so we don't have to spend time to create such data in order to provide tested code.

 

I believe below should do what you're asking for. It's still a resource intensive process but certainly better than a cartesian join.

data master;
  id=_n_;
  infile datalines truncover;
  input address $40.;
  datalines;
alias_address: abc@swy.com.au
alias_address: pa@baxa.com
alias_address: baxa@test.com
alias_address: 123456789@baxa.com
;

data lookup;
  infile datalines truncover;
  input keyword $ category $;
  datalines;
A swy
A baxa
B 12345678
C test
;

data lookup;
  set lookup;
  _str_len=length(category);
run;

proc sort data=lookup;
  by descending _str_len;
run;

data want(drop=_:);
  set master;
  _r_len=-1;
  do _i=1 to _nobs;
    set lookup point=_i nobs=_nobs;
    if _r_len>_str_len then leave;
    if find(address,category,'it') then
      do;
        output;
        _r_len=_str_len;
      end;
  end;
run;

proc print data=want;
run;

 

Patrick_0-1664184308051.png

 

View solution in original post

10 REPLIES 10
PeterClemmensen
Tourmaline | Level 20

Are these both SAS Data Sets?

cheema11
Calcite | Level 5
Yes both are sas data sets.
PeterClemmensen
Tourmaline | Level 20

Ok. What if a string in table a matches multiple substrings in table b?

cheema11
Calcite | Level 5
Return the one with longest string length. If string length is same, then return any one of them.
PeterClemmensen
Tourmaline | Level 20

The longest string from table b, right?

cheema11
Calcite | Level 5
Yes longest string from Table B
Patrick
Opal | Level 21

Thanks @PeterClemmensen for asking all the questions.

@cheema11 Please provide next time sample data via a working SAS data step so we don't have to spend time to create such data in order to provide tested code.

 

I believe below should do what you're asking for. It's still a resource intensive process but certainly better than a cartesian join.

data master;
  id=_n_;
  infile datalines truncover;
  input address $40.;
  datalines;
alias_address: abc@swy.com.au
alias_address: pa@baxa.com
alias_address: baxa@test.com
alias_address: 123456789@baxa.com
;

data lookup;
  infile datalines truncover;
  input keyword $ category $;
  datalines;
A swy
A baxa
B 12345678
C test
;

data lookup;
  set lookup;
  _str_len=length(category);
run;

proc sort data=lookup;
  by descending _str_len;
run;

data want(drop=_:);
  set master;
  _r_len=-1;
  do _i=1 to _nobs;
    set lookup point=_i nobs=_nobs;
    if _r_len>_str_len then leave;
    if find(address,category,'it') then
      do;
        output;
        _r_len=_str_len;
      end;
  end;
run;

proc print data=want;
run;

 

Patrick_0-1664184308051.png

 

cheema11
Calcite | Level 5
Thanks for the response on this. Really appreciate for the help. @PeterClemmensen @Patrick
cheema11
Calcite | Level 5
Hi @Patrick.
In response to your query...."What if a string in table A matches multiple substrings in table B?" My answer was "Return the one with longest string length in Table B. If string length is same, then return any one of them."

What do i need to change in the script if I want to return the string with shortest length in Table B?
Patrick
Opal | Level 21

@cheema11 wrote:
Hi @Patrick.
In response to your query...."What if a string in table A matches multiple substrings in table B?" My answer was "Return the one with longest string length in Table B. If string length is same, then return any one of them."

What do i need to change in the script if I want to return the string with shortest length in Table B?

@PeterClemmensen did all the work getting the full requirements from you.

 

Patrick_0-1664781929579.png

 

 

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 1379 views
  • 0 likes
  • 3 in conversation