BookmarkSubscribeRSS Feed
sas_lak
Quartz | Level 8

Hi All,

In my first table i have list of millions of names

and in my secod table i have hundreds of names like below.

Need to find out from first table where ever names contains from second table with or without lead or trail blanks.

First TableSecond Table
IKON INTERNATIONALCUSTOMERS
JOHN RICE MILL ( PVT ) LTDELECTRONICS
GLOBAL EQUIPMENTSDRUGS
LIFE MEDICAREENTERPIRSES
SELF HELP GROUPINTERNATIONAL
IRFAN DAIRY( PVT ) LTD
NICCO CORPN LTD LC DEVOLVEDLTD
H P INDUSTRIESPVT
FOURTUNE GARMENTS LTDELECTRONIC
J B MARKETINGGROUP
M V AND COMPANYINDUSTRIES
ROYAL AGRO FOODS INDUSTRIES
WEBSOL ENERGY SYSTEM LIMITED
COMPETITIVE  EDGE CONSULTANTS
FOOD SHOPPERET.TR.
P P ELECTRONIC
LOVELY MEDICAL HALL
RASSON ENERGY PVT LTD


8 REPLIES 8
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Well, you have a few options, but millions of rows is going to have an impact.  Basically you need to merge (join hash etc.) the data somehow.  If I was doing it myself on smaller data then:

proc sql;

     create table WANT as

     select     A.*

     from        FIRST A

     left join   SECOND B

     on           A.VAR like B.VAR

     where    B.VAR is not null;

quit;

However, proc sql isn't the best on resources, so maybe a datastep (or someone can provide a hash):

data _null_;

     set second end=last;

     if _n_=1 then call execute('data want; set first; if var contains "'||strip(var)||'"');

     else do call execute(' or var contains "'||strip(var)||'"');

     if last then call execute(' then output; run;');

run;

This will generate a datastep with the list of values in an if statement.

Ksharp
Super User

CODE NOT TESTED.

proc sql;

create table want as

select *

  from first as a,second as b

   where a.name contains strip(b.name);

quit;

Xia Keshan

sas_lak
Quartz | Level 8

Thank you,

Actually here my requirement is

1. Name does not contain numbers, Selected Special characters.

2. Does not contain S/O, D/O, W/O and C/O .....

3. Does not Single Token

4. At least one token having minimum 3 characters

5. Does not contain selected names (as mentioned above in Table 2)

Here I need to run the 5 points in a single step and create invalid name dataset.

Developed the code as per above requirement but i have confusion to get selected names.

In my code for selected names I wrote like this.

NAME Contains "ACADEMY" or

NAME Contains "ACCCURED" or

NAME Contains "ACCESSORIES" or

NAME Contains "ACCOUNT" or

.

.

.

NAME Contains "ZINC"

When I am running the same on millions of records it is taking huge time and systems are hanging.

Thanks in Advance….!

SASKiwi
PROC Star

How many selected names do you have? You could do want you want without table joins by either format lookup or hash lookup as long as the list of selected names isn't as huge.

sas_lak
Quartz | Level 8

In my second table (selected names) 2K + and in my first table more than 5 million records(this is depending on cycle wise)

here need to find out how many selected names are available from first table.

SASKiwi
PROC Star

OK 2K isn't so big so I would apply all of the exclusions first, in order of the most likely to occur as this will speed things up.

I suggest not writing out the exclusions to a separate file, at least not in the same run, to save time.

Then as already mentioned I would do either a format or hash look up on NAME to find occurrences of the required words.

Try fine-tuning this on a sample of your data until you are happy with the performance, then try it over the full volume.

Kurt_Bremser
Super User

Well, a complex where condition executed over a large dataset will always take time and consume resources. If not, you could do it by hand using pen&paper. Just kidding.

The only thing you can optimize is to reduce the amount of work you have to do. 's solution is a nice way to go about it.

sas_lak
Quartz | Level 8

When I used

NAME Contains " ACADEMY " or

it is ignoring the Leading and trail blanks

For Ex:-

1. "CRICKETACADEMY"

2."CRICKETACADEMY UK"

3."CRICKETACADEMYUK"

4."CRICKET ACADEMYUK"

5."CRICKET ACADEMY UK"

after running the code am getting 5. "CRICKET ACADEMY UK" only

 

Insted of this

NAME Contains " ACADEMY "

we have four possible patterns like this (with lead and, or trail blanks)

NAME Contains "ACADEMY" or

NAME Contains " ACADEMY" or

NAME Contains " ACADEMY " or

NAME Contains "ACADEMY "


but it is taking number of iterations.


Thank you.


SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 2114 views
  • 3 likes
  • 5 in conversation