Hi All,
In my first table i have list of millions of names
and in my secod table i have hundreds of names like below.
Need to find out from first table where ever names contains from second table with or without lead or trail blanks.
First Table | Second Table |
IKON INTERNATIONAL | CUSTOMERS |
JOHN RICE MILL ( PVT ) LTD | ELECTRONICS |
GLOBAL EQUIPMENTS | DRUGS |
LIFE MEDICARE | ENTERPIRSES |
SELF HELP GROUP | INTERNATIONAL |
IRFAN DAIRY | ( PVT ) LTD |
NICCO CORPN LTD LC DEVOLVED | LTD |
H P INDUSTRIES | PVT |
FOURTUNE GARMENTS LTD | ELECTRONIC |
J B MARKETING | GROUP |
M V AND COMPANY | INDUSTRIES |
ROYAL AGRO FOODS INDUSTRIES | |
WEBSOL ENERGY SYSTEM LIMITED | |
COMPETITIVE EDGE CONSULTANTS | |
FOOD SHOPPERET.TR. | |
P P ELECTRONIC | |
LOVELY MEDICAL HALL | |
RASSON ENERGY PVT LTD |
Well, you have a few options, but millions of rows is going to have an impact. Basically you need to merge (join hash etc.) the data somehow. If I was doing it myself on smaller data then:
proc sql;
create table WANT as
select A.*
from FIRST A
left join SECOND B
on A.VAR like B.VAR
where B.VAR is not null;
quit;
However, proc sql isn't the best on resources, so maybe a datastep (or someone can provide a hash):
data _null_;
set second end=last;
if _n_=1 then call execute('data want; set first; if var contains "'||strip(var)||'"');
else do call execute(' or var contains "'||strip(var)||'"');
if last then call execute(' then output; run;');
run;
This will generate a datastep with the list of values in an if statement.
CODE NOT TESTED.
proc sql;
create table want as
select *
from first as a,second as b
where a.name contains strip(b.name);
quit;
Xia Keshan
Thank you,
Actually here my requirement is
1. Name does not contain numbers, Selected Special characters.
2. Does not contain S/O, D/O, W/O and C/O .....
3. Does not Single Token
4. At least one token having minimum 3 characters
5. Does not contain selected names (as mentioned above in Table 2)
Here I need to run the 5 points in a single step and create invalid name dataset.
Developed the code as per above requirement but i have confusion to get selected names.
In my code for selected names I wrote like this.
NAME Contains "ACADEMY" or
NAME Contains "ACCCURED" or
NAME Contains "ACCESSORIES" or
NAME Contains "ACCOUNT" or
.
.
.
NAME Contains "ZINC"
When I am running the same on millions of records it is taking huge time and systems are hanging.
Thanks in Advance….!
How many selected names do you have? You could do want you want without table joins by either format lookup or hash lookup as long as the list of selected names isn't as huge.
In my second table (selected names) 2K + and in my first table more than 5 million records(this is depending on cycle wise)
here need to find out how many selected names are available from first table.
OK 2K isn't so big so I would apply all of the exclusions first, in order of the most likely to occur as this will speed things up.
I suggest not writing out the exclusions to a separate file, at least not in the same run, to save time.
Then as already mentioned I would do either a format or hash look up on NAME to find occurrences of the required words.
Try fine-tuning this on a sample of your data until you are happy with the performance, then try it over the full volume.
When I used
NAME Contains " ACADEMY " or
it is ignoring the Leading and trail blanks
For Ex:-
1. "CRICKETACADEMY"
2."CRICKETACADEMY UK"
3."CRICKETACADEMYUK"
4."CRICKET ACADEMYUK"
5."CRICKET ACADEMY UK"
after running the code am getting 5. "CRICKET ACADEMY UK" only
Insted of this
NAME Contains " ACADEMY "
we have four possible patterns like this (with lead and, or trail blanks)
NAME Contains "ACADEMY" or
NAME Contains " ACADEMY" or
NAME Contains " ACADEMY " or
NAME Contains "ACADEMY "
but it is taking number of iterations.
Thank you.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.