Hi all,
I have two tables
as follows:
Table A
City ZIP Code(s)
Alabama (AL)Huntsville 35801
Alaska (AK)Anchorage 99501
Arizona (AZ)Phoenix 85001
Arkansas (AR)Little Rock 72201
California (CA)Sacramento 94203
Los Angeles 90001
Beverly Hills 90209
Colorado (CO)Denver 80201
Conneticut (CT)Hartford 06101
Deleware (DE)Dover 19901
District of Columbia 20001
Table B
Zip subcity
35801 alabama
35801 albama
35801 alabam
35801 hills1
35801 hills2
67980 hills3
67980 hills
67980 hills
90001 los
90001 albama
90001 rtff
80076 hartford
85001 Arizona
39801 alabama
I need to extract subcity from city of table A using look up within zip.
so that final output would be something like this
City ZIP Code(s) ZIP SUBcity
Alabama (AL)Huntsville 35801 35801 alabama
Alaska (AK)Anchorage 99501
Arizona (AZ)Phoenix 85001 85001 Arizona
Arkansas (AR)Little Rock 72201
California (CA)Sacrament 94203
Los Angeles 90001 90001 los
Beverly Hills 90209
Colorado (CO)Denver 80201
Conneticut (CT)Hartford 06101
Deleware (DE)Dover 19901
District of Columbia 20001
I have used left join using where condition of zip=ZIP Code(s)
AND CITY CONTAINS SUBCITY
BUT EXECUTION TIME IS VERY SLOW.
Is there any way to achieve result more efficiently?
Also,can we use a fuzzy search so that we have result like this as well
City ZIP Code(s) ZIP SUBcity
Alabama (AL)Huntsville 35801 35801 alabama
Alabama (AL)Huntsville 35801 35801 albama
Alabama (AL)Huntsville 35801 35801 alabam
Thanks a lot !!!!!!!!!
Thanks for your response .
The part displayed as part in example has data in certain pattern where' ( 'can be used to delimit .
But This is not the actual data as actual data does not has any fixed pattern and somehow I need to first index my subcity in city within a zip and if condition holds true then need to pull out subcity.
It would probably help to have the entire Proc SQL code you attempted.
Your "execution is very slow" might be improved by coding changes but need to see the whole thing you are using.
For example it seems likely your choice of " where condition of zip=ZIP Code(s)" might be better done as a JOIN ON of some flavor.
1. Merging on equality (equi-joins) is the fastest you can merge. If that's not fast enough you need to presort the tables.
2. As soon as you do fuzzy joins, performance plummets.
So the course of action would be:
1. Sort the tables by ZIP
2. Merge on ZIP equality and SUBCITY equal to the start of city (use scan() for the first word if long enough or scan() using the parentheses as delimiter or operator =: , or all these successively)
3. What's hasn't been matched can be retried with other criteria including fuzzy ones, like using the function compged() . The cost get higher but the volumes get smaller.
You missed the essence of my reply:
To speed up the matches AND increase the match rate, perform successive matches rather than try to do it all in one go.
> Can you please elaborate more on how to carry out successive matches.
Not too sure whats unclear.
So the course of action would be:
1. Sort the tables by ZIP
2. Merge on ZIP equality and SUBCITY equal to the start of city (use scan() for the first word if long enough or scan() using the parentheses as delimiter or operator =: , or all these successively)
3. What's hasn't been matched can be retried with other criteria including fuzzy ones, like using the function compged() . The cost get higher but the volumes get smaller.
1. Join the easily found matches using an obvious criterion like ZIP equality and SUBCITY = first word => function scan()
2. Join the unmatched data on a less direct criterion like ZIP equality and SUBCITY = any word => function index()
3. Repeat the process for unmatched data until satisfied: the volume to match goes down as the criterion increases in fuzziness.
4. When finished, append the successive matches. It is a good idea to keep track of the match method so the data includes some sort of match-quality score.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.