BookmarkSubscribeRSS Feed
RFLinnenbank
Calcite | Level 5
I've got two tables, A and B.
In table A there is a field filled with complete names.(A)
In table B there is a field filled with partial names.(B)

Both fields are in upper case (UPCASE).
In the query I call for the entire fieldset (*) of Table A, where field (A) contains field (B). (It doesn't matter which one, so there isn't any JOIN)

This yields no results in some of the cases where I can clearly see they should.
When I make the same comparison using only the fields (A) and (B) as their respective tables, this does work however.

Would anybody know the reason for this?
6 REPLIES 6
sbb
Lapis Lazuli | Level 10 sbb
Lapis Lazuli | Level 10
It's important to share your code for good feedback.

Scott Barry
SBBWorks, Inc.
RFLinnenbank
Calcite | Level 5
Here it is. Did have to replace the names though (confidentiality issue):
Table1.naam = Partial Name belonging to an adress
Table2.naam = Full name (originally can contain ' or ", hence COMPRESS)
&SOURCETABLE = reference to source table name. (working of course)

*** The next 2 queries set up the 2 tables as wel as formats NAAM to upper case ***;
PROC SQL;
CREATE TABLE TUSSENDT.TABLE1 AS
SELECT SOURCE.GROEP,
UPCASE(SOURCE.NAAM) AS NAAM,
SOURCE.HUISNR,
SOURCE.POSTCODE
FROM &SOURCETABLE AS SOURCE
WHERE SOURCE.GROEP NOT IS NULL;
QUIT;

PROC SQL;
CREATE TABLE TUSSENDT.TABLE2 AS
SELECT UPCASE(COMPRESS(COMPRESS(TABLE2SOURCE.NAAM,''''),'"')) AS NAAM FORMAT=$200.,
TABLE2SOURCE.ID AS ID FORMAT=11.,
TABLE2SOURCE.HUISNR AS HUISNR FORMAT=$6.,
TABLE2SOURCE.HUISNRTOE AS HUISNRTOE FORMAT=$12.,
TABLE2SOURCE.POSTCODE AS POSTCODE FORMAT=$6.,
TABLE2SOURCE.BUITENLAND_IND AS BUITENLAND_IND FORMAT=$1.,
TABLE2SOURCE.BEKEND_DAT AS BEKEND_DAT FORMAT=MMDDYY8.
FROM SOURCE.TABLE2SOURCE AS TABLE2SOURCE;
QUIT;


*** Checks if TABLE2.naam contains a TABLE1.naam, adds GROEP and NAAM from TABLE1 ***;

PROC SQL;
CREATE TABLE WORK.KOI_INFO_OP_NAAM AS SELECT DISTINCT TABLE2.ID FORMAT=11.,
TABLE1.NAAM FORMAT=$200. AS NAAM_1,
TABLE2.NAAM FORMAT=$200. AS NAAM_2,
TABLE2.HUISNR FORMAT=$6.,
TABLE2.HUISNRTOE AS HUISN2 FORMAT=$12.,
TABLE2.POSTCODE AS POSTCO FORMAT=$6.,
TABLE2.BUITENLAND_IND AS BUITEN FORMAT=$1.,
TABLE2.BEKEND_DAT AS BEKEND FORMAT=MMDDYY8.,
TABLE1.GROEP,
('OP NAAM') AS OORSPRONG
FROM TUSSENDT.TABLE2 AS TABLE2,
TUSSENDT.TABLE2 AS TABLE2
WHERE TABLE2.NAAM Contains TABLE1.NAAM;
QUIT; - As extra information: I'm using SAS 9.1.3


Message was edited by: RFLinnenbank
LinusH
Tourmaline | Level 20
I did a simplified example using SASHELP.CLASS:

14 /* Infoga anpassad kod före skickad kod här */
15 options msglevel=i;
16
17 data ClassPartname;
18 set sashelp.class;
19 length PartName $2;
20 PartName = substr(name,2,2);
21 keep Partname;
22 run;

NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: The data set WORK.CLASSPARTNAME has 19 observations and 1 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds


23
24 proc sql;
25 create table ContainsJoin as
26 select class.*, PartName
27 from sashelp.class as class,
28 ClassPartName
29 where class.Name contains ClassPartName.PartName
30 ;
NOTE: The execution of this query involves performing one or more Cartesian product joins that can not be optimized.
NOTE: Table WORK.CONTAINSJOIN created, with 31 rows and 6 columns.

31 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds
To help you out more, please share the input data to the "join", and any LOG information that could be of interest.

/Linus Message was edited by: Linus H
Data never sleeps
sbb
Lapis Lazuli | Level 10 sbb
Lapis Lazuli | Level 10
You can leave off the FORMAT= keyword for character variables. Also, you must have an incorrect FROM in your last PROC SQL query (there is no TABLE1 reference). I suspect that you need a change to CONTAINS TRIM(TABLE1.NAAM), adding the TRIM function, otherwise (at least with a DATA step), the padded variable length (LENGTH, not FORMAT) is used which could result in no-match if there is trailing blanks.

And, so, you will need to dump your TABLE1 values for NAAM to attempt to correlate them against a "compressed" TABLE2.NAAM.

Scott Barry
SBBWorks, Inc.
RFLinnenbank
Calcite | Level 5
Sorry about that, misplaced that one while replacing the naming of the original query.

As far as the trailing blanks are concerned, wouldn't be surprised if those were the cause of it once again. The "SAS fills character-types up with blanks"-thing rears it's ugly head every once in a while, just when you least expect it.

I'll let you know as soon as possible if your suggestions did the trick. I expect so though.

Thanks a lot, even if this wouldn't quite do the trick yet!
RFLinnenbank
Calcite | Level 5
Turns out it was indeed once again a Trimming issue!
Works like a charm now!

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 885 views
  • 0 likes
  • 3 in conversation