hi,
i have 2 tables which i need to match by name but the match is not 100% and the names are in both tables.
I specifically investigated those names that did not give me the match and they have a different length in both tables.
The ones that match have the same length .I pulled a few of the names that don't give me a match from both tables and seems like they are bad in on the datasets The first table would show me an output such as name="Anna Smith",the other table breaks the output in 2 lines name='Anna
Smith'
and they have different length . I thought it was a space issue, so i tried compress ,STRIP,trim... u name it but nothing worked.
Any solution that would fix this issue,anyone please?
PS.If i go back to the excel files these 2 datasets are created of and re-enter those names manually and re run everything it matches perfectly but don't think that would be a solution on a long term 🙂
Thanks
One step for preparing the names for matching could be to replace all white-space characters with a blank and to upcase the string.
Below line of code is doing this. Consecutive white-space characters will get replaced by a single blank.
nameMatch=upcase(prxchange('s/\s+/ /oi',-1,strip(name)));
data test;
name=cats('Anna','0a'x,'09'x,'Smith');
nameMatch=upcase(prxchange('s/\s+/ /oi',-1,strip(name)));
name_Hex=put(strip(name),hex.);
nameMatch_Hex=put(strip(nameMatch),hex.);
output;
stop;
run;
proc print;
run;
Check and compare both sides names using hex value:
put(compress(name),$hex50.);
the length (50) should be, at least, twice the supposed length.
In your example "Anna Smith" length is 10 then use $hex24. or more.
Coming from excel it may be a result of break-line in a cell or some other control character.
thanks for your input Shmuel,tried it and still same issue. Got to be a control character. Asked business to fix their excel file
Thanks anyway guys,
Regards
T
One step for preparing the names for matching could be to replace all white-space characters with a blank and to upcase the string.
Below line of code is doing this. Consecutive white-space characters will get replaced by a single blank.
nameMatch=upcase(prxchange('s/\s+/ /oi',-1,strip(name)));
data test;
name=cats('Anna','0a'x,'09'x,'Smith');
nameMatch=upcase(prxchange('s/\s+/ /oi',-1,strip(name)));
name_Hex=put(strip(name),hex.);
nameMatch_Hex=put(strip(nameMatch),hex.);
output;
stop;
run;
proc print;
run;
You should also be doing other checks. Is name unique on both files? If not how do you know if you are joining the right Anna Smith on one file with the right Anna Smith on the other? Can you use other columns to improve joining accuracy?
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.