Hi,
I have two datasets:
ds1
Advisor |
JP_Morgan |
JP_Morgan_Chase___Co |
JP_Morgan_Secs__Asia |
JP_Morgan_Securities |
Lehman_Brothers |
Lehman_Brothers_Kuhn |
Laidlaw___Co__UK__Lt |
Laidlaw_Adams___Peck |
Laidlaw_Ansbacher |
Laidlaw_Equities_Inc |
Macquarie_Capital__U |
Macquarie_Capital_Ad |
Macquarie_Capital_Gr |
Macquarie_Capital_Pa |
ds2
Advisor |
JP_Morgan |
Laidlaw |
Macquarie_Capital |
Lehman_Brothers
|
When I run the code below using soundex and compged ( I set a threshold level of 100 for the compged) I get the outcome given below the code:
data advisors;
set ds1;
tmp1=soundex(advisor);
do i=1 to nobs;
set ds2(rename=(advisor=advisor2)) point=i nobs=nobs;
tmp2=soundex(advisor2);
dif=compged(tmp1,tmp2);
if dif<=100 then do;
possible_match='Yes';
drop i tmp1 tmp2;
output;
end;
end;
run;
Advisor | advisor2 | dif | possible_match |
JP_Morgan | JP_Morgan | 0 | Yes |
Lehman_Brothers | Lehman_Brothers | 0 | Yes |
Lehman_Brothers_Kuhn | Lehman_Brothers | 70 | Yes |
Macquarie_Capital__U | Macquarie_Capital | 0 | Yes |
Macquarie_Capital_Ad | Macquarie_Capital | 50 | Yes |
Macquarie_Capital_Gr | Macquarie_Capital | 100 | Yes |
Macquarie_Capital_Pa | Macquarie_Capital | 50 | Yes |
However, say I want an output table which gives me all the 14 observations in ds1, gives 'yes' for possible match and the respective match from ds2 in the second column if the compegd is below the threshold and gives 'no' for possible match and a blank for the second column if the threshold is surpassed.
Basically I want to know how each observation in ds1 fairs with the matching and not to lose the observations because there wasn't a close match.
Thank you!
Add a bit more code:
data advisors;
length possible_match $3;
set ds1;
tmp1=soundex(advisor);
possible_match='No';
do i=1 to nobs;
set ds2(rename=(advisor=advisor2)) point=i nobs=nobs;
tmp2=soundex(advisor2);
dif=compged(tmp1,tmp2);
if dif<=100 then do;
possible_match='Yes';
drop i tmp1 tmp2;
output;
end;
end;
if possible_match='No' then do;
call missing(advisor2);
output;
end;
run;
Thanks a lot!
I got a few questions:
1) Suppose instead of returning blank values ( when the compegd score is greater than the threshold value) I want the original name in DS1 to appear instead of the blank value; how should the code be amended?
2) I noticed that when I change the threshold score from 100 to say 500, the compegd scores change completely....why is that?
3) what is the meaning of this part of the code
length possible_match $3;
I'm new to coding so I truly appreciate your help! Thank you.
1) Replace
call missing(advisor2);
with
advisor2 = advisor;
2) I don't know. You could submit this as a new question to the Forum.
3) Unless given explicitly, SAS determines the length of a character variable at the first assignment encountered during compilation. The length statement is to ensure that the variable possible_match isn't given a length of only two when the variable is first assigned the value "No".
Thank you.
Regarding your answer to questions (1); I thought that would work too, but when I do that the output I get generates blanks for the first column of advisors and the table gets messed up. I have attached a screenshot of the output.
Please post the new code.
data advisors;
length possible_match $3;
set ds1;
tmp1=soundex(advisor);
possible_match='No';
do i=1 to nobs;
set ds2(rename=(advisor=advisor2)) point=i nobs=nobs;
tmp2=soundex(advisor2);
dif=compged(tmp1,tmp2);
if dif<=100 then do;
possible_match='Yes';
drop i tmp1 tmp2;
output;
end;
end;
if possible_match='No' then do;
call missing(advisor);
output;
end;
run;
I also tried this as I wasn't quite sure:
data advisors;
length possible_match $3;
set ds1;
tmp1=soundex(advisor);
possible_match='No';
do i=1 to nobs;
set ds2(rename=(advisor=advisor2)) point=i nobs=nobs;
tmp2=soundex(advisor2);
dif=compged(tmp1,tmp2);
if dif<=100 then do;
possible_match='Yes';
drop i tmp1 tmp2;
output;
end;
end;
if possible_match='No' then do;
call missing(advisor2=advisor);
output;
end;
run;
Both didn't return what I wanted. 😞
My suggestion should have led to:
data advisors;
length possible_match $3;
set ds1;
tmp1 = soundex(advisor);
possible_match = 'No';
do i = 1 to nobs;
set ds2(rename = (advisor = advisor2)) point = i nobs = nobs;
tmp2 = soundex(advisor2);
dif = compged(tmp1, tmp2);
if dif <= 100 then do;
possible_match = 'Yes';
output;
end;
end;
if possible_match = 'No' then do;
advisor2 = advisor;
output;
end;
drop i tmp1 tmp2;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.