BookmarkSubscribeRSS Feed
aaou
Obsidian | Level 7

Hi,

I have two datasets:

 ds1

Advisor
JP_Morgan
JP_Morgan_Chase___Co
JP_Morgan_Secs__Asia
JP_Morgan_Securities
Lehman_Brothers
Lehman_Brothers_Kuhn
Laidlaw___Co__UK__Lt
Laidlaw_Adams___Peck
Laidlaw_Ansbacher
Laidlaw_Equities_Inc
Macquarie_Capital__U
Macquarie_Capital_Ad
Macquarie_Capital_Gr
Macquarie_Capital_Pa

 

ds2

Advisor
JP_Morgan
Laidlaw
Macquarie_Capital

Lehman_Brothers

 

 

 

When I run the code below using soundex and compged ( I set a threshold level of 100 for the compged) I get the outcome given below the code: 

 

data advisors;
set ds1;
tmp1=soundex(advisor);
do i=1 to nobs;
set ds2(rename=(advisor=advisor2)) point=i nobs=nobs;
tmp2=soundex(advisor2);
dif=compged(tmp1,tmp2);
if dif<=100 then do;
possible_match='Yes';
drop i tmp1 tmp2;
output;
end;
end;
run;

 

Advisoradvisor2difpossible_match
JP_MorganJP_Morgan0Yes
Lehman_BrothersLehman_Brothers0Yes
Lehman_Brothers_KuhnLehman_Brothers70Yes
Macquarie_Capital__UMacquarie_Capital0Yes
Macquarie_Capital_AdMacquarie_Capital50Yes
Macquarie_Capital_GrMacquarie_Capital100Yes
Macquarie_Capital_PaMacquarie_Capital50

Yes

 

However, say I want an output table which gives me all the 14 observations in ds1, gives 'yes' for possible match and the respective match from ds2 in the second column if the compegd is below the threshold and gives 'no' for possible match and a blank for the second column if the threshold is surpassed. 

 

Basically I want to know how each observation in ds1 fairs with the matching and not to lose the observations because there wasn't a close match. 

 

Thank you!

7 REPLIES 7
PGStats
Opal | Level 21

Add a bit more code:

 

data advisors;
length possible_match $3;
set ds1;
tmp1=soundex(advisor);
possible_match='No';
do i=1 to nobs;
	set ds2(rename=(advisor=advisor2)) point=i nobs=nobs;
	tmp2=soundex(advisor2);
	dif=compged(tmp1,tmp2);
	if dif<=100 then do;
		possible_match='Yes';
		drop i tmp1 tmp2;
		output;
		end;
	end;
if possible_match='No' then do;
	call missing(advisor2);
	output;
	end;
run;
PG
aaou
Obsidian | Level 7

Thanks a lot!

 

I got a few questions:

 1) Suppose instead of returning blank values ( when the compegd score is greater than the threshold value) I want the original name in DS1 to appear instead of the blank value; how should the code be amended?

 

2) I noticed that when I change the threshold score from 100 to say 500, the compegd scores change completely....why is that?

 

3) what is the meaning of this part of the code

length possible_match $3;    

 

I'm new to coding so I truly appreciate your help! Thank you.

 

PGStats
Opal | Level 21

1) Replace 

call missing(advisor2);

with

advisor2 = advisor; 

 

2) I don't know. You could submit this as a new question to the Forum.

 

3) Unless given explicitly, SAS determines the length of a character variable at the first assignment encountered during compilation. The length statement is to ensure that the variable possible_match isn't given a length of only two when the variable is first assigned the value "No".

PG
aaou
Obsidian | Level 7

Thank you.

 

Regarding your answer to questions (1); I thought that would work too, but when I do that the output I get generates blanks for the first column of advisors and the table gets messed up. I have attached a screenshot of the output. 


sas.png
PGStats
Opal | Level 21

Please post the new code.

PG
aaou
Obsidian | Level 7
data advisors;
length possible_match $3;
set ds1;
tmp1=soundex(advisor);
possible_match='No';
do i=1 to nobs;
	set ds2(rename=(advisor=advisor2)) point=i nobs=nobs;
	tmp2=soundex(advisor2);
	dif=compged(tmp1,tmp2);
	if dif<=100 then do;
		possible_match='Yes';
		drop i tmp1 tmp2;
		output;
		end;
	end;
if possible_match='No' then do;
	call missing(advisor);
	output;
	end;
run;

 

I also tried this as I wasn't quite sure:

 

data advisors;
length possible_match $3;
set ds1;
tmp1=soundex(advisor);
possible_match='No';
do i=1 to nobs;
	set ds2(rename=(advisor=advisor2)) point=i nobs=nobs;
	tmp2=soundex(advisor2);
	dif=compged(tmp1,tmp2);
	if dif<=100 then do;
		possible_match='Yes';
		drop i tmp1 tmp2;
		output;
		end;
	end;
if possible_match='No' then do;
	call missing(advisor2=advisor);
	output;
	end;
run;

Both didn't return what I wanted. 😞 

PGStats
Opal | Level 21

My suggestion should have led to:

 

data advisors;
length possible_match $3;
set ds1;
tmp1 = soundex(advisor);
possible_match = 'No';
do i = 1 to nobs;
	set ds2(rename = (advisor = advisor2)) point = i nobs = nobs;
	tmp2 = soundex(advisor2);
	dif = compged(tmp1, tmp2);
	if dif <= 100 then do;
		possible_match = 'Yes';
		output;
		end;
	end;
if possible_match = 'No' then do;
	advisor2 = advisor;
	output;
	end;
drop i tmp1 tmp2;
run;
PG

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 2479 views
  • 0 likes
  • 2 in conversation