harrylui
Obsidian | Level 7

Good day,

I am using the VERIFY function to check the similarity of my data, but the result looks odd to me.

Obs  name   lag_name  checking
1    Harry            1
2    Harry  Harry     6
3    Henry  Harry     2
4    Ben    Henry     1

 

It seems a blank also counts as one character.

Does the order of the characters matter here? I expected obs 3 to have 3 matching characters and obs 4 to have 2 matching characters.

Can anyone help me with that?

Below is my program:


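data testing;
    /* INFILE is placed before INPUT, which is the usual order */
    infile datalines dlm=',';
    /* read each name as a 40-character value */
    input name $40.;
    datalines;
Harry
Harry
Henry
Ben
;
run;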

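data testing2;
    set testing;
    /* previous row's name with blanks removed (blank on the first row) */
    lag_name = compress(lag(name));
    /* current row's name with blanks removed */
    name2 = compress(name);
run;


data checking;
    set testing2;
    /* position of the first character of name2 that is not found in lag_name */
    checking = verify(name2, lag_name);
run;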

 

Thanks in advance,

Harry

Accepted Solution
ed_sas_member
Meteorite | Level 14

Hi @harrylui 

 

Here is the output I get when I run the code using the VERIFY() function:

[Image: Screenshot 2020-03-02 at 09.50.39.png]

 

This function returns the position of the first character in the first string that does not appear anywhere in the second string (or 0 if every character appears), so the results look correct:

- Obs 1: lag_name is blank, so H (position 1) is not found in it -> 1

- Obs 2: the strings are strictly identical, so every character is found and no position is returned -> 0

- Obs 3: e (position 2) does not occur in Harry -> 2

- Obs 4: B (position 1) does not occur in Henry -> 1
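
Trailing blanks matter here, because a character variable is padded with blanks up to its length and VERIFY treats those blanks like any other character. The sketch below is only an illustration with literal values (the variable names padded, one_trimmed and both_trimmed are made up for this example). It shows how two identical names can return 0 or 6 depending on whether the second argument is trimmed, which may be why a value such as 6 can show up for Harry against Harry.

```sas
data _null_;
    /* Fixed length, so both values are padded with blanks to 40 characters */
    length name2 lag_name $40;
    name2    = 'Harry';
    lag_name = 'Harry';

    /* Both arguments padded: the blank is one of the characters in lag_name */
    padded = verify(name2, lag_name);                    /* 0 */

    /* Second argument trimmed: the first padding blank in name2 (position 6)
       is not one of the characters H, a, r, y */
    one_trimmed = verify(name2, trim(lag_name));         /* 6 */

    /* Both arguments trimmed: every character of Harry is found in Harry */
    both_trimmed = verify(trim(name2), trim(lag_name));  /* 0 */

    put padded= one_trimmed= both_trimmed=;
run;
```

If the padding is not meant to take part in the check, trimming both arguments (or neither) keeps the comparison consistent.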

 

Depending on what you're trying to achieve, you could explore the COMPGED() function. It returns the generalized edit distance between two strings: the larger the distance, the less similar the strings are. Zero indicates that the strings are strictly identical. It is up to you to choose the threshold that counts as "acceptable similarity".

data checking;
    set testing2;
    checking = compged(name2, lag_name);
run;

[Image: Screenshot 2020-03-02 at 09.55.06.png]
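
If what is wanted is the count the original post expects (3 for Henry against Harry, 2 for Ben against Henry), one possible approach is to compare the two names position by position. This is only a sketch under that assumption about the goal; the dataset name match_count and the variable matches are made up for this example, and the names are re-read from datalines so the step runs on its own.

```sas
data match_count;
    length name lag_name $40;
    input name $;
    lag_name = lag(name);   /* previous name; blank on the first row */

    /* Count the positions where both names hold the same character.
       LENGTHN returns 0 for a blank value, so the loop is skipped on row 1. */
    matches = 0;
    do i = 1 to min(lengthn(name), lengthn(lag_name));
        if substr(name, i, 1) = substr(lag_name, i, 1) then matches = matches + 1;
    end;
    drop i;
    datalines;
Harry
Harry
Henry
Ben
;
run;

proc print data=match_count;
run;
```

With these four rows, matches comes out as 0, 5, 3 and 2. The comparison is case-sensitive and strictly positional, so a name shifted by one character would score low; an edit distance such as COMPGED() handles that case better.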

Hope this helps,

 

Best,

akanshya142
Calcite | Level 5

The VERIFY function returns the position of the first character in string1 that does not appear anywhere in string2.

So in your example:

For Henry and Harry, the letter H occurs in Harry, but the 2nd letter e does not occur anywhere in Harry. So the VERIFY function returns 2.

For Ben and Henry, the 1st letter B does not occur in Henry, so the function returns 1.
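
One point worth adding: VERIFY checks whether each character occurs anywhere in the second string, not whether it sits at the same position. A small sketch with string literals (so no trailing blanks are involved; the variable names p1 to p3 are just for illustration):

```sas
data _null_;
    p1 = verify('Henry', 'Harry');  /* 2: e does not occur in Harry */
    p2 = verify('Hyrra', 'Harry');  /* 0: every character occurs somewhere in Harry */
    p3 = verify('Ben',   'Henry');  /* 1: B does not occur in Henry */
    put p1= p2= p3=;
run;
```

The second call returns 0 even though the letters are in a different order, which is why VERIFY on its own is not a measure of how similar two names are.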

harrylui
Obsidian | Level 7

Thanks, all, for the explanations.
