BookmarkSubscribeRSS Feed
knveraraju91
Barite | Level 11

Dear,

 

I am validating a data set that may have values similar to the data set one. When I produce a table,
the number of subjects by 'term'  variable value-it should show 4 for "bilirubin decreased' and 3 for vitamin B1 deficiency"  for this data set. But because the values are upper and lower case i will one for each 'term' value.


Is there any way I can find these similar character values to make a correction before i output table. 

Thank you

 

data one;
input id 1 term $3-40;
DATALINES;
1 bilirubin decreased
2 Bilirubin decreased
3 bilirubindecreased
4 BILIRUBIN DECREASED
5 Vitamin B1 deficiency
6 vitamin b1 deficiency
7 Vitamin B1 Deficiency
;
PROC SQL;
CREATE TABLE TWO AS
SELECT COUNT(DISTINCT ID) AS NOBS,TERM
FROM ONE
GROUP BY TERM;
QUIT;

 

 

7 REPLIES 7
Reeza
Super User

fuzzy matching 😞

I came across this paper last week, which had some really good tips and instructions on this topic:

 

https://www.sas.com/content/dam/SAS/support/en/sas-global-forum-proceedings/2018/2886-2018.pdf

 

Basically, COMPGED and some other functions exist to check the 'difference' measure between text strings and then you can limit the results by filtering on the similarity scores.

 

art297
Opal | Level 21

Most of your problem could be solved by simply using the upcase function. If you actually doe have some cases like ID3 (where the difference also incorporates the using or not using embedded spaces, then compged would be an excellent choice.

 

Art, CEO, AnalystFinder.com

 

art297
Opal | Level 21

Of course, an alternative to using compged in the case of your examples, would be to just get rid of the blanks. e.g.:

PROC SQL;
  CREATE TABLE TWO AS
    SELECT COUNT(DISTINCT ID) AS NOBS,upcase(compress(TERM,' '))
      FROM ONE
        GROUP BY upcase(compress(TERM))
  ;
QUIT;

However, if there might be differences in spelling as well, then the paper @Reeza mentioned provides an excellent overview of alternatives.

 

Art, CEO, AnalystFinder.com

 

novinosrin
Tourmaline | Level 20

Hi and Good evening @art297 I'm afraid your sql will unfortunately remerge back with original to return 7 as you need an assignment to in the select clause to use in the group by clause. Just my 2 cents

 

adding distinct corrects it though, however it still will remerge and eliminate duplicates:

PROC SQL;
  CREATE TABLE TWO1 AS
    SELECT  distinct COUNT(DISTINCT ID) AS NOBS,upcase(compress(TERM,' '))
      FROM ONE
        GROUP BY upcase(compress(TERM))
  ;
QUIT;

NOTE: The query requires remerging summary statistics back with the
original data.
NOTE: Table WORK.TWO1 created, with 2 rows and 2 columns.

novinosrin
Tourmaline | Level 20
data one;
input id 1 term $3-40;
DATALINES;
1 bilirubin decreased
2 Bilirubin decreased
3 bilirubin decreased
4 BILIRUBIN DECREASED
5 Vitamin B1 deficiency
6 vitamin b1 deficiency
7 Vitamin B1 Deficiency
;

PROC SQL;
CREATE TABLE TWO AS
SELECT  upcase(term) as term,COUNT(DISTINCT ID) AS NOBS
FROM ONE
GROUP BY 1;
QUIT;
Reeza
Super User

@novinosrin your third record does not match the original data. If the space is a typo that will work, but if the record is missing the space it will not match correctly. 

 

novinosrin
Tourmaline | Level 20

@Reeza Nice catch, here is a small tweak

data one;
input id 1 term $3-40;
DATALINES;
1 bilirubin decreased
2 Bilirubin decreased
3 bilirubindecreased
4 BILIRUBIN DECREASED
5 Vitamin B1 deficiency
6 vitamin b1 deficiency
7 Vitamin B1 Deficiency
;

PROC SQL;
CREATE TABLE TWO AS
SELECT  upcase(compress(term)) as term,COUNT(DISTINCT ID) AS NOBS
FROM ONE
GROUP BY 1;
QUIT;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 1863 views
  • 7 likes
  • 4 in conversation