Searching for 2, 3, 4 character ACRONYMS within a large document

Occasional Contributor

Hi all,

I have base SAS, E.G., SAS Studio, and 

I need to search a large document for all acronyms (2, 3, or 4 characters), isolate them, and create a table of definitions.

The acronyms can be any mix of characters, such as "IND", "mg", "USP", etc.

Is there a SAS function, or combination of functions, that will count the number of characters in a word and give a YES/NO (1/0) answer if a certain number of characters is present?

Thanks


Accepted Solutions
07-15-2017 10:36 PM
Super User

It's easy to count the number of characters in a word:

n_characters = lengthn(varname);
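
As a quick check of what LENGTHN returns, here is a small sketch using the example acronyms from the question plus one ordinary word. The data set name words is hypothetical; it just holds one word per observation in varname. LENGTHN ignores trailing blanks and returns 0 for a blank value (LENGTH would return 1), which is why a lower bound such as 0 < screens out empty words.

```sas
/* Hypothetical data set WORDS: one word per observation in VARNAME, */
/* using the example acronyms from the question plus an ordinary word */
data words;
    length varname $ 20;
    input varname $;
    /* LENGTHN counts characters, ignoring trailing blanks (0 for a blank value) */
    n_characters = lengthn(varname);
    datalines;
IND
mg
USP
document
;
run;

proc print data=words noobs;
run;
```

PROC PRINT should show N_CHARACTERS values of 3, 2, 3 and 8, even though VARNAME is stored with a length of 20.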

I'm not sure you gain anything by translating this into a YES/NO. You can easily select the subset you want:

if n_characters = 3;

or

if 0 < n_characters < 3;

or any other selection you need. Possibly, tabulate the short words directly:

proc freq data=have;
    tables varname;
    where (0 < lengthn(varname) < 3);
run;
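
The snippets above work on a data set that already holds one word per observation. Below is a minimal sketch of getting there from raw text, plus the 1/0 flag the question asked about, this time for the 2 to 4 character range. The input data set have_text, its variable line, the sample sentences and the delimiter list are all assumptions for illustration; the output reuses the names have and varname from the answer. No special function is needed for the flag, because a SAS comparison such as (2 <= n_characters <= 4) already evaluates to 1 or 0.

```sas
/* Hypothetical input: one line of document text per observation in LINE */
data have_text;
    length line $ 200;
    infile datalines truncover;
    input line $char200.;
    datalines;
The IND was filed and each tablet contains 5 mg (USP grade).
Contact the FDA with questions.
;
run;

/* Split each line into words: one word per observation in VARNAME */
data have;
    set have_text;
    length varname $ 50;
    /* Assumed word delimiters: blank and common punctuation */
    do i = 1 to countw(line, ' ,.;:()');
        varname = scan(line, i, ' ,.;:()');
        n_characters = lengthn(varname);
        /* A comparison evaluates to 1 (true) or 0 (false): the YES/NO flag */
        is_candidate = (2 <= n_characters <= 4);
        output;
    end;
    drop i line;
run;

/* Frequency of every 2-4 character word: a candidate list to review */
proc freq data=have;
    tables varname;
    where is_candidate = 1;
run;
```

Because the test is on length alone, ordinary words such as "The", "was" and "with" are flagged along with "IND", "mg", "USP" and "FDA", and PROC FREQ counts "The" and "the" as separate values. The frequency table is therefore a list of candidates to review before building the table of definitions, not a finished acronym list.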



All Replies
Occasional Contributor

This works like a charm. Very interesting, thanks!
