JNWong
Calcite | Level 5

Hi,

  

   I am wondering whether SAS can identify a word that exists in the dictionary, as opposed to a made-up one.

   Or can it analyze the components of a sentence? I want to extract the nouns and delete the other components, such as attributes.

  The sentences contain no clauses.

 

Thank you!

 

ACCEPTED SOLUTION
rogerjdeangelis
Barite | Level 11
SAS Forum: Is it a valid word, and is it a noun, adjective, pronoun...?

Inspired by
https://goo.gl/u5muLG
https://communities.sas.com/t5/Base-SAS-Programming/can-sas-identify-a-word-or-component/m-p/325561


Two parts:

1. T1001520 Is it a valid word?
2. T0099390 Natural language processing: is it a noun, adjective, pronoun...?


HAVE A LIST OF WORDS IN A TEXT FILE
===================================

data _null_;
  file "d:/txt/havewords.txt";
  put 'TOMMORROW';
  put 'TOMOROW';
run;quit;


WANT
====

File: "MYWORDS"

  Unrecognized word               Freq     Line(s)

  TOMMORROW                        1       2
        Suggestions: TOMORROW

  TOMOROW                          1       3
        Suggestions: TOMORROW


SOLUTION
========

filename mywords "d:/txt/havewords.txt";
data _null_;
  file "d:/txt/havewords.txt";
  put 'TOMMORROW';
  put 'TOMOROW';
run;quit;

PROC Spell in= mywords
               verify
               suggest;
run;quit;

NOW IF YOU WANT ANOTHER DICTIONARY
===================================

Go to http://wordlist.sourceforge.net/ and download a word list.

Here is a dictionary of words beginning with 'TOMO':

"d:/txt/tomos.txt"

WRD

TOMOGRAM
TOMOGRAMS
TOMOGRAPH
TOMOGRAPHIC
TOMOGRAPHIES
TOMOGRAPHS
TOMOGRAPHY
TOMOLO
TOMOMANIA
TOMORN
TOMORROW
TOMORROWER
TOMORROWING
TOMORROWNESS
TOMORROWS
TOMOSIS

CREATE THE DICTIONARY OF 'TOMO' WORDS

PROC Spell words  = "d:/txt/tomos.txt"
           create
           dict = work.mycatalog.spell;
run;quit;

* check the misspelled words against the new dictionary;
PROC Spell in= mywords
               verify
               suggest
               dict = work.mycatalog.spell
;
run;quit;
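If you only need a yes/no flag for each word against a word list of your own (such as the TOMO list above), and no spelling suggestions, a DATA step hash lookup is one alternative to PROC SPELL. This is a minimal sketch, assuming the word list fits in memory; the dataset names WORDLIST, TOWORDS and CHECKED and the inline sample words are hypothetical stand-ins for a downloaded list and your own file.

```sas
/* Hypothetical sample word list: a few entries from the TOMO list above */
data wordlist;
  input wrd :$32.;
  wrd = upcase(wrd);                         /* store keys in upper case */
  datalines;
TOMOGRAM
TOMOGRAPHY
TOMORROW
TOMORROWS
;
run;

/* Hypothetical words to check: the two misspellings plus one correct word */
data towords;
  input word :$32.;
  datalines;
TOMMORROW
TOMOROW
TOMORROW
;
run;

data checked;
  if _n_ = 1 then do;
    if 0 then set wordlist;                  /* defines WRD in the PDV for the hash key */
    declare hash dict(dataset: "wordlist");  /* load the word list into memory */
    dict.defineKey("wrd");
    dict.defineDone();
  end;
  set towords;
  wrd   = upcase(strip(word));               /* case-insensitive comparison */
  valid = (dict.check() = 0);                /* CHECK() returns 0 when the key is found */
  drop wrd;
run;

proc print data=checked noobs;
run;
```

With this sample data, TOMMORROW and TOMOROW get VALID=0 and TOMORROW gets VALID=1. Unlike PROC SPELL with SUGGEST, this only flags words; it offers no corrections.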

T0099390 Natural language processing: is it a noun, adjective, pronoun...?

HAVE
====

options validvarname=upcase;

data "d:/sd1/txt.sas7bdat";
  length txt $255;
  txt=catx(
     ' '
    ,'Pierre Vinken, 61 years old, will join the board as a'
    ,'nonexecutive director Nov. 29.'
    ,'Mr. Vinken is chairman of Elsevier N.V.,'
    ,'the Dutch publishing group.');
  putlog txt;
run;quit;

WANT: POS-tagged words and tag frequencies
==========================================

Frequencies of each tag (nouns, verbs, ...):

  ,   .  CD  DT  IN  JJ  MD  NN NNP NNS  VB VBZ
  3   2   2   3   2   3   1   5   7   1   1   1

 [1] "Pierre/NNP"      "Vinken/NNP"      ",/,"             "61/CD"
 [5] "years/NNS"       "old/JJ"          ",/,"             "will/MD"
 [9] "join/VB"         "the/DT"          "board/NN"        "as/IN"
[13] "a/DT"            "nonexecutive/JJ" "director/NN"     "Nov./NNP"
[17] "29/CD"           "./."             "Mr./NNP"         "Vinken/NNP"
[21] "is/VBZ"          "chairman/NN"     "of/IN"           "Elsevier/NNP"
[25] "N.V./NNP"        ",/,"             "the/DT"          "Dutch/JJ"
[29] "publishing/NN"   "group/NN"        "./."


CC     Coordinating conjunction
CD     Cardinal number
DT     Determiner
EX     Existential there
FW     Foreign word
IN     Preposition or subordinating conjunction
JJ     Adjective
JJR    Adjective, comparative
JJS    Adjective, superlative
LS     List item marker
MD     Modal
NN     Noun, singular or mass
NNS    Noun, plural
NNP    Proper noun, singular
NNPS   Proper noun, plural
PDT    Predeterminer
POS    Possessive ending
PRP    Personal pronoun
PRP$   Possessive pronoun
RB     Adverb
RBR    Adverb, comparative
RBS    Adverb, superlative
RP     Particle
SYM    Symbol
UH     Interjection
VB     Verb, base form
VBD    Verb, past tense
VBG    Verb, gerund or present participle
VBN    Verb, past participle
VBP    Verb, non-3rd person singular present
VBZ    Verb, 3rd person singular present
WDT    Wh-determiner
WP     Wh-pronoun
WP$    Possessive wh-pronoun
WRB    Wh-adverb

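SOLUTION
========

/* utl_submit_r64 is a user-written macro (not part of SAS) that passes the R code below to R.
   It needs the R packages stringr, NLP, openNLP, openNLPmodels.en and haven. */
%utl_submit_r64(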
library(stringr);
library(NLP);
library(openNLP);
library(openNLPmodels.en);
library(haven);
txt<-read_sas('d:/sd1/txt.sas7bdat');
txt;
s <- as.String(txt$TXT);
sent_token_annotator <- Maxent_Sent_Token_Annotator();
word_token_annotator <- Maxent_Word_Token_Annotator();
a2 <- annotate(s, list(sent_token_annotator, word_token_annotator));
pos_tag_annotator <- Maxent_POS_Tag_Annotator();
pos_tag_annotator;
a3 <- annotate(s, pos_tag_annotator, a2);
a3;
head(annotate(s, Maxent_POS_Tag_Annotator(probs = TRUE), a2));
a3w <- subset(a3, type == 'word');
tags <- sapply(a3w$features, `[[`, 'POS');
tags;
table(tags);
sprintf('%s/%s', s[a3w], tags);
);
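To get what the original question asks for (keep the nouns, drop everything else), the word/TAG tokens can be filtered on the Penn Treebank noun tags NN, NNS, NNP and NNPS from the legend above. This is a minimal sketch, assuming the tokens have already been brought back into SAS one per row; the dataset names TAGGED and NOUNS, the variable TOKEN, and the subset of tokens used are hypothetical.

```sas
/* Hypothetical input: some of the word/TAG tokens printed by the R step above */
data tagged;
  input token :$64.;
  datalines;
Pierre/NNP
Vinken/NNP
,/,
61/CD
years/NNS
old/JJ
board/NN
nonexecutive/JJ
director/NN
N.V./NNP
;
run;

data nouns;
  set tagged;
  length word $64 tag $8;
  /* the tag follows the LAST slash, so search right to left */
  slash = find(token, "/", -length(token));
  if slash = 0 then delete;                  /* skip tokens without a tag */
  word = substr(token, 1, slash - 1);
  tag  = substr(token, slash + 1);
  /* keep only the Penn Treebank noun tags */
  if tag in ("NN", "NNS", "NNP", "NNPS");
  keep word tag;
run;

proc print data=nouns noobs;
run;
```

With this sample, NOUNS keeps Pierre, Vinken, years, board, director and N.V. Searching from the right keeps a word that itself contains a slash intact. The tags come from a statistical model, so an occasional token may still be mis-tagged.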


6 REPLIES
Shmuel
Garnet | Level 18

As a programming tool, SAS can analyze any text.

I don't know whether there is a ready-made SAS product that does what you want, and even if there is, it would have to be built specifically for the language you are interested in.

Have you ever used Google Translate? If so, you know that analyzing text and translating it into another language (that is, transforming it from one language's grammar to another's) is very complicated and not very accurate.

JNWong
Calcite | Level 5

Thank you, I agree with what you said.

Reeza
Super User

Are you working with Base SAS or EM with Text Analytics?

JNWong
Calcite | Level 5

Base SAS.

ballardw
Super User

You will have to supply the logic for determining whether a word is a noun, because the same word may be a noun, a verb, or even a proper name.

