Solved: can sas identify a word or component

JNWong · Posted 01-18-2017 12:22 AM

Hi,

I am wondering whether SAS can identify a word that exists in the dictionary, as opposed to a made-up one.

Or can it analyse the components of a sentence? I want to extract the nouns and delete the other components, such as attributes.

The sentences contain no clauses.

Thank you!

rogerjdeangelis · Posted 01-18-2017 04:10 PM

SAS Forum: Is it a valid word and is it a noun, adjective, pronoun..

inspired by
https://goo.gl/u5muLG
https://communities.sas.com/t5/Base-SAS-Programming/can-sas-identify-a-word-or-component/m-p/325561


Two parts

1. T1001520 Is it a valid word
2. T0099390 Natural Language Processing is it a noun, adjective, pronoun..


HAVE A LIST OF WORDS IN A TEXT FILE
===================================

data _null_;
  file "d:/txt/havewords.txt";
  put 'TOMMORROW';
  put 'TOMOROW';
run;quit;


WANT
====

File: "MYWORDS"

  Unrecognized word               Freq     Line(s)

  TOMMORROW                        1       2
        Suggestions: TOMORROW

  TOMOROW                          1       3
        Suggestions: TOMORROW


SOLUTION
========

filename mywords "d:/txt/havewords.txt";
data _null_;
  file "d:/txt/havewords.txt";
  put 'TOMMORROW';
  put 'TOMOROW';
run;quit;

PROC Spell in= mywords
               verify
               suggest;
run;quit;

NOW IF YOU WANT ANOTHER DICTIONARY
===================================

Go to the following site and download a word list
http://wordlist.sourceforge.net/

Here is a dictionary of words beginning with 'TOMO'

"d:/txt/tomos.txt"

WRD

TOMOGRAM
TOMOGRAMS
TOMOGRAPH
TOMOGRAPHIC
TOMOGRAPHIES
TOMOGRAPHS
TOMOGRAPHY
TOMOLO
TOMOMANIA
TOMORN
TOMORROW
TOMORROWER
TOMORROWING
TOMORROWNESS
TOMORROWS
TOMOSIS

CREATE THE DICTIONARY of 'TOMO's

PROC Spell words  = "d:/txt/tomos.txt"
           create
           dict = work.mycatalog.spell;
run;quit;

* use the dictionary with misspellings;
PROC Spell in= mywords
               verify
               suggest
               dict = work.mycatalog.spell
;
run;quit;
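
For readers without access to PROC SPELL, the same two questions (is the word in my list, and if not, what is the closest entry) can be roughly approximated in base R. This is only a sketch: it does not reproduce how PROC SPELL builds its suggestions, it simply picks the dictionary entry with the smallest edit distance. The vectors `dict` and `words` are typed in by hand from the lists above, and the names `is_valid` and `suggestion` are illustrative.

```r
# Word list copied from d:/txt/tomos.txt above (hand-typed for illustration)
dict <- c("TOMOGRAM", "TOMOGRAMS", "TOMOGRAPH", "TOMOGRAPHIC", "TOMOGRAPHIES",
          "TOMOGRAPHS", "TOMOGRAPHY", "TOMOLO", "TOMOMANIA", "TOMORN",
          "TOMORROW", "TOMORROWER", "TOMORROWING", "TOMORROWNESS",
          "TOMORROWS", "TOMOSIS")

# Words to check, same as in havewords.txt
words <- c("TOMMORROW", "TOMOROW")

# A word is "valid" here only if it appears verbatim in the list
is_valid <- words %in% dict

# Edit distance from every word to every dictionary entry;
# rows are words, columns are dictionary entries
d <- utils::adist(words, dict)

# For each word, suggest the dictionary entry with the smallest distance
suggestion <- dict[apply(d, 1, which.min)]

data.frame(word = words, valid = is_valid, suggestion = suggestion)
```

Both misspellings are one edit away from TOMORROW, so that is the suggestion for each, in line with the PROC SPELL report shown earlier.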

/* T0099390 Natural Language Processing is it a noun, adjective, pronoun.. */

HAVE
====

options validvarname=upcase;

data "d:/sd1/txt.sas7bdat";
  length txt $255;
  txt=catx(
     ' '
    ,'Pierre Vinken, 61 years old, will join the board as a'
    ,'nonexecutive director Nov. 29.\n'
    ,'Mr. Vinken is chairman of Elsevier N.V.,'
    ,'the Dutch publishing group.');
  putlog txt;
run;quit;

WANT  Words are tagged with frequencies
========================================

Frequencies of nouns, pronouns, verbs ...

  ,   .  CD  DT  IN  JJ  MD  NN NNP NNS  VB VBZ
  3   2   2   3   2   3   1   5   7   1   1   1

 [1] "Pierre/NNP"      "Vinken/NNP"      ",/,"             "61/CD"
 [5] "years/NNS"       "old/JJ"          ",/,"             "will/MD"
 [9] "join/VB"         "the/DT"          "board/NN"        "as/IN"
[13] "a/DT"            "nonexecutive/JJ" "director/NN"     "Nov./NNP"
[17] "29/CD"           "./."             "Mr./NNP"         "Vinken/NNP"
[21] "is/VBZ"          "chairman/NN"     "of/IN"           "Elsevier/NNP"
[25] "N.V./NNP"        ",/,"             "the/DT"          "Dutch/JJ"
[29] "publishing/NN"   "group/NN"        "./."


CC     Coordinating conjunction
CD     Cardinal number
DT     Determiner
EX     Existential there
FW     Foreign word
IN     Preposition or subordinating conjunction
JJ     Adjective
JJR    Adjective, comparative
JJS    Adjective, superlative
LS     List item marker
MD     Modal
NN     Noun, singular or mass
NNS    Noun, plural
NNP    Proper noun, singular
NNPS   Proper noun, plural
PDT    Predeterminer
POS    Possessive ending
PRP    Personal pronoun
PRP$   Possessive pronoun
RB     Adverb
RBR    Adverb, comparative
RBS    Adverb, superlative
RP     Particle
SYM    Symbol
UH     Interjection
VB     Verb, base form
VBD    Verb, past tense
VBG    Verb, gerund or present participle
VBN    Verb, past participle
VBP    Verb, non-3rd person singular present
VBZ    Verb, 3rd person singular present
WDT    Wh-determiner
WP     Wh-pronoun
WP$    Possessive wh-pronoun
WRB    Wh-adverb

SOLUTION

%utl_submit_r64(
library(stringr);
library(NLP);
library(openNLP);
library(openNLPmodels.en);
library(haven);
txt<-read_sas('d:/sd1/txt.sas7bdat');
txt;
s <- as.String(txt$TXT);
sent_token_annotator <- Maxent_Sent_Token_Annotator();
word_token_annotator <- Maxent_Word_Token_Annotator();
a2 <- annotate(s, list(sent_token_annotator, word_token_annotator));
pos_tag_annotator <- Maxent_POS_Tag_Annotator();
pos_tag_annotator;
a3 <- annotate(s, pos_tag_annotator, a2);
a3;
head(annotate(s, Maxent_POS_Tag_Annotator(probs = TRUE), a2));
a3w <- subset(a3, type == 'word');
tags <- sapply(a3w$features, `[[`, 'POS');
tags;
table(tags);
sprintf('%s/%s', s[a3w], tags);
);
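
To get back to the original question (keep the nouns, drop everything else), the tagged tokens can be filtered on tags that start with NN. Below is a minimal base R sketch, assuming the token/tag strings have the word/TAG shape shown in the WANT section. The vector `tagged` is typed in by hand from that output rather than produced by openNLP, and the names `word`, `tag` and `nouns` are illustrative.

```r
# Tagged tokens copied from the sprintf() output above (hand-typed for illustration)
tagged <- c("Pierre/NNP", "Vinken/NNP", ",/,", "61/CD", "years/NNS", "old/JJ",
            ",/,", "will/MD", "join/VB", "the/DT", "board/NN", "as/IN",
            "a/DT", "nonexecutive/JJ", "director/NN", "Nov./NNP", "29/CD",
            "./.", "Mr./NNP", "Vinken/NNP", "is/VBZ", "chairman/NN", "of/IN",
            "Elsevier/NNP", "N.V./NNP", ",/,", "the/DT", "Dutch/JJ",
            "publishing/NN", "group/NN", "./.")

# Split each string at its last slash: the word is before it, the tag after it
word <- sub("/[^/]*$", "", tagged)
tag  <- sub("^.*/", "", tagged)

# Penn Treebank noun tags all start with NN (NN, NNS, NNP, NNPS)
nouns <- word[grepl("^NN", tag)]
nouns
```

This keeps 13 tokens (5 NN, 7 NNP, 1 NNS), matching the frequency table. Inside the openNLP program itself the same filter could presumably be written as `s[a3w][grepl('^NN', tags)]`, since `s[a3w]` and `tags` are already aligned there; use `'^NN$|^NNS$'` instead if proper nouns should be dropped.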


Shmuel · Posted 01-18-2017 12:54 AM

With SAS as a programming tool you can analyze any text.

I don't know whether there is a ready-made SAS system that does what you want, and even if there is, it would have to be programmed specifically for the language you are interested in.

Have you ever used Google Translate? If so, then you know that analyzing text and translating it into another language (that is, transforming from one language's grammar to another) is very complicated and not very accurate.

JNWong · Posted 01-18-2017 01:32 AM

Thank you. I agree with what you said.

Reeza · Posted 01-18-2017 04:01 AM

Are you working with Base SAS or EM with Text Analytics?

JNWong · Posted 01-18-2017 04:03 AM

Base SAS.

ballardw · Posted 01-18-2017 09:56 AM

You will have to supply the logic for determining whether a word is a noun or not when it may be a noun, a verb or even a proper name.

