Hi,
I am wondering whether SAS can identify a word that exists in the dictionary, as opposed to a made-up string,
or whether it can analyse the components of a sentence, because I want to extract the nouns and delete the other components, such as attributes.
The sentences contain no clauses.
Thank you!
SAS Forum: Is it a valid word, and is it a noun, adjective, pronoun...?
Inspired by
https://goo.gl/u5muLG
https://communities.sas.com/t5/Base-SAS-Programming/can-sas-identify-a-word-or-component/m-p/325561
Two parts:
1. T1001520 Is it a valid word?
2. T0099390 Natural Language Processing: is it a noun, adjective, pronoun...?
HAVE A LIST OF WORDS IN A TEXT FILE
===================================
data _null_;
file "d:/txt/havewords.txt";
put 'TOMMORROW';
put 'TOMOROW';
run;quit;
WANT
====
File: "MYWORDS"
Unrecognized word Freq Line(s)
TOMMORROW 1 2
Suggestions: TOMORROW
TOMOROW 1 3
Suggestions: TOMORROW
SOLUTION
========
filename mywords "d:/txt/havewords.txt";
data _null_;
file "d:/txt/havewords.txt";
put 'TOMMORROW';
put 'TOMOROW';
run;quit;
PROC Spell in= mywords
verify
suggest;
run;quit;
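The VERIFY step is essentially a membership test against a dictionary, reported per word with its frequency and line numbers. As a rough illustration outside SAS, here is a Python sketch of that check; the three-word dictionary is hypothetical and only stands in for PROC SPELL's master dictionary, and tokenization is by whitespace only.

```python
from collections import defaultdict

def verify(lines, dictionary):
    """Return {WORD: (frequency, [line numbers])} for words not in `dictionary`.

    Tokenizes on whitespace only (punctuation is not stripped) and compares
    in upper case, like the upper-case sample words in the post.
    """
    known = {w.upper() for w in dictionary}
    unknown = defaultdict(list)  # WORD -> 1-based line numbers where it occurs
    for lineno, line in enumerate(lines, start=1):
        for word in line.split():
            if word.upper() not in known:
                unknown[word.upper()].append(lineno)
    return {w: (len(lns), lns) for w, lns in unknown.items()}

# Hypothetical stand-in for a real dictionary (PROC SPELL uses its own master dictionary).
dictionary = ["TODAY", "TOMORROW", "YESTERDAY"]
report = verify(["TOMMORROW", "TOMOROW", "TOMORROW"], dictionary)
for word, (freq, where) in report.items():
    print(f"{word:<20}{freq:>4}  lines {where}")
```

Only the correctly spelled TOMORROW passes; the two misspellings are reported once each.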
NOW IF YOU WANT ANOTHER DICTIONARY
===================================
Go to the following site and download a word list:
http://wordlist.sourceforge.net/
Here is a dictionary of words beginning with 'TOMO', saved as
"d:/txt/tomos.txt":
WRD
TOMOGRAM
TOMOGRAMS
TOMOGRAPH
TOMOGRAPHIC
TOMOGRAPHIES
TOMOGRAPHS
TOMOGRAPHY
TOMOLO
TOMOMANIA
TOMORN
TOMORROW
TOMORROWER
TOMORROWING
TOMORROWNESS
TOMORROWS
TOMOSIS
CREATE THE DICTIONARY OF 'TOMO' WORDS
PROC Spell words = "d:/txt/tomos.txt"
create
dict = work.mycatalog.spell;
run;quit;
* use the dictionary with misspellings;
PROC Spell in= mywords
verify
suggest
dict = work.mycatalog.spell
;
run;quit;
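The post does not say how PROC SPELL ranks its suggestions. As an assumption, one common way to propose a replacement is to pick the dictionary word with the smallest Levenshtein edit distance; SAS itself exposes that distance through the COMPLEV function. A Python sketch of that swapped-in technique, using the 'TOMO' list above (the WRD header line omitted):

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]

def suggest(word, dictionary, max_dist=2):
    """Dictionary words within max_dist edits, closest first, ties alphabetical."""
    scored = [(levenshtein(word.upper(), w.upper()), w) for w in dictionary]
    return [w for d, w in sorted(scored) if d <= max_dist]

# The 'TOMO' word list from the post.
tomos = ["TOMOGRAM", "TOMOGRAMS", "TOMOGRAPH", "TOMOGRAPHIC", "TOMOGRAPHIES",
         "TOMOGRAPHS", "TOMOGRAPHY", "TOMOLO", "TOMOMANIA", "TOMORN", "TOMORROW",
         "TOMORROWER", "TOMORROWING", "TOMORROWNESS", "TOMORROWS", "TOMOSIS"]

print(suggest("TOMMORROW", tomos, max_dist=1))
print(suggest("TOMOROW", tomos, max_dist=1))
```

Both misspellings are a single edit away from TOMORROW (drop one M, or add one R), and no other word in the list is that close.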
2. T0099390 Natural Language Processing: is it a noun, adjective, pronoun...?
HAVE
====
options validvarname=upcase;
data "d:/sd1/txt.sas7bdat";
length txt $255;
txt=catx(
' '
,'Pierre Vinken, 61 years old, will join the board as a'
,'nonexecutive director Nov. 29.'  /* SAS does not treat \n as a newline, so it is omitted */
,'Mr. Vinken is chairman of Elsevier N.V.,'
,'the Dutch publishing group.');
putlog txt;
run;quit;
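CATX strips leading and trailing blanks from each argument, skips blank arguments, and joins the rest with the delimiter. A rough Python approximation (character arguments only; numeric arguments and SAS result-length rules are not modelled) shows what ends up in TXT. Note that SAS does not interpret `\n` inside a quoted string, so it is left out here:

```python
def catx(delim, *args):
    """Rough Python analogue of SAS CATX for character arguments.

    Strips leading/trailing blanks from each argument, skips arguments that
    are blank, and joins the remaining ones with `delim`.
    """
    parts = (a.strip(" ") for a in args)
    return delim.join(p for p in parts if p)

txt = catx(
    " ",
    "Pierre Vinken, 61 years old, will join the board as a",
    "nonexecutive director Nov. 29.",
    "Mr. Vinken is chairman of Elsevier N.V.,",
    "the Dutch publishing group.",
)
print(txt)
print(len(txt), "characters; the data step declares LENGTH TXT $255")
```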
WANT: words tagged with part-of-speech codes, plus tag frequencies
========================================
Frequencies of nouns, pronouns, verbs ...
, . CD DT IN JJ MD NN NNP NNS VB VBZ
3 2 2 3 2 3 1 5 7 1 1 1
[1] "Pierre/NNP" "Vinken/NNP" ",/," "61/CD"
[5] "years/NNS" "old/JJ" ",/," "will/MD"
[9] "join/VB" "the/DT" "board/NN" "as/IN"
[13] "a/DT" "nonexecutive/JJ" "director/NN" "Nov./NNP"
[17] "29/CD" "./." "Mr./NNP" "Vinken/NNP"
[21] "is/VBZ" "chairman/NN" "of/IN" "Elsevier/NNP"
[25] "N.V./NNP" ",/," "the/DT" "Dutch/JJ"
[29] "publishing/NN" "group/NN" "./."
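The frequency line is simply a count of the tags in the tagged output (what R's `table(tags)` prints). A Python sketch over the 31 word/TAG pairs listed above reproduces it; splitting on the last '/' keeps tokens such as ',/,' and 'N.V./NNP' intact:

```python
from collections import Counter

tagged = ["Pierre/NNP", "Vinken/NNP", ",/,", "61/CD", "years/NNS", "old/JJ", ",/,", "will/MD",
          "join/VB", "the/DT", "board/NN", "as/IN", "a/DT", "nonexecutive/JJ", "director/NN", "Nov./NNP",
          "29/CD", "./.", "Mr./NNP", "Vinken/NNP", "is/VBZ", "chairman/NN", "of/IN", "Elsevier/NNP",
          "N.V./NNP", ",/,", "the/DT", "Dutch/JJ", "publishing/NN", "group/NN", "./."]

def split_tag(token):
    """Split 'word/TAG' on the last slash, so ',/,' gives (',', ',')."""
    word, _, tag = token.rpartition("/")
    return word, tag

counts = Counter(tag for _, tag in map(split_tag, tagged))
for tag in sorted(counts):  # ',' and '.' sort before letters, as in the R output
    print(tag, counts[tag])
```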
Penn Treebank part-of-speech tags:
CC Coordinating conjunction
CD Cardinal number
DT Determiner
EX Existential there
FW Foreign word
IN Preposition or subordinating conjunction
JJ Adjective
JJR Adjective, comparative
JJS Adjective, superlative
LS List item marker
MD Modal
NN Noun, singular or mass
NNS Noun, plural
NNP Proper noun, singular
NNPS Proper noun, plural
PDT Predeterminer
POS Possessive ending
PRP Personal pronoun
PRP$ Possessive pronoun
RB Adverb
RBR Adverb, comparative
RBS Adverb, superlative
RP Particle
SYM Symbol
UH Interjection
VB Verb, base form
VBD Verb, past tense
VBG Verb, gerund or present participle
VBN Verb, past participle
VBP Verb, non-3rd person singular present
VBZ Verb, 3rd person singular present
WDT Wh-determiner
WP Wh-pronoun
WP$ Possessive wh-pronoun
WRB Wh-adverb
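SOLUTION
========
%utl_submit_r64 is a user-written macro (not part of Base SAS) that passes the enclosed R code to an external 64-bit R session.
The R packages NLP, openNLP (which requires Java), openNLPmodels.en and haven must be installed.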
%utl_submit_r64(
library(stringr);
library(NLP);
library(openNLP);
library(openNLPmodels.en);
library(haven);
txt<-read_sas('d:/sd1/txt.sas7bdat');
txt;
s <- as.String(txt$TXT);
sent_token_annotator <- Maxent_Sent_Token_Annotator();
word_token_annotator <- Maxent_Word_Token_Annotator();
a2 <- annotate(s, list(sent_token_annotator, word_token_annotator));
pos_tag_annotator <- Maxent_POS_Tag_Annotator();
pos_tag_annotator;
a3 <- annotate(s, pos_tag_annotator, a2);
a3;
head(annotate(s, Maxent_POS_Tag_Annotator(probs = TRUE), a2));
a3w <- subset(a3, type == 'word');
tags <- sapply(a3w$features, `[[`, 'POS');
tags;
table(tags);
paste(s[a3w], tags, sep = '/');
);
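What the original question asks for, keeping the nouns and dropping attributes such as adjectives, comes down to filtering on the noun tags (NN, NNS, NNP, NNPS). In the R code above this should amount to something like `s[a3w][grepl('^NN', tags)]`. The same filter sketched in Python, on the second sentence's tags from the output above; `include_proper` is a hypothetical switch for whether proper nouns count:

```python
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}  # Penn Treebank noun tags
COMMON_NOUN_TAGS = {"NN", "NNS"}          # same, without proper nouns

def nouns(tagged, include_proper=True):
    """Keep the words whose tag is a noun tag; drop everything else."""
    keep = NOUN_TAGS if include_proper else COMMON_NOUN_TAGS
    result = []
    for token in tagged:
        word, _, tag = token.rpartition("/")  # split on the last '/'
        if tag in keep:
            result.append(word)
    return result

# Second sentence of the tagged output above.
sentence = ["Mr./NNP", "Vinken/NNP", "is/VBZ", "chairman/NN", "of/IN",
            "Elsevier/NNP", "N.V./NNP", ",/,", "the/DT", "Dutch/JJ",
            "publishing/NN", "group/NN", "./."]
print(nouns(sentence))
print(nouns(sentence, include_proper=False))
```

The adjective "Dutch" is dropped. Because the tagger is statistical, a word like "publishing" is NN here but could receive a different tag (for example VBG) in another sentence.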
As a programming tool, SAS can analyze any text, provided you write the logic yourself.
I don't know whether there is a ready-made SAS system that does what you want, and
even if there is, it would have to be programmed specifically for the language
you are interested in.
Have you ever used Google Translate? If so, you know that analyzing text
and translating it to another language (that is, transforming it from one language's grammar to another's)
is very complicated and not very accurate.
Thank you, I agree with what you said.
Are you working with Base SAS or EM with Text Analytics?
Base SAS.
You will have to supply the logic for determining whether a word is a noun; the same word may be a noun, a verb, or even a proper name.
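The point about ambiguity can be made concrete: a plain dictionary lookup can only return the parts of speech a word might have, not the one it has in a given sentence. In this Python sketch the lexicon is a tiny hypothetical one with a few possible tags per word; a real one would have to be supplied, as the reply says. Words such as "board" or "will" come back ambiguous, which is why taggers like openNLP's use the surrounding context:

```python
# Hypothetical mini-lexicon: word -> a few Penn Treebank tags it can take.
LEXICON = {
    "director": {"NN"},
    "join": {"VB"},
    "board": {"NN", "VB"},        # "the board" vs "to board a plane"
    "will": {"MD", "NN"},         # "will join" vs "a last will"
    "bill": {"NN", "VB", "NNP"},  # "a bill", "to bill", "Bill"
}

def noun_status(word):
    """Classify a word by lookup alone: noun, not a noun, ambiguous, or unknown."""
    tags = LEXICON.get(word.lower())
    if tags is None:
        return "unknown"
    noun_tags = {t for t in tags if t.startswith("NN")}
    if noun_tags == tags:
        return "noun"
    if not noun_tags:
        return "not a noun"
    return "ambiguous"

for w in ["director", "join", "board", "Will", "Bill", "xyzzy"]:
    print(w, "->", noun_status(w))
```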