I have some string data and I want to extract the nouns from those strings.
data text;
    infile datalines delimiter=",";
    length Id $5 String $100;
    input Id String $;
datalines;
10, Ahmad is a student in fifth grade
11, His mother is a nurse in a hospital
12, His aunt is a kindergarten teacher
;
run;
I don't know of any function that can do this. It would be interesting to maintain such a function, which, of course, would need to support multiple languages.
Maybe such a function exists in a specialized SAS component.
Maybe you can find a list of nouns on the web.
If you have such a list, the program could have the following steps:
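/* Build a CNTLIN dataset that defines the numeric informat IsNoun */
data nouns_fmt;
    set list(rename=(noun = start)) end=jobDone;
    retain FmtName 'IsNoun' Type 'I' Hlo 'U' Label 1; /* 'U' = UPCASE option */
    Start = upcase(start);
    output;
    /* After the last noun, add an OTHER row so unknown words map to 0 */
    if jobDone then do;
        Hlo = 'O';
        Label = 0;
        output;
    end;
run;

proc format cntlin=nouns_fmt;
run;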
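Once the informat exists, it could be applied word by word. Here is a minimal sketch, assuming the text dataset from the question and the IsNoun informat built above. The output dataset nouns_found and the variables word, nouns, and i are hypothetical names, and the delimiter list is only a guess at the punctuation that may occur. Note that an informat reads only as many characters as its width, so very long words may need an explicit width.

```sas
/* Sketch: collect every word of each string that IsNoun flags as a noun. */
data nouns_found;
    set text;
    length word $100 nouns $100;
    /* Split on blanks and common punctuation (assumed delimiter list) */
    do i = 1 to countw(String, ' .,;:!?');
        word = scan(String, i, ' .,;:!?');
        /* IsNoun returns 1 for listed words and 0 for everything else;
           its UPCASE option makes the lookup case-insensitive */
        if input(word, IsNoun.) = 1 then nouns = catx(' ', nouns, word);
    end;
    drop i word;
run;
```

This only finds words that are in the list, so a word such as "grade" would be flagged whether it is used as a noun or as a verb.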
Ahmad is a proper noun. Is there a RULE that proper nouns are not to be included?
What is the RULE that Hospital is not considered as a noun in the second observation?
What would the result be for a sentence like:
I grade papers.
Natural language processing is a pain. English may be the worst language to attempt this with because so many "nouns" are also adjectives, adverbs or verbs.
There are also problems with separating proper nouns from common nouns, given the fad for "yewneek" spellings of children's names, where more and more names are place names, drinks, occupations and other non-traditional words. And remember, of course, that many current proper names are nouns or job descriptions in older languages or older versions of English.
There is no rule regarding nouns and proper nouns. I just need to get noun information for analytics. It could be both nouns and proper nouns. I extracted the nouns from an online website.
So you "just" need to look for words you already have in a dataset?
I just need to look for words that a user enters in a string, and I need to extract the nouns from it.
Quote from you:
I extracted the nouns from an online website
Now, what is it? Manually entered, or retrieved from a website?
Please describe in detail(!!!) your whole process.
I get that manually from a website.
I wrote a Python script that returns the nouns using the nltk library and passed the string to it as a parameter. I executed that script using the pipe command and saved the output in a SAS dataset.
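For anyone who wants to try the same route, a rough sketch of the SAS side might look like this. It is not the actual code used here. The script path and name (extract_nouns.py), the fileref pynouns, and the dataset noun_output are all hypothetical, and it assumes the script prints one noun per line and that the SAS session is allowed to run operating system commands (XCMD enabled).

```sas
/* Hypothetical script path; the string is passed as a command-line argument. */
filename pynouns pipe
    'python /path/to/extract_nouns.py "His mother is a nurse in a hospital"';

/* Read whatever the script writes to standard output, one noun per line (assumed). */
data noun_output;
    infile pynouns truncover;
    length noun $100;
    input noun $100.;
run;

filename pynouns clear;
```

To process a whole dataset, the command string would have to be built per observation, for example with a macro loop or the FILEVAR= option of INFILE.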