Hi everyone,
I have a data set:
word count
lakshmi 10
ads 6
market 5
the 4
to 2
laks 2
is 2
what 2
and 1
help 1
book 1
How can I eliminate articles, prepositions, and pronouns from the above data set, so that the result no longer contains those words, using Base SAS?
Please help me.
You want to scan this dataset and remove records where the
variable "word" is an article, preposition, or pronoun?
If the list isn't very long, this is a simple way to do it for your example data:
data have;
input word $ count;
datalines;
lakshmi 10
ads 6
market 5
the 4
to 2
laks 2
is 2
what 2
and 1
help 1
book 1
;
run;
data want;
set have;
if upcase(word) in ('THE','A','AN','I','HE','SHE','WE','IT','THEM','TO','AND') then delete;
run;
I use upcase because the data could have both "The" and "the". Add words as desired. If the list gets very long, then creating a data set of unwanted words and using a Proc SQL approach might be better.
Look up Natural Language Processing (NLP) and see if you can find a list of words that are considered articles, prepositions, and pronouns. Then it's a simple SQL query to remove them.
Here's a list of 'stop words':
http://jmlr.org/papers/volume5/lewis04a/a11-smart-stop-list/english.stop
Thanks for all your great help.
Hi Reeza sir, the above link gives a list of words. How can I move on to the next step? Please help.
Well, for starters it's not sir.
Read in the list from the link as a data set named RemoveList.
Create your word list as a data set named WordList.
Remove the stop words from WordList with a SQL step:
proc sql;
create table WordList2 as
select *
from WordList
where word NOT IN (select word from RemoveList)
order by word;
quit;
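For the first two steps, here is a minimal sketch rather than a definitive implementation. It assumes the SAS session can reach the URL, that the stop list file holds one word per line, and that 32 characters is long enough for any word in it. The fileref name stoplist is made up for illustration, and WordList is built from the have data set posted earlier in the thread.

```sas
/* Assumption: the session has internet access. If not, download the  */
/* file and point an ordinary FILENAME statement at the local copy.   */
filename stoplist url "http://jmlr.org/papers/volume5/lewis04a/a11-smart-stop-list/english.stop";

/* Step 1: read the stop words, assuming one word per line */
data RemoveList;
   infile stoplist truncover;
   length word $ 32;              /* assumed maximum word length */
   input word $32.;
   word = lowcase(strip(word));   /* normalize case for matching */
   if word ne '';                 /* skip blank lines */
run;

/* Step 2: build the word list from the example data, lowercased so   */
/* that "The" and "the" both match the stop list                      */
data WordList;
   set have;
   word = lowcase(word);
run;
```

With both data sets in place, the SQL step above can run as written. If you would rather keep the original case in WordList, skip the lowcase assignment in step 2 and compare with lowcase(word) in the WHERE clause instead.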