Building models with SAS Enterprise Miner, SAS Factory Miner, SAS Visual Data Mining and Machine Learning or just with programming

How to eliminate articles, prepositions and pronouns from data

Contributor
Posts: 40

Hi everyone,

I have a data set:

   word       count
   lakshmi    10
   ads        6
   market     5
   the        4
   to         2
   laks       2
   is         2
   what       2
   and        1
   help       1
   book       1

How can I eliminate articles, prepositions and pronouns from the data set above, so that the rows for those words no longer appear, using Base SAS?

Please help me.

SAS Employee
Posts: 10

So you want to scan this data set and remove the records where the
variable “word” is an article, preposition or pronoun?

Super User
Posts: 10,511

If the list isn't very long, this is a simple way to handle your example data:

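/* Example data from the question */
data have;
   /* :$20. lets word hold more than the default 8 characters */
   input word :$20. count;
   datalines;
lakshmi 10
ads 6
market 5
the 4
to 2
laks 2
is 2
what 2
and 1
help 1
book 1
;
run;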

/* Drop every row whose word is in the unwanted list (case-insensitive) */
data want;
   set have;
   if upcase(word) in ('THE','A','AN','I','HE','SHE','WE','IT','THEM','TO','AND') then delete;
run;

I use upcase because the data could contain both "The" and "the". Add words to the list as desired. If the list gets very long, then creating a data set of unwanted words and using a Proc SQL approach might be better.
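A minimal sketch of that data-set-plus-Proc-SQL idea, assuming the `have` data set from above. The lookup table name `unwanted` is made up for illustration, and its contents are just the same words as in the IN list:

```sas
/* Hypothetical lookup table of unwanted words; add rows as needed */
data unwanted;
   input word :$20.;
   datalines;
the
a
an
i
he
she
we
it
them
to
and
;
run;

/* Keep only rows whose word is not in the lookup table.
   upcase() on both sides makes the comparison case-insensitive. */
proc sql;
   create table want as
   select h.*
   from have as h
   where upcase(h.word) not in (select upcase(word) from unwanted);
quit;
```

With this layout, extending the list means adding rows to `unwanted` rather than editing the IF statement.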

Super User
Posts: 17,842

Look up Natural Language Processing (NLP) and see if you can find a list of words that would be considered articles, prepositions and pronouns. Then it's a simple SQL query to remove them.

Here's a list of 'stop words':

http://jmlr.org/papers/volume5/lewis04a/a11-smart-stop-list/english.stop

Contributor
Posts: 40

Thanks for all your great help.

Hi Reeza sir, the link above gives a list of words. How can I move on to the next step? Please help.

Super User
Posts: 17,842

Well, for starters, it's not "sir".

Read in the list from the link into a data set; call it RemoveList.

Put your own words in a data set; call it WordList.

Remove the RemoveList words from WordList via a SQL step:

proc sql;
   create table WordList2 as
   select *
   from WordList
   where word NOT IN (select word from RemoveList)
   order by word;
quit;
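For the first step, reading the list from the link, something along these lines might work. It is only a sketch, and it assumes the SAS session has internet access, the URL access method is permitted, and the file is plain text with one word per line. The fileref `stoplist` and the 40-character length are arbitrary choices:

```sas
/* Assumption: the session can reach this URL directly */
filename stoplist url 'http://jmlr.org/papers/volume5/lewis04a/a11-smart-stop-list/english.stop';

data RemoveList;
   infile stoplist truncover;
   length word $ 40;
   input word $40.;               /* read one word per line */
   word = lowcase(strip(word));   /* store the list in lowercase */
   if word ne '';                 /* skip blank lines */
run;

filename stoplist clear;
```

If the URL cannot be reached from SAS, saving the file locally and pointing the INFILE statement at that path should work the same way. Since RemoveList is stored in lowercase here, comparing `lowcase(word)` in the WHERE clause of the SQL step would also catch entries such as "The".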
