tlnarayana26
Calcite | Level 5

Hi everyone,

I have a data set:

    word       count
    lakshmi    10
    ads        6
    market     5
    the        4
    to         2
    laks       2
    is         2
    what       2
    and        1
    help       1
    book       1

How can I eliminate the articles, prepositions and pronouns from the data set above, using Base SAS?

Please help me.

RobertWoodruff
SAS Employee

You want to scan this data set and remove the records where the
variable “word” is an article, preposition or pronoun?

ballardw
Super User

If the list isn't very long, this is a simple way to handle your example data:

/* Recreate the example data */
data have;
input word $ count;
datalines;
lakshmi 10
ads     6
market  5
the     4
to      2
laks    2
is      2
what    2
and     1
help    1
book    1
;
run;

/* Drop every row whose word is in the unwanted list */
data want;
   set have;
   if upcase(word) in ('THE','A','AN','I','HE','SHE','WE','IT','THEM','TO','AND') then delete;
run;

I use upcase because the data could contain both "The" and "the". Add words as desired. If the list gets really long, then creating a data set of unwanted words and using a Proc SQL approach might be better.
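A minimal sketch of that data set plus Proc SQL variant, assuming the have data set created above. The names stopwords and want2 are made up for illustration, and the words are simply the ones from the IN list above:

```sas
/* Hypothetical lookup table of unwanted words, one per line */
data stopwords;
   input word :$20.;
   datalines;
the
a
an
i
he
she
we
it
them
to
and
;
run;

/* Keep only the rows whose word is not in the lookup table. */
/* upcase() on both sides makes the comparison case-insensitive. */
proc sql;
   create table want2 as
   select *
   from have
   where upcase(word) not in (select upcase(word) from stopwords);
quit;
```

With this layout, adding another unwanted word only means adding a row to stopwords; the query itself does not change.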

Reeza
Super User

Look up Natural Language Processing (NLP) and see if you can find a list of words that would be considered articles, prepositions or pronouns. Then it's a simple SQL query to remove them.

Here's a list of 'stop words'

http://jmlr.org/papers/volume5/lewis04a/a11-smart-stop-list/english.stop

tlnarayana26
Calcite | Level 5

Thanks for all your great help.

Hi Reeza sir, the link above gives a list of words. How can I move on to the next step? Please help.

Reeza
Super User

Well, for starters it's not sir.

Read in the list from the link into a data set <- RemoveList.

Put your own words into a data set <- WordList.

Remove the listed words from WordList via a SQL step:

proc sql;
   create table WordList2 as
   select *
   from WordList
   where word NOT IN (select word from RemoveList)
   order by word;
quit;
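For the first step, reading the list into RemoveList, something like the sketch below might work. It assumes the file at the link is still reachable from your SAS session, that it holds one word per line, and that no word is longer than 40 characters. The fileref name stoplist is made up for illustration.

```sas
/* Point a fileref at the stop list (assumes the URL is reachable) */
filename stoplist url
   "http://jmlr.org/papers/volume5/lewis04a/a11-smart-stop-list/english.stop";

/* Read one word per line into RemoveList */
data RemoveList;
   infile stoplist truncover;
   input word :$40.;
   word = lowcase(word);   /* store everything in lower case */
   if word ne '';          /* skip blank lines */
run;
```

Since RemoveList ends up in lower case, comparing lowcase(word) rather than word in the SQL step would also catch entries such as "The" in WordList.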

