edouglasa
Fluorite | Level 6

Hi,

I have never used any of the SAS data mining tools. I have a simple problem: I have 500 legal texts, and I want to check each one for ten specific words and produce output showing which texts contain which words. Can SAS do this? Can anyone point me to the right platform or code?

Thanks!

 

Satish_Parida
Lapis Lazuli | Level 10
/*Convert the files to long format string*/

data have;
infile cards dlm=',';
input text:$2000.;
cards;
we the people of UK
democracy is by the people
I love egg
;
run;

data want;
set have;
word_flag1=findw(text, 'UK',' ');
word_flag2=findw(text, 'people',' ');
/*.................
word_flag3=findw(text, '****',' ');
word_flag4=findw(text, '****',' ');
..................*/
run;
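
With ten words, writing one FINDW line per word gets repetitive. A minimal sketch of the same idea using arrays follows, assuming the `have` data set and `text` variable from the step above. The three search terms and the `terms` array name are placeholders, not part of the original post, and the `'i'` modifier (ignore case) is an added assumption about what is wanted.

```sas
/* Sketch only: flag each search term in each text using arrays. */
data want;
    set have;

    /* Hypothetical search terms; replace with the ten real words
       and adjust both array sizes to match. */
    array terms[3] $ 20 _temporary_ ('UK' 'people' 'egg');

    /* Creates numeric variables word_flag1-word_flag3. */
    array word_flag[3];

    do i = 1 to dim(terms);
        /* FINDW returns the position of the word in text, or 0 if it is absent.
           STRIP removes the padding blanks from the array element;
           the 'i' modifier makes the search case-insensitive. */
        word_flag[i] = findw(text, strip(terms[i]), ' ', 'i');
    end;

    drop i;
run;
```

A value greater than 0 in `word_flag1`-`word_flag3` means the corresponding term was found as a whole word.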
edouglasa
Fluorite | Level 6

Elegant - thanks!

edouglasa
Fluorite | Level 6

Just one question: could I read the files from a directory rather than pasting them in?

andreas_lds
Jade | Level 19

@edouglasa wrote:

Just one question: could I read the files from a directory rather than pasting them in?


To be sure that I understood your question: do you want to read the words you search for from a file?

 

edouglasa
Fluorite | Level 6

I'd like to search for terms across Word documents or txt files in a directory. I'd like output that indicates which of the search terms appear in which Word or txt file.

 

Thanks so much!

andreas_lds
Jade | Level 19

@edouglasa wrote:

I'd like to search for terms across Word documents or txt files in a directory. I'd like output that indicates which of the search terms appear in which Word or txt file.

 

Thanks so much!


I don't think that reading Word documents is possible with SAS, at least not without wasting an incredible amount of time.

Can you post an example of the expected result dataset?

 

 

edouglasa
Fluorite | Level 6

Got it. Maybe I need to find a more robust data mining software package.

 

I was looking for an output like:

 

Text_name, word1, word2, word3

text1, yes, yes, no

text2, yes, no, no

text3, yes, no, no

 

So all three texts would have word1, only text1 would have word2, and none of the texts would have word3. I was hoping for a way to access the text files from a folder on my machine.

 

 

andreas_lds
Jade | Level 19

Accessing all text files in a folder is easy:

 

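data work.Matches;
    /* _filename must be given a length here; otherwise the
       FILENAME= variable defaults to 8 characters and the path is truncated. */
    length 
        _filename $ 200
        filename $ 200 
        text_name $ 1000
        word1-word3 $ 3
    ;

    /* The wildcard reads every .txt file in the folder, one line at a time. */
    infile "PATH/*.txt" filename=_filename;
    input;

    /* text_name holds the current line of the current file. */
    text_name = _infile_;
    word1 = ifc(findw(text_name, 'WORD_A'), 'yes', 'no');
    word2 = ifc(findw(text_name, 'WORD_BEE'), 'yes', 'no');
    word3 = ifc(findw(text_name, 'WORD_C'), 'yes', 'no');

    /* Keep only lines in which at least one word was found. */
    if index(cats(of word:), 'yes') then do;
        /* _filename is not written to the dataset, so copy it. */
        filename = _filename;
        output;
    end;
run;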

This code is untested.  It should create a dataset containing all lines in which at least one word was found.
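
The step above writes one row per matching line, whereas the requested layout has one row per text. A possible way to collapse to one row per file is sketched below; it is untested as well. The dataset names `work.line_flags` and `work.file_summary` and the `has_word` variables are made up for illustration, `PATH` and the three words are the same placeholders as above, and the `'ip'` modifiers (ignore case, treat punctuation as delimiters) are an added assumption.

```sas
/* Step 1: one row per line read, with 0/1 flags for each search term. */
data work.line_flags;
    length _filename filename $ 200;

    infile "PATH/*.txt" filename=_filename truncover lrecl=32767;
    input;

    /* Copy the automatic FILENAME= variable so it is kept in the dataset. */
    filename = _filename;

    /* FINDW returns a position (> 0) when the word is present, so the
       comparison yields 1 for found and 0 for not found. */
    has_word1 = (findw(_infile_, 'WORD_A',   ' ', 'ip') > 0);
    has_word2 = (findw(_infile_, 'WORD_BEE', ' ', 'ip') > 0);
    has_word3 = (findw(_infile_, 'WORD_C',   ' ', 'ip') > 0);
run;

/* Step 2: collapse to one row per file. A word counts as present in a
   file if it was found on at least one of its lines. */
proc sql;
    create table work.file_summary as
    select filename as text_name,
           case when max(has_word1) = 1 then 'yes' else 'no' end as word1,
           case when max(has_word2) = 1 then 'yes' else 'no' end as word2,
           case when max(has_word3) = 1 then 'yes' else 'no' end as word3
    from work.line_flags
    group by filename;
quit;
```

Files that contain no lines at all would not appear in `work.file_summary`, and `text_name` holds the full path of each file rather than a short label.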

edouglasa
Fluorite | Level 6

This is amazing. Thanks a ton!

 
