DATA Step, Macro, Functions and more

How to find strings from a table within strings from another table?

Accepted Solution Solved
Reply
New Contributor
Posts: 4
Accepted Solution

How to find strings from a table within strings from another table?

[ Edited ]

Hi everyone,
here is my issue:

I have two available tables:
-a table list_words with one string variable named words) and 10 individuals (or p individuals in general, with p supposed as known)
-a table raw_data with one string variable (maned text) and 20 individuals (or n individuals in general, with n supposed as known)

I would like to create a new variable main_word within the raw_data table.
For each individual i of the raw_data table, if one of the words (i.e. a value of the variable words of the raw_data table) can be found within the text,
then the i-th element of raw_data takes the found word as a value for main_word.
If no word is found within the text for the i-th individual, then main_word receives the value "" for that individual.
is it possible to do such a thing without having to write a DATA step on the raw_data table for each of the words?
(with the word manually entered as a parameter for an indexw function)?


What I tried to do (and which didn't work):
1: Use an array and put each word in a macro-variable.

DATA _NULL_ ;
 SET list_words;
 CALL SYMPUT(COMPRESS("wordsmacro"||_N_),words) ;
RUN ;

%LET array_word = array array_word $30 wordsmacro&1-wordsmacro&10 ;

DATA raw_data ;
 SET raw_data ;
 LENGTH main_word $ 30 ;
 DO i = 1 to 10 ;
  IF INDEXW(text, array_word[i]) > 0 
  THEN DO ;
   main_word = array_word[i] ;
   LEAVE ;
       END ;
 END ;
RUN ;

 

 

The issue here is that array_word apparently isn't defined inside the DATA step.
The log extract with the mistakes is the following:
27           IF INDEXW(text, array_word[i]) > 0
ERROR: Undeclared array referenced: array_word.
ERROR: Variable array_word has not been declared as an array.
28           THEN DO ;
29            main_word = array_word[i] ;
ERROR: Undeclared array referenced: array_word.
ERROR: Variable array_word has not been declared as an array.
30            LEAVE ;

 

2: Put each word in a macro-variable and do a loop on the table:

DATA _NULL_ ;
 SET list_words ;
 CALL SYMPUT(COMPRESS("wordsmacro"||_N_),words) ;
RUN ;

%MACRO parcours ;
DATA raw_data ;
 SET raw_data ;
 LENGTH main_word $ 30 ;
 %DO i = 1 %TO 10 :
  IF INDEXW(text, &&wordsmacro&i) > 0
  THEN DO ;
   main_word = &&wordsmacro&i ;
   LEAVE ;
       END ;
 %END ;
RUN ;
%MEND ;
%parcours ;

 

 

 

The log extract with the mistakes is the following:
23         %parcours ;
WARNING: Apparent symbolic reference I not resolved.
WARNING: Apparent symbolic reference MOTSMACRO not resolved.
WARNING: Apparent symbolic reference I not resolved.
ERROR: A character operand was found in the %EVAL function or %IF condition where a numeric operand is required. The condition was:
       10 :   IF INDEXW(text, &&wordsmacro&i) > 0   THEN DO
ERROR: The %TO value of the %DO I loop is invalid.
ERROR: The macro PARCOURS will stop executing.
24        
25         GOPTIONS NOACCESSIBLE;
26         %LET _CLIENTTASKLABEL=;
27         %LET _CLIENTPROJECTPATH=;
28         %LET _CLIENTPROJECTNAME=;
29         %LET _SASPROGRAMFILE=;
30        
31         ;*';*";*/;quit;run;
                     ____
                     180

ERROR 180-322: Statement is not valid or it is used out of proper order.


3: I don't know how to code it here, but maybe a hash object could work too?


Some possible tables:

DATA list_words ;
 INPUT mots $ 1-40 ;
 DATALINES ;
  Lorem
  ipsum
  dolor
  sit
  amet
  consectetur
  adipiscing
  elit
  sed
  do
RUN ;

DATA raw_data ;
 INPUT texte $ 1-600 ;
 DATALINES ;
  Lorem ipsum dolor sit amet
  consectetur adipiscing elit, sed do
  eiusmod tempor incididunt ut labore
  et dolore magna aliqua.
  Lorem ipsum dolor sit amet
  consectetur adipiscing elit, sed do
  eiusmod tempor incididunt ut labore
  et dolore magna aliqua.
  Lorem ipsum dolor sit amet
  consectetur adipiscing elit, sed do
  eiusmod tempor incididunt ut labore
  et dolore magna aliqua.
  Lorem ipsum dolor sit amet
  consectetur adipiscing elit, sed do
  eiusmod tempor incididunt ut labore
  et dolore magna aliqua.
  Lorem ipsum dolor sit amet
  consectetur adipiscing elit, sed do
  eiusmod tempor incididunt ut labore
  et dolore magna aliqua.
RUN ;

 


Accepted Solutions
Solution
‎06-19-2017 03:44 AM
Super User
Posts: 11,343

Re: How to find strings from a table within strings from another table?

[ Edited ]
Posted in reply to saslamber

First off: Array_word created with the %let statement creates a macro variable that would be referenced when used with &array_word.

The Array_word in the data step is treated as an array because you use the [i] construct and you did not define an array named Array_word. You cannot assign an array multiple times which your attempt would do.

%LET array_word = array array_word $30 wordsmacro&1-wordsmacro&10 ;

DATA raw_data ;
 SET raw_data ;
 LENGTH main_word $ 30 ;
 DO i = 1 to 10 ;
  IF INDEXW(text, array_word[i]) > 0 

 

Before going to the data step examine the result of

%put &array_word.

You will not be using the text of the values of the variable "words" in data set test. You also are using macro variables &1 and &10. Which are not valid macro variable names. Macro variable names have the same naming requirement as data set variables.

 

Did you get this to work in any form before attempting to use a macro?

Here's one approach using macros and arrays if your idea is to find the first word in each line of text.

data test;
   input words $;
datalines;
first
second 
third
;
run;
data raw_data;
   infile datalines missover;
   input text $ 1-50 ;
datalines;
The sentence contains the word third
first is on the second line
;
run;
proc sql noprint;
   select quote(strip(words)) into : wordlist separated by ','
   from test ;
quit;
%put &wordlist;
%let wordcount=&sqlobs;

DATA new_data ;
 SET raw_data ;
 LENGTH main_word $ 30 ;
 array array_word (&wordcount) $ 30 _temporary_(&wordlist);
 DO i = 1 to 10 ;
  IF INDEXW(text,array_word[i] ) > 0 
  THEN DO ;
   main_word = array_word[i] ;
   LEAVE ;
       END ;
 END ;
RUN ;

View solution in original post


All Replies
Solution
‎06-19-2017 03:44 AM
Super User
Posts: 11,343

Re: How to find strings from a table within strings from another table?

[ Edited ]
Posted in reply to saslamber

First off: Array_word created with the %let statement creates a macro variable that would be referenced when used with &array_word.

The Array_word in the data step is treated as an array because you use the [i] construct and you did not define an array named Array_word. You cannot assign an array multiple times which your attempt would do.

%LET array_word = array array_word $30 wordsmacro&1-wordsmacro&10 ;

DATA raw_data ;
 SET raw_data ;
 LENGTH main_word $ 30 ;
 DO i = 1 to 10 ;
  IF INDEXW(text, array_word[i]) > 0 

 

Before going to the data step examine the result of

%put &array_word.

You will not be using the text of the values of the variable "words" in data set test. You also are using macro variables &1 and &10. Which are not valid macro variable names. Macro variable names have the same naming requirement as data set variables.

 

Did you get this to work in any form before attempting to use a macro?

Here's one approach using macros and arrays if your idea is to find the first word in each line of text.

data test;
   input words $;
datalines;
first
second 
third
;
run;
data raw_data;
   infile datalines missover;
   input text $ 1-50 ;
datalines;
The sentence contains the word third
first is on the second line
;
run;
proc sql noprint;
   select quote(strip(words)) into : wordlist separated by ','
   from test ;
quit;
%put &wordlist;
%let wordcount=&sqlobs;

DATA new_data ;
 SET raw_data ;
 LENGTH main_word $ 30 ;
 array array_word (&wordcount) $ 30 _temporary_(&wordlist);
 DO i = 1 to 10 ;
  IF INDEXW(text,array_word[i] ) > 0 
  THEN DO ;
   main_word = array_word[i] ;
   LEAVE ;
       END ;
 END ;
RUN ;
PROC Star
Posts: 7,474

Re: How to find strings from a table within strings from another table?

Posted in reply to saslamber

Not sure what you are trying to do, but I presume that you want your word search to be case insensitive. If so, the following example should provide one way of approaching the task:

data list_words;
  informat words $11.;
  input words;
  cards;
Lorem
ipsum
dolor
sit
amet
consectetur
adipiscing
elit
sed
do
;
data raw_data;
  informat text $50.;
  input text &;
  cards;
Lorem ipsum dolor sit amet
consectetur adipiscing elit, sed do
eiusmod tempor incididunt ut labore
et dolore magna aliqua.
Lorem ipsum dolor sit amet
consectetur adipiscing elit, sed do
eiusmod tempor incididunt ut labore
et dolore magna aliqua.
Lorem ipsum dolor sit amet
consectetur adipiscing elit, sed do
eiusmod tempor incididunt ut labore
et dolore magna aliqua.
Lorem ipsum dolor sit amet
consectetur adipiscing elit, sed do
eiusmod tempor incididunt ut labore
et dolore magna aliqua.
Lorem ipsum dolor sit amet
consectetur adipiscing elit, sed do
eiusmod tempor incididunt ut labore
et dolore magna aliqua.
;

DATA want (keep=text found);
  array array_word(999) $ _temporary_;
  length found $5;
  do until(eof1);
    set list_words end=eof1;
    n+1;
    array_word[n]=words;
  end;
  do until(eof2);
    SET raw_data end=eof2;
    call missing(found);
    DO i=1 to n;
      test=array_word[i];
      if findw(text,strip(array_word[i]),' ','i') then do;
        found='Yes';
        leave;
      end;
    end;
    IF missing(found) then found='no';
    output;
  end;
RUN ;

Art, CEO, AnalystFinder.com

New Contributor
Posts: 4

Re: How to find strings from a table within strings from another table?

Thanks to both, both of your programs worked and solved my issue, so the help was much appreciated.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 3 replies
  • 149 views
  • 0 likes
  • 3 in conversation