I have this list of words and i need to check if those words exist in another variable, in the same dataset, which cointains a set of words separated by ','.
data dataset;
set dataset;
x = find(upcase(var1),upcase(var2));
run;
this works most of the time, but if i have to look for SUCESS its telling me that the word exist because it reads SUCCESSfully, so need te exact word.
So then i moved to FINDW
data dataset;
set dataset;
x = find(upcase(var1),upcase(var2));
run;
But for 2 words at the same time like: BEST BUY and the string is COCA COLA, BEST BUY it brings 0 meaning it couldnt find it.
Usually this is caused by including the trailing spaces in the text that you are searching for. SAS stores character strings as fixed length. So you need to remove the trailing spaces before passing the string to the FINDW() function. You can use the modifiers of the FINDW() command to make this easier. I like to use SPIT as the modifiers for this.
data test;
length str $200 word $50 ;
infile cards dsd dlm='|' truncover ;
input str word ;
found_any=index(upcase(str),upcase(strip(word)));
found_word=findw(str,word,,'spit');
put (_all_) (=) ;
cards;
Successfully|success
Coca Cola,Best Buy|BEST BUY
;
str=Successfully word=success found_any=1 found_word=0 str=Coca Cola,Best Buy word=BEST BUY found_any=11 found_word=11
data w;
k='SUCCESSfully';
if findw(k,'SUCCESS')>0 then put 'found';
else put'not found';
run;
Thats ok but it need one variable to be searched in the other one
key = find(upcase(var1),upcase(var2));
so the issue is that composed words like SUCESSFULLY will be count as 1 when searching for SUCESS
and using FINDW would not work all the time.
Can you please post a sample of few records ?
VAR1 VAR2
COCA COLA PEPSI, COCA COLA
LANIX LANIX, SONY
KEY = FIND(VAR1,VAR2);
VAR1 VAR2 KEY
COCA COLA PEPSI, COCA COLA 8
LANIX LANIX, SONY 1
SUCESS SUCESSFULLY 1
KEY = FINDW(VAR1,VAR2);
VAR1 VAR2 KEY
COCA COLA PEPSI, COCA COLA O
LANIX LANIX, SONY 1
SUCESS SUCESSFULLY 0
data have;
input VAR1 & $10. VAR2 & $20.;
cards;
COCA COLA PEPSI, COCA COLA 8
LANIX LANIX, SONY 1
SUCESS SUCESSFULLY 1
PEPSI PEPSI
;
data want;
set have;
do _n_=1 to countw(var2,',');
k=findw(var1,strip(scan(var2,_n_,',')));
if k>0 then leave;
end;
run;
Do you have control over the format of VAR2? The easiest way would be to add commas, and take out spaces around the commas. For example:
VAR2=",COCA COLA,BEST BUY,";
Second easiest would be if you could just eliminate the spaces around the commas:
VAR2="COCA COLA,BEST BUY";
In the top version, you could use:
found = index(var2, "," || strip(VAR1) || ",");
You might have to adjust the value of FOUND by 1, due to the presence of the extra commas.
For the bottom version, you would end up with a more complex expression:
found = index("," || strip(var2) || "," , "," || strip(VAR1) || "," ) ;
Again, you might have to adjust the result by 1.
I HOPE YOU FIND THIS MORE UNDERSTANDABLE
* CREATE MAIN TABLE *;
DATA TABLE1;
INFORMAT CAT $15. VALUE $500.;
INFILE DATALINES DLM = '09X';
INPUT CAT $ VALUE $;
DATALINES;
PAYMENT09XSUNOCO PAY, BEST BUY PAY, SUCESSFUL
;
RUN;
*UNIFY FORMATS*;
DATA TABLE1;
RETAIN SOURCE CAT VALUE;
SET TABLE1;
SOURCE ='EMAIL';
VALUE=COMPRESS(VALUE,'09X'); * ELIMINATE TABS *;
VALUE=TRANWRD(VALUE," ,",","); * REMOVE SPACE BETWEEN WORD AND COMMA *;
VALUE=TRANWRD(VALUE,", ",","); * REMOVE SPACE BETWEEN COMMA AND WORD *;
VALUE=TRANWRD(VALUE,",",", "); * PUT A SPACE BETWEEN COMMA AND WORD *;
USER=UPCASE("&SYSUSERID."); * RECORD USERNAME *;
TIMESTAMP = DATETIME();
FORMAT TIMESTAMP DATETIME16.;
RUN;
* TABLE 2 IS IMPORTED BUT FOR TIME REASONS I'LL JUST CREATE IT *;
DATA TABLE2;
INFORMAT WORD $15.;
INFILE DATALINES DSD;
DATALINES;
INPUT WORD $;
SUCESS
BEST BUY PAY
;
RUN;
* PREPARE TABLE 2 *;
PROC SORT DATA = TABLE2 NODUPKEY OUT=T2; BY WORD; RUN;
DATA T2; RETAIN CAT; SET T2; CAT = "PAYMENT"; RUN;
* CREATE TABLE 3 THATS WHERE THE ISSUE HAPPENS *;
DATA TABLE3;
LENGTH CAT $15 WORD $50. VALUE $500.;
IF _N_ = 1 THEN DO;
DECLARE HASH 3 (DATASET:'TABLE1');
E.DEFINEKEY('CAT');
E.DEFINEDATA('CAT','VALUE');
E.DEFINEDONE();
CALL MISSING(CAT,VALUE);
END;
SET T2;
DROP RC;
RC = E.FIND();
FINDW = FINDW(VALUE,WORD,'I');
FIND = FIND(TRIM(VALUE),TRIM(UPCASE(WORD)));
RUN;
Usually this is caused by including the trailing spaces in the text that you are searching for. SAS stores character strings as fixed length. So you need to remove the trailing spaces before passing the string to the FINDW() function. You can use the modifiers of the FINDW() command to make this easier. I like to use SPIT as the modifiers for this.
data test;
length str $200 word $50 ;
infile cards dsd dlm='|' truncover ;
input str word ;
found_any=index(upcase(str),upcase(strip(word)));
found_word=findw(str,word,,'spit');
put (_all_) (=) ;
cards;
Successfully|success
Coca Cola,Best Buy|BEST BUY
;
str=Successfully word=success found_any=1 found_word=0 str=Coca Cola,Best Buy word=BEST BUY found_any=11 found_word=11
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.