In the cars dataset from SASHELP, I want to find any MODEL value that contains "Cent" or "Quatt". I tried several search functions (FIND, FINDW, INDEX, INDEXC, INDEXW), and FIND works for finding one of the strings I am searching for. How can I search for multiple strings using one FIND function?
A practical example is finding articles by keyword: for example, all articles that have "gene", "genetic", "genetically", "allele", "genes", "inherited", or "history" in their title, where the title is a character variable in a SAS dataset.
Thank you for your help!
data cars;
set sashelp.cars;
run;
data cars_model;
set cars;
Model_up=upcase(Model);
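model_var_f = find(Model_up, "CENT");   /* position of the substring CENT, 0 if absent */
model_var_fw = findw(Model_up, "CENT"); /* CENT only as a whole word */
model_var_i = index(Model_up, "CENT");  /* substring search, same result as FIND here */
model_var_ic = indexc(Model_up, "CENT"); /* first C, E, N, or T, not the string CENT */
model_var_iw = indexw(Model_up, "CENT"); /* CENT only as a whole word */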
run;
proc freq data=cars_model; tables model_up; run;
proc freq data=cars_model; tables model_var_f model_var_fw model_var_i model_var_ic model_var_iw; run;
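To see why these five functions give different results, a small DATA _NULL_ step can apply each one to a single value. The value 'ACCENT GS' is made up for illustration. It shows that INDEXC looks for any one of the characters C, E, N, or T rather than the whole string, and that FINDW and INDEXW only match CENT as a complete word.

```sas
data _null_;
  /* Hypothetical sample value chosen to show the differences */
  model_up = 'ACCENT GS';
  f  = find(model_up, 'CENT');   /* substring search: CENT starts at position 3 */
  fw = findw(model_up, 'CENT');  /* whole-word search: no word equals CENT, so 0 */
  i  = index(model_up, 'CENT');  /* substring search, same as FIND: 3 */
  ic = indexc(model_up, 'CENT'); /* first C, E, N, or T anywhere: position 2 */
  iw = indexw(model_up, 'CENT'); /* whole-word search: 0 */
  put f= fw= i= ic= iw=;         /* writes f=3 fw=0 i=3 ic=2 iw=0 to the log */
run;
```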
@Emma_at_SAS wrote:
In the cars dataset from SASHELP, I want to find any MODEL observation that contains "Cent" or "Quatt".
How about this:
model_var_f = find (Model, "cent", 'i') or find(model,'quatt','i');
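For reference, a complete version of this approach might look like the following sketch. The 'i' modifier makes FIND ignore case, so UPCASE is not needed, and the OR of the two FIND calls evaluates to 1 when either term is present and 0 otherwise. The output dataset name cars_flag is an illustrative choice.

```sas
data cars_flag;
  set sashelp.cars;
  /* Each FIND returns a position (0 if not found); OR turns that into 1 or 0 */
  model_var_f = find(model, 'cent', 'i') or find(model, 'quatt', 'i');
run;

/* List only the matching rows */
proc print data=cars_flag;
  where model_var_f = 1;
  var make model model_var_f;
run;
```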
If you have many strings to match, please see the method explained at https://communities.sas.com/t5/SAS-Programming/Check-if-a-list-of-substrings-is-in-a-string/td-p/766...
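The linked thread is not reproduced here. As one possible approach for a longer list such as the article keywords in the question, the terms can be kept in a single space-separated string and checked one at a time with FIND in a DO loop. The dataset articles, its variable title, and the sample titles below are hypothetical.

```sas
/* Hypothetical sample data; replace with your own dataset */
data articles;
  infile datalines truncover;
  input title $100.;
  datalines;
Genetic markers of disease risk
A short history of the steam engine
Allele frequencies in island populations
Cooking with seasonal vegetables
;
run;

data articles_flag;
  set articles;
  length keywords $200 term matched $20;
  /* Space-separated search terms; edit this list to change the search */
  keywords = 'gene genetic genetically allele genes inherited history';
  found = 0;
  matched = ' ';
  /* Stop at the first term that appears anywhere in the title */
  do i = 1 to countw(keywords, ' ') until (found);
    term = scan(keywords, i, ' ');
    /* STRIP removes the trailing blanks SAS pads onto TERM */
    if find(title, strip(term), 'i') > 0 then do;
      found = 1;
      matched = term;
    end;
  end;
  drop i keywords term;
run;
```

Because FIND matches substrings, "gene" alone already covers "genes", "genetic", and "genetically", but it also matches unrelated words such as "general". If whole-word matching is needed, FINDW is the corresponding function.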
data cars;
set sashelp.cars;
if prxmatch('/Cent|Quatt/i',model);
run;
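The alternation in this pattern also scales to a longer list. One way, sketched here as an assumption rather than part of the original answer, is to keep the terms in a macro variable and resolve it inside a double-quoted pattern; single quotes would prevent the macro variable from resolving. The macro variable name terms and the dataset name cars_prx are hypothetical.

```sas
/* Pipe-separated list of search terms; extend it as needed */
%let terms = cent|quatt;

data cars_prx;
  set sashelp.cars;
  /* Double quotes let &terms resolve; the i modifier ignores case */
  if prxmatch("/(&terms)/i", model) > 0;
run;
```

Any regular-expression metacharacters in the terms (for example a period or parenthesis) would need to be escaped with a backslash.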
Thank you @PaigeMiller and @Ksharp. Both of your methods work. I have a follow-up question: how do I handle spaces before the strings? I know the STRIP function, but I do not know how to use it inside FIND or PRXMATCH. I appreciate your suggestions. Thanks
With the FIND function, you don't have to worry about leading blanks. FIND searches the whole value, so if "Cent" or "Quatt" appears anywhere in the text string, leading blanks don't interfere.
So, with the FIND expression I showed, model_var_f has the value 1 for models whose MODEL value contains "Cent" or "Quatt" (ignoring case), and 0 otherwise.
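A small check shows where leading blanks matter and how STRIP fits inside FIND or PRXMATCH. The value below is hypothetical. Leading blanks do not stop a substring match, but they do shift the returned position, and they break patterns anchored to the start of the value with ^, which is where wrapping the variable in STRIP helps.

```sas
data _null_;
  /* Hypothetical value with three leading blanks */
  model = '   Century 4dr';
  pos_raw      = find(model, 'cent', 'i');          /* 4: still found, position shifted */
  pos_strip    = find(strip(model), 'cent', 'i');   /* 1: blanks removed first */
  starts_raw   = prxmatch('/^cent/i', model);        /* 0: ^ sees the leading blanks */
  starts_strip = prxmatch('/^cent/i', strip(model)); /* 1: value now starts with Cent */
  put pos_raw= pos_strip= starts_raw= starts_strip=;
run;
```

An alternative to STRIP for anchored patterns is to allow optional whitespace in the pattern itself, for example '/^\s*cent/i'.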
A potential problem is that this also finds the Hyundai Accent models, which you may not want. That is easy to fix if you want to exclude them.
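Two possible ways to exclude the Accent models are sketched below; neither is taken from the original reply. The first keeps FIND and explicitly rules out "accent". The second uses the regular-expression word boundary \b so that "cent" only matches at the start of a word, which still catches words such as "Century" and "quattro". The dataset and flag names are illustrative.

```sas
data cars_no_accent;
  set sashelp.cars;
  /* Option 1: FIND, excluding values that contain "accent" */
  flag_find = (find(model, 'cent', 'i') and not find(model, 'accent', 'i'))
              or find(model, 'quatt', 'i');
  /* Option 2: \b requires a word boundary before the term */
  flag_prx = prxmatch('/\b(cent|quatt)/i', model) > 0;
run;
```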
Thank you @PaigeMiller. Your response was very helpful.
@Emma_at_SAS Below is a coding option where you only need to change the informat definitions to change the set of terms to look for.
data cars;
set sashelp.cars;
run;
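proc format;
/* 1 if the value contains a word starting with cent or quatt (any case), else 0; \b excludes Accent */
invalue myterms_find
'/\b(cent|quatt)/i' (regexp) = 1
other=0
;
/* Returns the whole word that starts with the matched term, e.g. Century */
invalue $myterms_get
's/^.*?(\b(cent|quatt).*?\b).*$/$1/i' (regexpe) = _same_
other= ' '
;
/* Returns only the matched term itself, in its original case, e.g. Cent */
invalue $mycategory_get
's/^.*?\b(cent|quatt).*$/$1/i' (regexpe) = _same_
other= ' '
;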
run;
data want;
length category $10 term $20;
set cars;
if input(strip(model),myterms_find.)=1;
/* first term selected if multiple matching terms */
category=input(strip(model),$mycategory_get.);
term =input(strip(model),$myterms_get.);
run;