HI SAS friends,
Am using SAS ver 9.4.
Need your help and expertise setting up text searching and selection.
There are two sample datasets below.
The dataset COLORS, has the colors terms that need to be searched for.
In the dataset HAVE, there are two columns (apples , heroes). Need to identify those rows in HAVE where
one of the colors is mentioned at least once. Need to search the two columns, Apples and Heroes,
for any instance of one of the colors.
Some rows will be completely excluded. Some rows may have a color of interest mentioned more than once.
I'd guess this involves Do Loops, probably one per column of interest (Apples, Heros), but beyond am not sure what to do next.
Thanks for your suggestions, much appreciated.
R
data COLORS;
input color $10. ;
datalines ;
red
purple
green
blue
;
run;
data HAVE ;
input Apples $14. Heroes $20. ;
datalines ;
red delicious green hornet
sweet fuscia assain orange
gala red lavender lantern
green granny spidy red
sour grey assain purple
:
run ;
something likes this. this will find whether color is there in apples or heroes
proc sql;
select distinct a.* from
have a
inner join
colors b
on index(trim(apples),trim(color)) gt 0
or index(trim(heroes),trim(color)) gt 0;
something likes this. this will find whether color is there in apples or heroes
proc sql;
select distinct a.* from
have a
inner join
colors b
on index(trim(apples),trim(color)) gt 0
or index(trim(heroes),trim(color)) gt 0;
How about if you merge the datasets together so that each colour is attached to the fruit (a many to many merge).
You could then use the index function to identify the colours present in the fruit name as everything would be present in the same dataset. There are no doubt more complicated methods, but this would be worth a go first.
Hi,
Regarding merging the two datasets, via a many to many merge, was trying to avoid that bc the actual set of search terms has about ~12000 entries (rows), and there are actually 2 columns of different "colors" (strings) to search for, not one.
Also, in the dataset to be searched, there are about 27000 rows and also two columns of interest.
So there's much more data than is provided in the sample datasets.
That said, it its' probably possible, and I like the simplicity.
Thank you,
R
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.