Hi SAS Friends,
Need some help with a Proc SQL merge, believe this is a "many to many" merge.
Attached is the sample code and desired output, have also pasted it below:
data COLORS;
input color $10. ;
datalines ;
red
purple
green
blue
;
run;
data Storys ;
input ID $2. Fruit_Storys $45. ;
datalines ;
1 red delicious apples are the prode of NY
2 the sweet fuscia parsimmons are too ripe
3 yellow and blue gala apples are superb
4 granny apples are green and lean
5 sour grey raisins are never chosen
:
run ;
/*...DESIRED OUTPUT...*/
ID Fruit_Storys color
1 red delicious apples are the prode of NY red
3 yellow and blue gala apples are superb blue
4 granny apples are green and lean green
What needs to happen is that in the STORYS data set, for the variable "Fruit_Storys" , the listed substrings from the COLORS data set variable :color" are searched for. If not present the row is ignored. If one of the color variables is present the row is retained, and also in the output the identified "colors" variable is listed in the column "color" . Also the ID variable from the "Fruit_Storys" data set is carried along.
I suppose this is a walk in the park for an experienced PROC SQL maven, and would greatly appreciate an answer with functional code.
Thanks as always, much appreciated
Robert
What spoils a walk in the park is an incomplete problem specification . In the case where more than one color appears in the string (ID=6), I cloose to report the last occurence:
data COLORS;
input color $10. ;
datalines;
red
purple
green
blue
;
data Storys ;
input ID $2. Fruit_Storys $64.;
datalines ;
1 red delicious apples are the prode of NY
2 the sweet fuscia parsimmons are too ripe
3 yellow and blue gala apples are superb
4 granny apples are green and lean
5 sour grey raisins are never chosen
6 sour green or blue raisins are sometimes chosen
;
proc sql;
create table matches as
select
ID,
fruit_storys,
color,
findw(fruit_storys, trim(color)) as pos
from
storys, colors
where calculated pos > 0
group by ID, fruit_storys
having calculated pos = max(calculated pos);
quit;
What spoils a walk in the park is an incomplete problem specification . In the case where more than one color appears in the string (ID=6), I cloose to report the last occurence:
data COLORS;
input color $10. ;
datalines;
red
purple
green
blue
;
data Storys ;
input ID $2. Fruit_Storys $64.;
datalines ;
1 red delicious apples are the prode of NY
2 the sweet fuscia parsimmons are too ripe
3 yellow and blue gala apples are superb
4 granny apples are green and lean
5 sour grey raisins are never chosen
6 sour green or blue raisins are sometimes chosen
;
proc sql;
create table matches as
select
ID,
fruit_storys,
color,
findw(fruit_storys, trim(color)) as pos
from
storys, colors
where calculated pos > 0
group by ID, fruit_storys
having calculated pos = max(calculated pos);
quit;
PGS,
Thanks very much, specifications are made to be broken, especially my poorly written ones (!),
thanks for providing an understandable solution.
R
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.