@hashman Hi, I discovered 2 more things that aren't working as I expected: When the VARIABLES table contains mulitple-word values, but the partial values are not in the value list (e.g. "South America" is there, but "South" by itself is not) they are also searched for in the STRINGS and if they are found, they are output in the new column (see the column regions in obs #2 and #8). I'd like to stop this from happening adn only search for the whole values if they have multiple-words. In VARIABLES table it happens that the same value is in multiple columns (e.g. beans, carrot... - I added a new column with the general type of vegetable). When this occurs, the found value is output only in one variable column it was found in, it doesn't repeat in others (see columns vegetable and vegetablegeneral in obs #1,2,8,9). I'd like for these values to repeat in all the output column were they should be base don the original VARIABLES columns. My output: Desired output: Please, do you have any tip on how to modify the code to achieve this? The code so far with new data: data strings ;
input string $30.;
cards ;
Ford carrot
Honda beans South Africa
south america black beans
green beans Audi
x5 chili pepper
garlic north America
central europe bmw
lettuce north onion X5
America pepper
X5 BMW Europe
Central East Europe
bmw X5 america
;
run ;
data variables;
infile datalines delimiter=',' dsd;
length cars $6 vegetablegeneral $7 vegetable $12 regions $14;
input cars $ vegetablegeneral $ vegetable $ regions $;
datalines;
Audi,beans,beans,America
BMW,carrot,black beans,Central Europe
BMW X5,lettuce,carrot,East Europe
X5,onion,chili pepper,Europe
Honda,pepper,green beans,North America
,,lettuce,South America
,,onion,
,,pepper,
;
run;
proc sql noprint ;
select max (length) into :maxvlen from dictionary.columns
where libname="WORK" and memname="VARIABLES"
;
quit ;
proc sql noprint;
select name into :varlist separated by ' ' from dictionary.columns
where libname="WORK" and memname="VARIABLES";
quit;
data want (drop = _:) ;
set strings ;
if _n_ = 1 then do ;
length _s _z $ &maxvlen ;
dcl hash h () ;
h.definekey ("_s") ;
h.definedata ("_z", "_i_") ;
h.definedone () ;
do until (z) ;
set variables end = z ;
array v &varlist ;
do over v ;
if cmiss (v) then continue ;
do _i = 1 to countw (v) ;
_z = scan (v, _i) ;
h.ref (key: upcase (_z), data: _z, data: _i_) ;
do _j = _i + 1 to countw (v) ;
_z = catx (" ", _z, scan (v, _j)) ;
h.ref (key: upcase (_z), data: _z, data: _i_) ;
end ;
end ;
end ;
end ;
end ;
call missing (of v[*]) ;
do _i = 1 to countw (string) ;
_s = scan (string, _i) ;
if h.find (key: upcase (_s)) = 0 then if length (_z) > length (v) then v = _z ;
do _j = _i + 1 to countw (string) ;
_s = catx (" ", _s, scan (string, _j)) ;
if h.find (key: upcase (_s)) = 0 then if length (_z) > length (v) then v = _z ;
end ;
end ;
run ;
... View more