Hi @hashman ,
any luck with the free time/the code yet? 🙂
@LuciaCekanakova: Sorry, totally swamped now.
You can store VARIABLES in a lookup table and then, in principle, go two ways:
The first approach is simpler since you don't have to write code to parse STRING, and yet I still find it the other one better because it's likely to yield much better performance; so this is what is adopted in the code below.
data strings ;
input string $6. ;
cards ;
a x
aa l z
b k
b y m
c z
c x k
a b
;
run ;
data variables ;
input (var1-var3) (& :$3.) ;
cards ;
a k x
b l y
a b m z
. n .
;
run ;
data want (keep = string var:) ;
set strings ;
if _n_ = 1 then do ;
dcl hash h () ;
h.definekey ("_s") ;
h.definedata ("_i_") ;
h.definedone () ;
do until (z) ;
set variables end = z ;
array v var: ;
do over v ;
if cmiss (v) = 0 then h.add (key:v, data:_i_) ;
end ;
end ;
end ;
_s = string ;
call missing (_s, of var:) ;
do _i = 1 to countw (string) ;
_s = scan (string, _i) ;
if h.find (key: _s) = 0 then v = _s ;
do _j = _i + 1 to countw (string) ;
_s = catx (" ", _s, scan (string, _j)) ;
if h.find (key: _s) = 0 then v = _s ;
end ;
end ;
run ;
Note that, as @ballardw has mentioned, the fact that any of VAR1-VAR3 can contain more than one component (in your case, "a b" in VAR1) throws a kind of monkey wrench into the task: Without it, parsing STRING would be a simple scan, rather than the nested scan requiring the concatenation to account for all possible left-to-right substrings without losing any possible matches with VARIABLES. But specs are specs.
Kind regards
Paul D.
I am unsure what the actual data looks like, but you are likely to have multiple matches.
You may want to split the logic into two steps:
1. Match the data
2. Decide on the matches to keep.
This does the first step:
proc sql;
create table WANT as
select STRING
,ifc(findw(STRING,VAR1,' ','r'), VAR1,' ') as VAR1 length=8
,ifc(findw(STRING,VAR2,' ','r'), VAR2,' ') as VAR2 length=8
,ifc(findw(STRING,VAR3,' ','r'), VAR3,' ') as VAR3 length=8
from STRINGS,VARIABLES
having lengthn(VAR1) | lengthn(VAR2) | lengthn(VAR3)
order by STRING;
quit;
STRING | VAR1 | VAR2 | VAR3 |
a x | a | x | |
aa l z | l | ||
aa l z | z | ||
b k | k | ||
b k | b | ||
b y m | b | y | |
b y m | m | ||
c z | z | ||
c x k | k | x | |
a b | a | ||
a b | b | ||
a b | a b |
This could give you a start .
data strings ;
input string $6. ;
cards ;
a x
aa l z
b k
b y m
c z
c x k
a b
;
run ;
data variables ;
input (var1-var3) (& :$3.) ;
cards ;
a k x
b l y
a b m z
. n .
;
run ;
proc sql;
create table want as
select a.*,var1,var2,var3
from strings as a left join variables(keep=var1) as b
on findw(a.string,strip(b.var1),' ')
left join variables(keep=var2) as c
on findw(a.string,strip(c.var2),' ')
left join variables(keep=var3) as d
on findw(a.string,strip(d.var3),' ')
;
quit;
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.