Hi SAS Pro,
I have a dataset (Have1) including a series of relevant variables v1, v2, v3, and v4, with each variable represents codes (character variable). I also have a separate dataset (Have2) containing a certain list of codes. I would like to create a new variable called flag. Flag will have a value of 1 if v1-v4 in Have1 contains any codes listed in Have2. It would be efficient to employ array since there are a series relevant variables v1 to v4. Also, because my really data for Have2 has thousands codes, it is impossible to type every code in the SAS syntax unless to use a hash table. However, the tricky part is how to incorporate hash table into array function?
I would really appreciate any help!
Best regards,
C
data have1;
length v1-v4 $7.;
input v1 $ v2 $ v3 $ v4 $;
datalines;
a1 b2 c234 d987
e45 h76 k93 a1
d9087 b4 m7908 v9090
s6987 k2983 o87 f338
b2 f338 g6 x3
Y3 h76 009 m1a
m1a a1 2983 f02
2983 k8 y4 w348
k93 e5 w77 h5
h76 . b2 .
. . k93 .
;
run;
data have2;
length v $7.;
input v $;
datalines;
b2
h76
k93
v9090
2983
;
run;
Below two options how you could do this.
data have1;
length v1-v4 $7.;
input v1 $ v2 $ v3 $ v4 $;
datalines;
a1 b2 c234 d987
e45 h76 k93 a1
d9087 b4 m7908 v9090
s6987 k2983 o87 f338
b2 f338 g6 x3
Y3 h76 009 m1a
m1a a1 2983 f02
2983 k8 y4 w348
k93 e5 w77 h5
h76 . b2 .
. . k93 .
;
data have2;
length v $7.;
input v $;
datalines;
b2
h76
k93
v9090
2983
;
/* option 1: using a hash table */
data want1(drop=_i v);
if _n_=1 then
do;
if 0 then set have2(keep=v);
dcl hash h1(dataset:'have2');
h1.defineKey('v');
h1.defineDone();
end;
set have1;
array av {*} v1-v4;
flag='0';
do _i=1 to dim(av);
if h1.check(key:av[_i])=0 then
do;
flag='1';
leave;
end;
end;
run;
/* option 2: using a format */
data ctrl;
retain fmtname '$v_flag' type 'c' label '1';
set have2(rename=(v=start)) end=last;
output;
if last then
do;
hlo='O';
label='0';
output;
end;
run;
proc format cntlin=ctrl;
run;
data want2(drop=_i);
set have1;
array av {*} v1-v4;
flag='0';
do _i=1 to dim(av);
flag=put(av[_i],$v_flag1.);
if flag='1' then leave;
end;
run;
I don't think the array helps with the codes you are trying to find. But it will help with searching multiple variables to see if any of them contain one of the target codes.
So you could preload the codes into a hash object and the for each observation loop over the array of variables and check if that value is in the hash object. You can stop once you find at least one array element that matches.
data want ;
if _n_=1 then do;
set have2(keep=v);
declare hash h(dataset:'have2(keep=v)');
rc=h.definekey('v');
rc=h.definedone();
end;
set have1 ;
array vars v1-v4 ;
do index=1 to dim(vars) until(result);
result=not h.find(key:vars[index]);
end;
drop rc index v;
run;
Results:
Obs v1 v2 v3 v4 result 1 a1 b2 c234 d987 1 2 e45 h76 k93 a1 1 3 d9087 b4 m7908 v9090 1 4 s6987 k2983 o87 f338 0 5 b2 f338 g6 x3 1 6 Y3 h76 009 m1a 1 7 m1a a1 2983 f02 1 8 2983 k8 y4 w348 1 9 k93 e5 w77 h5 1 10 h76 b2 1 11 k93 1
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.