Hi,
This is probably a question that's been discussed many times, but I will appreciate if anyone out there can kindly shed light for me.
I have a dataset (primary) contains following columns (5000 rows)
ID Proc1 Proc2 Proc3 Proc4 (........) Proc55
1 288-3 199-2 211-3 133-2 254-8 (the value for Proc1-Proc55 are string characters)
2 274-2 292-2 213-2 493-2 280-2
.
.
.
and another dataset (lookup) contains following (70 rows)
Proc
288-3
292-2
381-2
.
.
.
The aim is to create a new variable in the "primary_new" dataset: flag. If any one of the rows in the "primary" file contain a matching data as the one in the "lookup", flag=1, else flag=0.
e.g.
ID Proc1 Proc2 Proc3 Proc4 (........) Proc55 flag
1 288-3 199-2 211-3 133-2 254-8 1
2 274-2 292-2 213-2 493-2 280-2 1
3 777-2 777-1 . . . 0
I was looking at the array approach, but I am not sure whether I am in the right direction? an example code will be much appreciated.
The first thing I'd want to do is transpose that data - it will make it much easier to work with especially when you want to do further work on it. Once you've done that a simple merge will work nicely i.e.
data have;
length id 8 proc1 $5 proc2 $5 proc3 $5 proc4 $5;
infile datalines;
input id proc1 proc2 proc3 proc4;
datalines;
1 288-3 199-2 211-3 133-2
2 274-2 292-2 213-2 493-2
;
run;
data lookup;
length proc $5;
infile datalines;
input Proc;
datalines;
288-3
292-2
381-2
;
run;
proc transpose data=have out=transposed(rename=(col1=proc _name_=procno));
by id;
var proc1-proc4;
run;
proc sort data=transposed;
by proc;
run;
proc sort data=lookup;
by proc;
run;
data want;
merge lookup(in=a) transposed(in=b);
by proc;
if a and b;
run;
data yo;
infile datalines dsd;
length ID 8.;
length PR1 PR2 PR3 $6.;
input ID PR1 PR2 PR3;
datalines;
1,AA,JJ,SS
2,BB,KK,TT
3,CC,LL,UU
4,DD,MM,VV
5,EE,NN,WW
6,FF,AA,XX
7,GG,PP,YY
8,HH,QQ,ZZ
9,II,RR,AA
run;
DATA YO2;
SET YO;
ARRAY PR[9] PR1-PR9;
DO I=1 TO 9;
IF PR[I]='AA' THEN FLG='1';
END;
RUN;
well .. you can change the array size to three, run the do loop to 3 instead of 9 and drop the iterator 'i' from the data step and should be all good.
The first thing I'd want to do is transpose that data - it will make it much easier to work with especially when you want to do further work on it. Once you've done that a simple merge will work nicely i.e.
data have;
length id 8 proc1 $5 proc2 $5 proc3 $5 proc4 $5;
infile datalines;
input id proc1 proc2 proc3 proc4;
datalines;
1 288-3 199-2 211-3 133-2
2 274-2 292-2 213-2 493-2
;
run;
data lookup;
length proc $5;
infile datalines;
input Proc;
datalines;
288-3
292-2
381-2
;
run;
proc transpose data=have out=transposed(rename=(col1=proc _name_=procno));
by id;
var proc1-proc4;
run;
proc sort data=transposed;
by proc;
run;
proc sort data=lookup;
by proc;
run;
data want;
merge lookup(in=a) transposed(in=b);
by proc;
if a and b;
run;
A format will do the work as well 🙂
data have;
length id 8 proc1 $5 proc2 $5 proc3 $5 proc4 $5;
input id proc1 proc2 proc3 proc4;
datalines;
1 288-3 199-2 211-3 133-2
2 274-2 292-2 213-2 493-2
;
run;
data lookup;
length proc $5;
infile datalines;
input Proc;
datalines;
288-3
292-2
381-2
;
run;
data lu;
set lookup(rename=(proc=start));
length fmtname type $32 label 8;
fmtname = 'lu';
type = 'C';
label = 1;
if _n_ = 1 then do;
output;
hlo = 'O';
label = 0;
output;
end;
else output;
run;
proc format cntlin=lu;
run;
data want;
set have;
array procs[4] proc1 - proc4;
length flag 8;
flag = 0;
do i = 1 to 4;
flag = ifn(flag=0,put(procs(i),lu.),flag);
end;
run;
//Fredrik
hi FredrikE,
your code works perfectly fine (and much more efficient). I tried and it works. It just that I am a new user of SAS, and I don't quite understand how the code starting from ---data lu--- work.
Pardon me for the naivety as proc format is a very unfamiliar procedure for me.
Whereas the other method suggested by Chris, although it involves a bit more coding, I can see the logic behind it.
having said that, your approach is brilliant, and I would have accepted your code as solution also if the forum rules allow.
also, thank you everyone else for your contribution.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.