Hi,
This is probably a question that's been discussed many times, but I will appreciate if anyone out there can kindly shed light for me.
I have a dataset (primary) contains following columns (5000 rows)
ID Proc1 Proc2 Proc3 Proc4 (........) Proc55
1 288-3 199-2 211-3 133-2 254-8 (the value for Proc1-Proc55 are string characters)
2 274-2 292-2 213-2 493-2 280-2
.
.
.
and another dataset (lookup) contains following (70 rows)
Proc
288-3
292-2
381-2
.
.
.
The aim is to create a new variable in the "primary_new" dataset: flag. If any one of the rows in the "primary" file contain a matching data as the one in the "lookup", flag=1, else flag=0.
e.g.
ID Proc1 Proc2 Proc3 Proc4 (........) Proc55 flag
1 288-3 199-2 211-3 133-2 254-8 1
2 274-2 292-2 213-2 493-2 280-2 1
3 777-2 777-1 . . . 0
I was looking at the array approach, but I am not sure whether I am in the right direction? an example code will be much appreciated.
The first thing I'd want to do is transpose that data - it will make it much easier to work with especially when you want to do further work on it. Once you've done that a simple merge will work nicely i.e.
data have;
length id 8 proc1 $5 proc2 $5 proc3 $5 proc4 $5;
infile datalines;
input id proc1 proc2 proc3 proc4;
datalines;
1 288-3 199-2 211-3 133-2
2 274-2 292-2 213-2 493-2
;
run;
data lookup;
length proc $5;
infile datalines;
input Proc;
datalines;
288-3
292-2
381-2
;
run;
proc transpose data=have out=transposed(rename=(col1=proc _name_=procno));
by id;
var proc1-proc4;
run;
proc sort data=transposed;
by proc;
run;
proc sort data=lookup;
by proc;
run;
data want;
merge lookup(in=a) transposed(in=b);
by proc;
if a and b;
run;
data yo;
infile datalines dsd;
length ID 8.;
length PR1 PR2 PR3 $6.;
input ID PR1 PR2 PR3;
datalines;
1,AA,JJ,SS
2,BB,KK,TT
3,CC,LL,UU
4,DD,MM,VV
5,EE,NN,WW
6,FF,AA,XX
7,GG,PP,YY
8,HH,QQ,ZZ
9,II,RR,AA
run;
DATA YO2;
SET YO;
ARRAY PR[9] PR1-PR9;
DO I=1 TO 9;
IF PR[I]='AA' THEN FLG='1';
END;
RUN;
well .. you can change the array size to three, run the do loop to 3 instead of 9 and drop the iterator 'i' from the data step and should be all good.
The first thing I'd want to do is transpose that data - it will make it much easier to work with especially when you want to do further work on it. Once you've done that a simple merge will work nicely i.e.
data have;
length id 8 proc1 $5 proc2 $5 proc3 $5 proc4 $5;
infile datalines;
input id proc1 proc2 proc3 proc4;
datalines;
1 288-3 199-2 211-3 133-2
2 274-2 292-2 213-2 493-2
;
run;
data lookup;
length proc $5;
infile datalines;
input Proc;
datalines;
288-3
292-2
381-2
;
run;
proc transpose data=have out=transposed(rename=(col1=proc _name_=procno));
by id;
var proc1-proc4;
run;
proc sort data=transposed;
by proc;
run;
proc sort data=lookup;
by proc;
run;
data want;
merge lookup(in=a) transposed(in=b);
by proc;
if a and b;
run;
A format will do the work as well 🙂
data have;
length id 8 proc1 $5 proc2 $5 proc3 $5 proc4 $5;
input id proc1 proc2 proc3 proc4;
datalines;
1 288-3 199-2 211-3 133-2
2 274-2 292-2 213-2 493-2
;
run;
data lookup;
length proc $5;
infile datalines;
input Proc;
datalines;
288-3
292-2
381-2
;
run;
data lu;
set lookup(rename=(proc=start));
length fmtname type $32 label 8;
fmtname = 'lu';
type = 'C';
label = 1;
if _n_ = 1 then do;
output;
hlo = 'O';
label = 0;
output;
end;
else output;
run;
proc format cntlin=lu;
run;
data want;
set have;
array procs[4] proc1 - proc4;
length flag 8;
flag = 0;
do i = 1 to 4;
flag = ifn(flag=0,put(procs(i),lu.),flag);
end;
run;
//Fredrik
hi FredrikE,
your code works perfectly fine (and much more efficient). I tried and it works. It just that I am a new user of SAS, and I don't quite understand how the code starting from ---data lu--- work.
Pardon me for the naivety as proc format is a very unfamiliar procedure for me.
Whereas the other method suggested by Chris, although it involves a bit more coding, I can see the logic behind it.
having said that, your approach is brilliant, and I would have accepted your code as solution also if the forum rules allow.
also, thank you everyone else for your contribution.
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.