BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
newusernew
Fluorite | Level 6

Hi, 

This is probably a question that's been discussed many times, but I will appreciate if anyone out there can kindly shed light for me. 

 

I have a dataset (primary) contains following columns (5000 rows)

ID Proc1 Proc2 Proc3 Proc4 (........) Proc55

1  288-3 199-2  211-3 133-2            254-8 (the value for Proc1-Proc55 are string characters)

2  274-2  292-2 213-2 493-2            280-2

.

.

.

 

and another dataset (lookup) contains following (70 rows)

Proc

288-3

292-2

381-2

.

.

.

 

The aim is to create a new variable in the "primary_new" dataset: flag. If any one of the rows in the "primary" file contain a matching data as the one in the "lookup", flag=1, else flag=0.

e.g. 

ID Proc1 Proc2 Proc3 Proc4 (........) Proc55  flag

1  288-3 199-2  211-3 133-2            254-8     1

2  274-2  292-2 213-2 493-2            280-2     1

3  777-2  777-1    .          .                    .          0

I was looking at the array approach, but I am not sure whether I am in the right direction? an example code will be much appreciated. 

1 ACCEPTED SOLUTION

Accepted Solutions
ChrisBrooks
Ammonite | Level 13

The first thing I'd want to do is transpose that data - it will make it much easier to work with especially when you want to do further work on it. Once you've done that a simple merge will work nicely i.e.

 

data have;
	length id 8 proc1 $5 proc2 $5 proc3 $5 proc4 $5;
	infile datalines;
	input id proc1 proc2 proc3 proc4;
	datalines;
1  288-3 199-2  211-3 133-2
2  274-2  292-2 213-2 493-2
;
run;

data lookup;
	length proc $5;
	infile datalines;
	input Proc;
	datalines;
288-3
292-2
381-2
;
run;

proc transpose data=have out=transposed(rename=(col1=proc _name_=procno));
	by id;
	var proc1-proc4;
run;

proc sort data=transposed;
	by proc;
run;

proc sort data=lookup;
	by proc;
run;

data want;
	merge lookup(in=a) transposed(in=b);
	by proc;
	if a and b;
run;

View solution in original post

7 REPLIES 7
ali_jooan
Obsidian | Level 7

data yo;

infile datalines dsd;

length ID 8.;

length PR1 PR2 PR3 $6.;

input ID PR1 PR2 PR3;

datalines;

1,AA,JJ,SS

2,BB,KK,TT

3,CC,LL,UU

4,DD,MM,VV

5,EE,NN,WW

6,FF,AA,XX

7,GG,PP,YY

8,HH,QQ,ZZ

9,II,RR,AA

run;

 

DATA YO2;

SET YO;

ARRAY PR[9] PR1-PR9;

DO I=1 TO 9;

IF PR[I]='AA' THEN FLG='1';

END;

RUN;

ali_jooan
Obsidian | Level 7

well .. you can change the array size to three, run the do loop to 3 instead of 9 and drop the iterator 'i' from the data step and should be all good.

error_prone
Barite | Level 11
Posting from m mobile phone, so the following code is untested an may contain typos.

data primary_new;
Set primary;
If _n_ = 1 then do;
Declare hash h(dataset: "lookup"):
h.defineKey("proc");
h.defineDone();
End;

array p proc1-proc55;

do i = 1 to dim(p);
flag = not h.check(key: p[i]);
if flag then continue;
end;
Run;
ChrisBrooks
Ammonite | Level 13

The first thing I'd want to do is transpose that data - it will make it much easier to work with especially when you want to do further work on it. Once you've done that a simple merge will work nicely i.e.

 

data have;
	length id 8 proc1 $5 proc2 $5 proc3 $5 proc4 $5;
	infile datalines;
	input id proc1 proc2 proc3 proc4;
	datalines;
1  288-3 199-2  211-3 133-2
2  274-2  292-2 213-2 493-2
;
run;

data lookup;
	length proc $5;
	infile datalines;
	input Proc;
	datalines;
288-3
292-2
381-2
;
run;

proc transpose data=have out=transposed(rename=(col1=proc _name_=procno));
	by id;
	var proc1-proc4;
run;

proc sort data=transposed;
	by proc;
run;

proc sort data=lookup;
	by proc;
run;

data want;
	merge lookup(in=a) transposed(in=b);
	by proc;
	if a and b;
run;
newusernew
Fluorite | Level 6
thanks for the suggestion Chris. Your code works perfect. I changed the merge step to proc sql left join instead. seems to work fine without the need to sort the data.
FredrikE
Rhodochrosite | Level 12

A format will do the work as well 🙂

 

data have;

length id 8 proc1 $5 proc2 $5 proc3 $5 proc4 $5;

input id proc1 proc2 proc3 proc4;

datalines;

1 288-3 199-2 211-3 133-2

2 274-2 292-2 213-2 493-2

;

run;

data lookup;

length proc $5;

infile datalines;

input Proc;

datalines;

288-3

292-2

381-2

;

run;

data lu;

set lookup(rename=(proc=start));

length fmtname type $32 label 8;

fmtname = 'lu';

type = 'C';

label = 1;

if _n_ = 1 then do;

output;

hlo = 'O';

label = 0;

output;

end;

else output;

run;

proc format cntlin=lu;

run;

data want;

set have;

array procs[4] proc1 - proc4;

length flag 8;

flag = 0;

do i = 1 to 4;

flag = ifn(flag=0,put(procs(i),lu.),flag);

end;

run;

 

//Fredrik

newusernew
Fluorite | Level 6

hi FredrikE,

 

your code works perfectly fine (and much more efficient). I tried and it works. It just that I am a new user of SAS, and I don't quite understand how the code starting from ---data lu--- work.

 

Pardon me for the naivety as proc format is a very unfamiliar procedure for me.

 

Whereas the other method suggested by Chris, although it involves a bit more coding, I can see the logic behind it. 

 

having said that, your approach is brilliant, and I would have accepted your code as solution also if the forum rules allow. 

 

also, thank you everyone else for your contribution.

 

 

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 3548 views
  • 5 likes
  • 5 in conversation