May be I missunderstand your data and how it's structured. It appears the same value in "have2" can exist more than once. Below code will return at least one row per source row from have1 - but return multiple rows if empid matches more than once to a value in have2.
In below code ":" needs to be replaced with ":"
data have1;
input empid;
cards;
1111
2222
3333
2124
9999
;
data have2;
input mgrid_1 mgrid_2 mgrid_3 mgrid_4 mgrid_5;
cards;
101 1111 1212 2124 14322
201 2211 2222 3434 5656
301 3322 2124 5434 3333
401 3355 1199 55 77
;
data want(keep=empid mgrid);
if _n_=1 then
do;
if 0 then set have1(keep=empid) have2(keep=mgrid_1 rename=(mgrid_1=mgrid));
dcl hash h1(multidata:'y');
h1.defineKey('empid');
h1.defineData('mgrid');
h1.defineDone();
array mgrids {*} mgrid_1 - mgrid_5;
do while(not last);
set have2 end=last;
do i=2 to dim(mgrids);
h1.add(key:mgrids[i], data:mgrids[i-1]);
end;
end;
end;
call missing(of _all_);
set have1;
_rc=h1.find();
output;
do while(h1.find_next() = 0);
output;
end;
run;
I have no idea what you are trying to say. Put in a format like below so we can see what your actual datasets look like:
data have1;
input emplid$;
cards;
1111
2222
3333
;
data have2;
input mgrid_1 mgrid2 mgrid3;
cards;
1111 2222 3333
;
I think you'd need to use an ARRAY that loops over all possible columns where the emplid might be. This also allows you to determine the value in the column preceding this.
array x{i} col1-col999;
do i=2 to 999; /* Note 1111 cannot be in col1 */
if x{i}=1111 then do; emplid=x{i}; mgrid=x{i-1}; output; end;
end;
Well, to tell you the truth, I'd like to see you learn a little about arrays instead of me doing the work for you.
I believe you ought to be able to give it a try to get my code to work on your exact data. If it doesn't work, show us the code and the SASLOG and we'll figure out how to make progress.
I am also not too familiar with arrays. I need to get on that. Here is a solution using the same mindset but taking advantage of transpose:
data have1;
input emplid$;
cards;
1111
2222
3333
;
data have2;
input mgrid_1$ mgrid_2$ mgrid_3$ mgrid_4$ mgrid_5$;
cards;
101 1111 1212 2124 14322
201 2211 2222 3434 5656
301 3322 2124 5434 3333
401 3355 1199 55 77
;
proc transpose data=have2 out=tran2;by mgrid_1;var mgrid_2 mgrid_3 mgrid_4 mgrid_5;
data lag;
set tran2;
by mgrid_1;
mgrid = lag(col1);
if first.mgrid_1 then mgrid = mgrid_1;
run;
proc sql;
create table want as
select a.emplid,b.mgrid
from have1 a left join
lag b on
a.emplid = b.col1;
Thanks for your response. Your code is working well. But my dataset Have2 has 182K rows of record and 11 columns.
I am not sure if proc transpose is a good idea.
Thanks PaigeMiller for your response. Actually your code is working well. But the thing is 1111 is just one value in the data set Have1. In my really data set Have1, I have 200+ rows. How to modify your code.
The code I have :
data have1;
set emp_level_final2;
keep mgrid_1-mgrid_11;
run;
data have2;
set MS_MGR;
keep emplid;
run;
data want;
set have1;
array x{11} mgrid_1-mgrid_11;
set have2;
array y{1} emplid;
;
do i=2 to 11; /* Note 1111 cannot be in col1 */
if x{i}=y{1} then do; emplid=y{1};sup_id=x{i-1}; output; end;
end;
run;
By using the code above, it only gave me 1 row in the Want data set that I knew it should have come back with about 100 rows). There was no errors in the log. Please help!
Hi. You can search a numeric array with the WHICHN function (no loop needed ... there's a WHICHC for character data). To look for one value ...
* note NUMERIC emplid;
data have1;
input emplid @@;
datalines;
1111 2222 3333
;
data have2;
input mgrid_1-mgrid_5;
datalines;
101 1111 1212 2124 14322
201 2211 2222 3434 5656
301 3322 2124 5434 3333
401 3355 1199 55 77
;
data want (keep=emplid mgrid);
retain emplid 1111;
array grid(*) mgrid_1-mgrid_5;
set have2;
found = whichn(emplid, of grid(*));
if found then do; mgrid=grid(found-1); output; end;
run;
If you want to search for more than one EMPLID ...
data want (keep=emplid mgrid);
* lookup array (HAVE1) ... places all values in an array;
array lookup(3) _temporary_
do j=1 to howmany;
set have1 nobs=howmany;
lookup(j) = emplid;
end;
* check grid (HAVE2) for values in lookup (HAVE1);
array grid(5) mgrid_1-mgrid_5;
do until (last);
set have2 end=last;
do j=1 to howmany;
found = whichn(lookup(j), of grid(*));
if found then do; emplid=lookup(j); mgrid=grid(found-1); output; end;
end;
end;
run;
data set WANT ...
Obs emplid mgrid
1 1111 101
2 2222 2211
3 3333 5434
Thanks Mike for your solution. Unfortunately emplid and mgrid in my 2 datasets are both character instead of numeric. I am wondering if I still can use with whichn function.
The code is :
data want (keep=emplid mgrid);
* lookup array (HAVE1) ... places all values in an array;
array lookup(*) $ _temporary_;
do j=1 to howmany;
set have1 nobs=howmany;
lookup(j) = emplid;
end;
* check grid (HAVE2) for values in lookup (HAVE1);
array
grid(11) $ mgrid_1-mgrid_11;
do until (last);
set have2 end=last;
do j=1 to 267;
found = whichn(lookup(j), of grid(*));
if found then do; emplid=lookup(j); mgrid=grid(found-1); output; end;
end;
end;
run;
The log I got is:
24 data want (keep=emplid mgrid);
25 * lookup array (HAVE1) ... places all values in an array;
26 array lookup(*) $ _temporary_;
ERROR: The non-variable based array lookup has been defined with zero elements.
27 do j=1 to howmany;
28 set have1 nobs=howmany;
29 lookup(j) = emplid;
ERROR: Too many array subscripts specified for array lookup.
30 end;
31
32 * check grid (HAVE2) for values in lookup (HAVE1);
33 array grid(11) $ mgrid_1-mgrid_11;
34 do until (last);
35 set have2 end=last;
36 do j=1 to 267;
37 found = whichn(lookup(j), of grid(*));
ERROR: Too many array subscripts specified for array lookup.
38 if found then do; emplid=lookup(j); mgrid=grid(found-1); output; end;
ERROR: Too many array subscripts specified for array lookup.
39 end;
40 end;
41 run;
NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).
37:19 37:33
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.WANT may be incomplete. When this step was stopped there were 0 observations and 2 variables.
WARNING: Data set WORK.WANT was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds
May be I missunderstand your data and how it's structured. It appears the same value in "have2" can exist more than once. Below code will return at least one row per source row from have1 - but return multiple rows if empid matches more than once to a value in have2.
In below code ":" needs to be replaced with ":"
data have1;
input empid;
cards;
1111
2222
3333
2124
9999
;
data have2;
input mgrid_1 mgrid_2 mgrid_3 mgrid_4 mgrid_5;
cards;
101 1111 1212 2124 14322
201 2211 2222 3434 5656
301 3322 2124 5434 3333
401 3355 1199 55 77
;
data want(keep=empid mgrid);
if _n_=1 then
do;
if 0 then set have1(keep=empid) have2(keep=mgrid_1 rename=(mgrid_1=mgrid));
dcl hash h1(multidata:'y');
h1.defineKey('empid');
h1.defineData('mgrid');
h1.defineDone();
array mgrids {*} mgrid_1 - mgrid_5;
do while(not last);
set have2 end=last;
do i=2 to dim(mgrids);
h1.add(key:mgrids[i], data:mgrids[i-1]);
end;
end;
end;
call missing(of _all_);
set have1;
_rc=h1.find();
output;
do while(h1.find_next() = 0);
output;
end;
run;
Hi Patrick,
Thanks for this cool hash object solution. I learned before (a bit only) and felt it was not really useful but right now I realized that this is hash oject should be used for.
The only thing was that I had to remove (keep=emplid mgrid) to get the complete dataset and then did the seperate data step to keep these two columns only. Otherwise it only gave me one column mgrid.
Really appreciate.
hI Patrick,
I am sure you are the expert of hash object.
Actually my dataset Have2 is even more complicated than I think.
So my
data have1;
input empid;
cards;1111
2222
3333
2124
9999;
data have2;
input mgrid_1 mgrid_2 mgrid_3 mgrid_4 mgrid_5;
cards;101 1111 1212 2124 14322
201 2211 2222 3434 5656
301 3322 2124 5434 3333
401 3355 1199 55 77
234 2222 2323 2121 55;
So I have added the last row into the dataset have2.
When this look-up process execute, note that value 2222 appeares at two spot in the dataset have2.
One in column mgrid_2 and the other one in column mgrid_3.
The result I want to see would be like:
emplid mgrid col
1111 101 1
2222 2211 2
2222 234 1
3333 5434 4
2124 3322 2
The Column col I would like to add into the want dataset is the level of the manager from Have2 dataset.
Hope you could help me out again.
Thanks a lot.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.