BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
EEEY
Obsidian | Level 7
I have a small dataset with values like: emplid 1111 2222 3333 Another bigger dataset like: mgrid_1 mgrid_2 mgrid_3 mgrid_4...mgrid_11 101 1111 121212 202 23232 2222 34321 202 1111 23432 3333 11212 4444 55555 6666 Basically, the value of emlid = 1111,2222,3333 locate in different columns of my bigger dataset. The result I am looking for would be emplid mgrid 1111 101 2222 23232 3333 23432. Could anybody help me with a simple solution? Thanks a lot!
1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

May be I missunderstand your data and how it's structured. It appears the same value in "have2" can exist more than once. Below code will return at least one row per source row from have1 - but return multiple rows if empid matches more than once to a value in have2.

 

In below code ":" needs to be replaced with ":"

Capture.PNG

 

data have1;
  input empid;
cards;
1111
2222
3333
2124
9999
;

data have2;
  input mgrid_1 mgrid_2 mgrid_3 mgrid_4 mgrid_5;
cards;
101 1111 1212 2124 14322
201 2211 2222 3434 5656
301 3322 2124 5434 3333
401 3355 1199 55 77
;

data want(keep=empid mgrid);
  if _n_=1 then
    do;
      if 0 then set have1(keep=empid) have2(keep=mgrid_1 rename=(mgrid_1=mgrid));
      dcl hash h1(multidata:'y');
      h1.defineKey('empid');
      h1.defineData('mgrid');
      h1.defineDone();

      array mgrids {*} mgrid_1 - mgrid_5;
      do while(not last);
        set have2 end=last;
        do i=2 to dim(mgrids);
          h1.add(key:mgrids[i], data:mgrids[i-1]);
        end;
      end;
    end;
    call missing(of _all_);

    set have1;
    _rc=h1.find();
    output;
    do while(h1.find_next() = 0);
      output;
    end;
 run;

 

View solution in original post

15 REPLIES 15
Steelers_In_DC
Barite | Level 11

I have no idea what you are trying to say.  Put in a format like below so we can see what your actual datasets look like:

 

data have1;
input emplid$;
cards;
1111
2222
3333
;

data have2;
input mgrid_1 mgrid2 mgrid3;
cards;
1111 2222 3333
;

Steelers_In_DC
Barite | Level 11
also a dataset that you 'want'.
EEEY
Obsidian | Level 7
data have1;
input emplid$;
cards;
1111
2222
3333
;

data have2;
input mgrid_1 mgrid_2 mgrid_3 mgrid_4 mgrid_5;
cards;
101 1111 1212 2124 14322
201 2211 2222 3434 5656
301 3322 2124 5434 3333
401 3355 1199 55 77
;

data want;
input emplid mgrid;
cards;
1111 101
2222 2211
3333 5434
;

My dataset would be like this. The last dataset is what I am looking for
PaigeMiller
Diamond | Level 26

I think you'd need to use an ARRAY that loops over all possible columns where the emplid might be. This also allows you to determine the value in the column preceding this.

 

array x{i} col1-col999;
do i=2 to 999;    /* Note 1111 cannot be in col1 */
    if x{i}=1111 then do; emplid=x{i}; mgrid=x{i-1}; output; end;
end;

 

--
Paige Miller
EEEY
Obsidian | Level 7
Thanks PaigeMiller. I guess that would be the solution but I am really bad at array. I have posted the code for the datasets I have and I want. the value 1111 is in my dataset have1.. how should i modify your program?
PaigeMiller
Diamond | Level 26

Well, to tell you the truth, I'd like to see you learn a little about arrays instead of me doing the work for you.

 

I believe you ought to be able to give it a try to get my code to work on your exact data. If it doesn't work, show us the code and the SASLOG and we'll figure out how to make progress.

--
Paige Miller
Steelers_In_DC
Barite | Level 11

I am also not too familiar with arrays.  I need to get on that.  Here is a solution using the same mindset but taking advantage of transpose:

 

data have1;
input emplid$;
cards;
1111
2222
3333
;

data have2;
input mgrid_1$ mgrid_2$ mgrid_3$ mgrid_4$ mgrid_5$;
cards;
101 1111 1212 2124 14322
201 2211 2222 3434 5656
301 3322 2124 5434 3333
401 3355 1199 55 77
;

proc transpose data=have2 out=tran2;by mgrid_1;var mgrid_2 mgrid_3 mgrid_4 mgrid_5;

data lag;
set tran2;
by mgrid_1;
mgrid = lag(col1);
if first.mgrid_1 then mgrid = mgrid_1;
run;

proc sql;
create table want as
select a.emplid,b.mgrid
from have1 a left join
     lag b on
a.emplid = b.col1;

EEEY
Obsidian | Level 7

Thanks for your response. Your code is working well. But my dataset Have2 has 182K rows of record and 11 columns.

I am not sure if proc transpose is a good idea.

EEEY
Obsidian | Level 7

Thanks PaigeMiller for your response. Actually your code is working well. But the thing is 1111 is just one value in the data set Have1. In my really data set Have1, I have 200+ rows. How to modify your code.

The code I have :

data have1;
set emp_level_final2;
keep mgrid_1-mgrid_11;
run;

data have2;
set MS_MGR;
keep emplid;
run;

data want;
set have1;
array x{11} mgrid_1-mgrid_11;

set have2;
array y{1} emplid;
;
do i=2 to 11;    /* Note 1111 cannot be in col1 */
    if x{i}=y{1}  then do; emplid=y{1};sup_id=x{i-1}; output; end;
end;
run;

By using the code above, it only gave me 1 row in the Want data set that I knew it should have come back with about 100 rows). There was no errors in the log. Please help!

MikeZdeb
Rhodochrosite | Level 12

Hi.  You can search a numeric array with the WHICHN function (no loop needed ... there's a WHICHC for character data).  To look for one value ...

 

* note NUMERIC emplid;
data have1;
input emplid @@;
datalines;
1111 2222 3333
;

 

data have2;
input mgrid_1-mgrid_5;
datalines;
101 1111 1212 2124 14322
201 2211 2222 3434 5656
301 3322 2124 5434 3333
401 3355 1199 55 77
;


data want (keep=emplid mgrid);
retain emplid 1111;
array grid(*) mgrid_1-mgrid_5;
set have2;
found = whichn(emplid, of grid(*));
if found then do; mgrid=grid(found-1); output; end;
run;

 

If you want to search for more than one EMPLID ...


data want (keep=emplid mgrid);
* lookup array (HAVE1) ... places all values in an array;
array lookup(3) _temporary_
do j=1 to howmany;
   set have1 nobs=howmany;
   lookup(j) = emplid;
end;

* check grid (HAVE2) for values in lookup (HAVE1);
array grid(5) mgrid_1-mgrid_5;
do until (last);
   set have2 end=last;
do j=1 to howmany;
   found = whichn(lookup(j), of grid(*));
   if found then do; emplid=lookup(j); mgrid=grid(found-1); output; end;
end;
end;
run;

 

data set WANT ...

 

Obs    emplid    mgrid

 1      1111       101
 2      2222      2211
 3      3333      5434

EEEY
Obsidian | Level 7

Thanks Mike for your solution. Unfortunately emplid and mgrid in my 2 datasets are both character instead of numeric. I am wondering if I still can use with whichn function.

 

 

The code is :

data want (keep=emplid mgrid);

* lookup array (HAVE1) ... places all values in an array;

array lookup(*) $ _temporary_;

do j=1 to howmany;

 

 set have1 nobs=howmany;

lookup(j) = emplid;

end;

* check grid (HAVE2) for values in lookup (HAVE1);

array

 

 grid(11) $ mgrid_1-mgrid_11;

do until (last);

 set have2 end=last;

do j=1 to 267;

found = whichn(lookup(j), of grid(*));

 

 if found then do; emplid=lookup(j); mgrid=grid(found-1); output; end;

end;

end;

run;

 

 The log I got is:

24 data want (keep=emplid mgrid);

25 * lookup array (HAVE1) ... places all values in an array;

26 array lookup(*) $ _temporary_;

ERROR: The non-variable based array lookup has been defined with zero elements.

27 do j=1 to howmany;

28 set have1 nobs=howmany;

29 lookup(j) = emplid;

ERROR: Too many array subscripts specified for array lookup.

30 end;

31

32 * check grid (HAVE2) for values in lookup (HAVE1);

33 array grid(11) $ mgrid_1-mgrid_11;

34 do until (last);

35 set have2 end=last;

36 do j=1 to 267;

37 found = whichn(lookup(j), of grid(*));

ERROR: Too many array subscripts specified for array lookup.

38 if found then do; emplid=lookup(j); mgrid=grid(found-1); output; end;

ERROR: Too many array subscripts specified for array lookup.

39 end;

40 end;

41 run;

NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).

37:19 37:33

NOTE: The SAS System stopped processing this step because of errors.

WARNING: The data set WORK.WANT may be incomplete. When this step was stopped there were 0 observations and 2 variables.

WARNING: Data set WORK.WANT was not replaced because this step was stopped.

NOTE: DATA statement used (Total process time):

real time 0.00 seconds

cpu time 0.01 seconds

 

Patrick
Opal | Level 21

May be I missunderstand your data and how it's structured. It appears the same value in "have2" can exist more than once. Below code will return at least one row per source row from have1 - but return multiple rows if empid matches more than once to a value in have2.

 

In below code ":" needs to be replaced with ":"

Capture.PNG

 

data have1;
  input empid;
cards;
1111
2222
3333
2124
9999
;

data have2;
  input mgrid_1 mgrid_2 mgrid_3 mgrid_4 mgrid_5;
cards;
101 1111 1212 2124 14322
201 2211 2222 3434 5656
301 3322 2124 5434 3333
401 3355 1199 55 77
;

data want(keep=empid mgrid);
  if _n_=1 then
    do;
      if 0 then set have1(keep=empid) have2(keep=mgrid_1 rename=(mgrid_1=mgrid));
      dcl hash h1(multidata:'y');
      h1.defineKey('empid');
      h1.defineData('mgrid');
      h1.defineDone();

      array mgrids {*} mgrid_1 - mgrid_5;
      do while(not last);
        set have2 end=last;
        do i=2 to dim(mgrids);
          h1.add(key:mgrids[i], data:mgrids[i-1]);
        end;
      end;
    end;
    call missing(of _all_);

    set have1;
    _rc=h1.find();
    output;
    do while(h1.find_next() = 0);
      output;
    end;
 run;

 

EEEY
Obsidian | Level 7

Hi Patrick,

 

Thanks for this cool hash object solution. I learned before (a bit only) and felt it was not really useful but right now I realized that this is hash oject should be used for.

 

The only thing was that I had to remove (keep=emplid mgrid) to get the complete dataset and then did the seperate data step to keep these two columns only. Otherwise it only gave me one column mgrid.

 

Really appreciate.

EEEY
Obsidian | Level 7

 hI Patrick,

I am sure you are the expert of hash object.

 

Actually my dataset Have2 is even more complicated than I think.

 

So my

data have1;
  input empid;
cards;1111
2222
3333
2124
9999;

data have2;
  input mgrid_1 mgrid_2 mgrid_3 mgrid_4 mgrid_5;
cards;101 1111 1212 2124 14322
201 2211 2222 3434 5656
301 3322 2124 5434 3333
401 3355 1199 55 77
234 2222 2323 2121 55
;


So I have added the last row into the dataset have2.

When this look-up process execute, note that value 2222 appeares at two spot in the dataset have2.
One in column mgrid_2 and the other one in column mgrid_3.

The result I want to see would be like:

emplid mgrid col
1111 101 1 2222 2211 2
2222 234 1 3333 5434 4 2124 3322 2
 The Column col I would like to add into the want dataset is the level of the manager from Have2 dataset.

Hope you could help me out again.

 Thanks a lot.

SAS Innovate 2025: Register Today!

 

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 15 replies
  • 2396 views
  • 2 likes
  • 5 in conversation