DATA Step, Macro, Functions and more

Searching across columns for values in another dataset (possibly using arrays?)

Accepted Solution Solved
Reply
Contributor
Posts: 25
Accepted Solution

Searching across columns for values in another dataset (possibly using arrays?)

[ Edited ]

Hi, someone on this help board recently taught me about arrays, and I'm wondering if I can use them to search across multiple columns in "dataset_2" for values in "dataset_1." (Or if not, if there is another way to do this.)

 

To explain, I have a "dataset_1" that looks something like this:

test          other columns

A1111542       .......

A1234567       .......

A1342999       .......

 

And a "dataset_2" that looks something like this:

patient    test_1         test_2         test_3         date_1          date_2          date_3          other columns

Bob        A1111542                                         01/01/1999                                              .......

Mary      A7394237                                         09/01/1973                                              .......

Joe        A5394238   A2392230   A1234567   05/01/1962   07/01/1962   04/01/1964    .......

Tom       A4394203   A1342999                      03/01/1986   03/01/1988                         .......

 

I'd like to create a "dataset_3" that adds to the columns of "dataset_2" the following 3 new columns:

patient   n_want    test_want     date_want

Bob        1             A1111542    01/01/1999

Mary

Joe         3             A1234567    04/01/1964

Tom        2             A1342999    03/01/1988

 

Can I do this with some variation of the following code?  Or is there some other way I can do this? 

data dataset_3; set dataset_2;
   array c test_1 - test_3;
   array d date_1 - date_3;
n_want = whichn(Can I put something in this spot to search in dataset_2 for all values of "test" in dataset_1?);
   test_want = c[n_want];
   date_want = d[n_want]; format date_want MMDDYY10.;

run;

 

Thank you in advance for your help.


Accepted Solutions
Solution
‎03-10-2016 06:57 PM
Super User
Posts: 5,071

Re: Searching across columns for a value in another dataset (possibly using arrays?)

While hash tables would work, a format would work just as well here.  The starting point would be to create a format that identifies what tests are in data set 1:

 

data locate;

set dataset1;

start=test;

label='Found';

fmtname='$test';

run

proc format cntlin=locate;

run;

 

This creates a format equivalent to:

 

proc format;

value $test 'A1111542'='Found'

   'A1234567'='Found'

   ...;

 

 

Then use that format when processing dataset2 to see which test is found:

 

data want;

set dataset2;

array c {3} test_1-test_3;

array d {3} date_1-date_3;

do i=1 to 3;

   if put(c{i}, $test.)='Found' then do;

      n_want=i;

      test_want=c{i};

      date_want = d{i};

   end;

end;

drop i;

run;

 

You may need to consider what the output should be if more than one test is wanted for the same person.

 

The code is untested, but should be fine.  Good luck.

View solution in original post


All Replies
Super User
Posts: 5,254

Re: Searching across columns for a value in another dataset (possibly using arrays?)

Not seeing that arrays would be the natural choice here. Rather read dataset1 into a hash table.
You could also do a SQL join, but that would probably require joining on all three test columns, combined with case statements.
Data never sleeps
Contributor
Posts: 25

Re: Searching across columns for a value in another dataset (possibly using arrays?)

Thanks, @LinusH.  Sorry to be a dunce, but could you please teach me more about the hash table option?  (I am also looking up hash tables online in the mean time.  Couldn't find them in the index of my Little SAS Book.)

Super User
Posts: 10,466

Re: Searching across columns for a value in another dataset (possibly using arrays?)

This would be a very simple exercise if the dataset_2 was:

Patient Test Date  (and possibly test number if you do multiple tests per day)

with one record per test.

 

It would look something like:

Proc sql;

   create table want as

   select a.*, b.patient, b.date(, other variables from dataset_2 )

   from dataset_1 as a left join dataset_2 on a.test=b.test;

quit;

 

Data in the form you have is going to require lots of work continually going back and forth with having to search across multiple variables when the data is as you show.

 

It may be worth transposing the data to the normalized form of one record per test as and intermediate step.

Solution
‎03-10-2016 06:57 PM
Super User
Posts: 5,071

Re: Searching across columns for a value in another dataset (possibly using arrays?)

While hash tables would work, a format would work just as well here.  The starting point would be to create a format that identifies what tests are in data set 1:

 

data locate;

set dataset1;

start=test;

label='Found';

fmtname='$test';

run

proc format cntlin=locate;

run;

 

This creates a format equivalent to:

 

proc format;

value $test 'A1111542'='Found'

   'A1234567'='Found'

   ...;

 

 

Then use that format when processing dataset2 to see which test is found:

 

data want;

set dataset2;

array c {3} test_1-test_3;

array d {3} date_1-date_3;

do i=1 to 3;

   if put(c{i}, $test.)='Found' then do;

      n_want=i;

      test_want=c{i};

      date_want = d{i};

   end;

end;

drop i;

run;

 

You may need to consider what the output should be if more than one test is wanted for the same person.

 

The code is untested, but should be fine.  Good luck.

Contributor
Posts: 25

Re: Searching across columns for a value in another dataset (possibly using arrays?)

Hi, @Astounding.  I'd like to try your solution, but am getting an empty dataset (new columns/variables "label" and "fmtname" created, but no entries in dataset).  Is this what I should be seeing?

Super User
Posts: 5,071

Re: Searching across columns for a value in another dataset (possibly using arrays?)

It's probably something simple.  Check (1) missing the SET statement, (2) missing the semicolon at the end of the DATA statement.  If it's not that, post the log from the DATA step ... should be just a few lines there.

Contributor
Posts: 25

Re: Searching across columns for a value in another dataset (possibly using arrays?)

Hi @Astounding.  Yeah, I was missing a semicolon -- that was silly of me.  Ok, so now I'm ready to try the 2nd half of your solution.  I don't think I understand this part.  For the if statement, we're referring to the new "locate" dataset, right?  How does SAS know that's where we want to look, if we don't have anything about the "locate" dataset in this code?  Thanks for your patient instruction.

Super User
Posts: 5,071

Re: Searching across columns for a value in another dataset (possibly using arrays?)

The LOCATE data set is used to create a format named $test.  Then the DATA step uses that format.

Contributor
Posts: 25

Re: Searching across columns for a value in another dataset (possibly using arrays?)

Hi @Astounding.  Hm, I'm getting errors, including the following note in my log: "Format $test was not found or could not be loaded". 

Contributor
Posts: 25

Re: Searching across columns for a value in another dataset (possibly using arrays?)

I think I might have figured out the problem: I had 2 entries with the same value for "test."  I'll work with this a little more & see how it goes.

Super User
Posts: 5,071

Re: Searching across columns for a value in another dataset (possibly using arrays?)

That means you still have to run the PROC FORMAT to create $test.

Super User
Posts: 5,254

Re: Searching across columns for a value in another dataset (possibly using arrays?)

Hash table programming is not as simple as SQL so I barely use it. So if you are a SAS beginner, I would first try to solve this with SQL as I outlined, and exemplified in other replies.
Data never sleeps
Contributor
Posts: 25

Re: Searching across columns for values in another dataset (possibly using arrays?)

P.S.  I just modified my original post to clarify: I want to search in dataset_2 "test_n" columns for all (not just one or a handful of) values of "test" from dataset_1.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 13 replies
  • 438 views
  • 0 likes
  • 4 in conversation