Solved: Re: Searching across columns for a value in another dataset (possibly ...

beginner · Posted 03-10-2016 12:07 PM

Hi, someone on this help board recently taught me about arrays, and I'm wondering if I can use them to search across multiple columns in "dataset_2" for values in "dataset_1." (Or if not, if there is another way to do this.)

To explain, I have a "dataset_1" that looks something like this:

test other columns

A1111542 .......

A1234567 .......

A1342999 .......

And a "dataset_2" that looks something like this:

patient test_1 test_2 test_3 date_1 date_2 date_3 other columns

Bob A1111542 01/01/1999 .......

Mary A7394237 09/01/1973 .......

Joe A5394238 A2392230 A1234567 05/01/1962 07/01/1962 04/01/1964 .......

Tom A4394203 A1342999 03/01/1986 03/01/1988 .......

I'd like to create a "dataset_3" that adds to the columns of "dataset_2" the following 3 new columns:

patient n_want test_want date_want

Bob 1 A1111542 01/01/1999

Mary

Joe 3 A1234567 04/01/1964

Tom 2 A1342999 03/01/1988

Can I do this with some variation of the following code? Or is there some other way I can do this?

data dataset_3; set dataset_2;
   array c test_1 - test_3;
   array d date_1 - date_3;
n_want = whichn(Can I put something in this spot to search in dataset_2 for all values of "test" in dataset_1?);
   test_want = c[n_want];
   date_want = d[n_want]; format date_want MMDDYY10.;

run;

Thank you in advance for your help.

Astounding · Posted 03-10-2016 12:57 PM

While hash tables would work, a format would work just as well here. The starting point would be to create a format that identifies what tests are in data set 1:

data locate;

set dataset1;

start=test;

label='Found';

fmtname='$test';

run

proc format cntlin=locate;

run;

This creates a format equivalent to:

proc format;

value $test 'A1111542'='Found'

'A1234567'='Found'

...;

Then use that format when processing dataset2 to see which test is found:

data want;

set dataset2;

array c {3} test_1-test_3;

array d {3} date_1-date_3;

do i=1 to 3;

if put(c{i}, $test.)='Found' then do;

n_want=i;

test_want=c{i};

date_want = d{i};

end;

drop i;

run;

You may need to consider what the output should be if more than one test is wanted for the same person.

The code is untested, but should be fine. Good luck.

View solution in original post

LinusH · Posted 03-10-2016 12:25 PM

Not seeing that arrays would be the natural choice here. Rather read dataset1 into a hash table.
You could also do a SQL join, but that would probably require joining on all three test columns, combined with case statements.

Data never sleeps

beginner · Posted 03-10-2016 12:44 PM

Thanks, @LinusH. Sorry to be a dunce, but could you please teach me more about the hash table option? (I am also looking up hash tables online in the mean time. Couldn't find them in the index of my Little SAS Book.)

ballardw · Posted 03-10-2016 12:49 PM

This would be a very simple exercise if the dataset_2 was:

Patient Test Date (and possibly test number if you do multiple tests per day)

with one record per test.

It would look something like:

Proc sql;

create table want as

select a.*, b.patient, b.date(, other variables from dataset_2 )

from dataset_1 as a left join dataset_2 on a.test=b.test;

quit;

Data in the form you have is going to require lots of work continually going back and forth with having to search across multiple variables when the data is as you show.

It may be worth transposing the data to the normalized form of one record per test as and intermediate step.

Astounding · Posted 03-10-2016 12:57 PM

While hash tables would work, a format would work just as well here. The starting point would be to create a format that identifies what tests are in data set 1:

data locate;

set dataset1;

start=test;

label='Found';

fmtname='$test';

run

proc format cntlin=locate;

run;

This creates a format equivalent to:

proc format;

value $test 'A1111542'='Found'

'A1234567'='Found'

...;

Then use that format when processing dataset2 to see which test is found:

data want;

set dataset2;

array c {3} test_1-test_3;

array d {3} date_1-date_3;

do i=1 to 3;

if put(c{i}, $test.)='Found' then do;

n_want=i;

test_want=c{i};

date_want = d{i};

end;

drop i;

run;

You may need to consider what the output should be if more than one test is wanted for the same person.

The code is untested, but should be fine. Good luck.

beginner · Posted 03-10-2016 04:31 PM

Hi, @Astounding. I'd like to try your solution, but am getting an empty dataset (new columns/variables "label" and "fmtname" created, but no entries in dataset). Is this what I should be seeing?

Astounding · Posted 03-10-2016 04:44 PM

It's probably something simple. Check (1) missing the SET statement, (2) missing the semicolon at the end of the DATA statement. If it's not that, post the log from the DATA step ... should be just a few lines there.

beginner · Posted 03-10-2016 05:09 PM

Hi @Astounding. Yeah, I was missing a semicolon -- that was silly of me. Ok, so now I'm ready to try the 2nd half of your solution. I don't think I understand this part. For the if statement, we're referring to the new "locate" dataset, right? How does SAS know that's where we want to look, if we don't have anything about the "locate" dataset in this code? Thanks for your patient instruction.

Astounding · Posted 03-10-2016 05:14 PM

The LOCATE data set is used to create a format named $test. Then the DATA step uses that format.

beginner · Posted 03-10-2016 05:23 PM

Hi @Astounding. Hm, I'm getting errors, including the following note in my log: "Format $test was not found or could not be loaded".

beginner · Posted 03-10-2016 05:57 PM

I think I might have figured out the problem: I had 2 entries with the same value for "test." I'll work with this a little more & see how it goes.

Astounding · Posted 03-10-2016 06:52 PM

That means you still have to run the PROC FORMAT to create $test.

LinusH · Posted 03-10-2016 01:37 PM

Hash table programming is not as simple as SQL so I barely use it. So if you are a SAS beginner, I would first try to solve this with SQL as I outlined, and exemplified in other replies.

Data never sleeps

beginner · Posted 03-10-2016 02:32 PM

P.S. I just modified my original post to clarify: I want to search in dataset_2 "test_n" columns for all (not just one or a handful of) values of "test" from dataset_1.

Searching across columns for values in another dataset (possibly using arrays?)

Re: Searching across columns for a value in another dataset (possibly using arrays?)

Re: Searching across columns for a value in another dataset (possibly using arrays?)

Re: Searching across columns for a value in another dataset (possibly using arrays?)

Re: Searching across columns for a value in another dataset (possibly using arrays?)

Re: Searching across columns for a value in another dataset (possibly using arrays?)

Re: Searching across columns for a value in another dataset (possibly using arrays?)

Re: Searching across columns for a value in another dataset (possibly using arrays?)

Re: Searching across columns for a value in another dataset (possibly using arrays?)

Re: Searching across columns for a value in another dataset (possibly using arrays?)

Re: Searching across columns for a value in another dataset (possibly using arrays?)

Re: Searching across columns for a value in another dataset (possibly using arrays?)

Re: Searching across columns for a value in another dataset (possibly using arrays?)

Re: Searching across columns for a value in another dataset (possibly using arrays?)

Re: Searching across columns for values in another dataset (possibly using arrays?)

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away