join, match, and pull in new value from 2 tables

lberghammer · Posted 04-23-2022 08:19 AM

I have 2 datasets, 1 that is a key and what that is survey data. I am trying to pull the value associated with 2 variables and match with the combination of 2 variables across 6 total pairs.

"key" dataset

row	1	2	3
1	.11	.25	.43
2	.32	.75	1.29
3	.64	1.5	2.57
4	1.07	2.5	4.29

Datafile (survey data)

P1	P1a	P2	P2a	P3	P3a	P4	P4a
1	1	2	3	3	4	1	2
2	3	3	1	2	3	2	4
3	1	0	.	1	3	3	4

the intersection of the row and column values for the "key" dataset match to the unique combination for P1 and P1a ---P4 and P4a in the "datafile", where the column value in "key" indicates the P"x" lookup value and the row value in "key" indicates the P"Xa" value to create a new variable PX1 for the new value for each record within the dataset.

result would yield

PX1	PX2	PX3	PX4
(*in key table, over 1 (P1) and down 1 (P1a)) =0.11	1.5	4.29	.32
1.5	.32	1.5	2.5
.43	0	0.64	4.29

FreelanceReinh · Posted 04-23-2022 09:56 AM

Hallo @lberghammer and welcome to the SAS Support Communities (as a first-time poster)!

Let me first create test datasets from your sample data:

data key;
input _row _0-_3;
cards;
1 0 .11  .25 .43
2 0 .32  .75 1.29
3 0 .64  1.5 2.57
4 0 1.07 2.5 4.29
;
 
data survey;
input P1 P1a P2 P2a P3 P3a P4 P4a;
cards;
1 1 2 3 3 4 1 2
2 3 3 1 2 3 2 4
3 1 0 . 1 3 3 4
;

The structure of your real datasets may be different, but for the above datasets the following DATA step works:

data want(drop=_: P1--P4a);
array key[4,0:3] _temporary_;
array _[0:3] _0-_3;
if _n_=1 then do _p=1 to dim1(key);
  set key point=_p;
  do _i=0 to dim(_)-1;
    key[_row,_i]=_[_i];
  end;
end;
set survey;
array Q[*] P1--P4a;
array PX[4];
do _j=1 to dim(PX);
  PX[_j]=key[coalesce(Q[2*_j],1), Q[2*_j-1]];
end;
run;

First ("if _n_=1 ..."), a two-dimensional temporary array (named key) is populated with the values from dataset KEY. Then, using the row and column numbers from dataset SURVEY, variables PX1-PX4 are populated with values retrieved from array key. Note that I obtain PX2=0.43 for the second observation of the survey data, not the 0.32 (typo?) shown in your "result" table. To get all P... variables into dataset WANT, just remove P1--P4a from the DROP= dataset option.

lberghammer · Posted 04-23-2022 12:06 PM

Yes!

thank you so much this is perfect and it worked!

I do have a follow-up, tp account for missing responses/values (for the interim) I converted the 1 missing to 0 i.e. if the "key" dataset were to be

data key;
input _row . _0-_3;
cards;
1 . 0 .11  .25 .43
2 . 0 .32  .75 1.29
3 . 0 .64  1.5 2.57
4 . 0 1.07 2.5 4.29
;

how could you possibly account for that?

another way of looking at the same key dataset- say "key2" would be; where Vn and Va are the paired look-up values for P1 and P1a; P2 and P2a --P4 and P4a, to pull in the corresponding "value" PX1 --PX4 (same as before).

Vn	Va	Value
0	.	0
1	.	.
2	.	.
3	.	.
.	1	.
.	2	.
.	3	.
.	4	.
0	1	0
0	2	0
0	3	0
0	4	0
1	1	0.11
1	2	0.32
1	3	0.64
1	4	1.07
2	1	0.25
2	2	0.75
2	3	1.5
2	4	2.5
3	1	0.43
3	2	1.29
3	3	2.57
3	4	4.29

And also, thanks for catching that typo!

FreelanceReinh · Posted 04-23-2022 01:01 PM

@lberghammer wrote:

thank you so much this is perfect and it worked!

You're welcome. Glad to hear that it worked.

I was already thinking that a vertical structure of the key dataset would be more convenient.

Indeed, with

data key2;
input Vn Va Value;
cards;
0 . 0
1 . .
2 . .
3 . .
. 1 .
. 2 .
. 3 .
. 4 .
0 1 0
0 2 0
0 3 0
0 4 0
1 1 0.11
1 2 0.32
1 3 0.64
1 4 1.07
2 1 0.25
2 2 0.75
2 3 1.5
2 4 2.5
3 1 0.43
3 2 1.29
3 3 2.57
3 4 4.29
;

you can use a hash object containing the data from KEY2 to perform the look-up:

data want(drop=_j V: P1--P4a);
if _n_=1 then do;
  dcl hash h(dataset:'key2');
  h.definekey('Vn','Va');
  h.definedata('Value');
  h.definedone();
  if 0 then set key2;
end;
set survey;
array Q[*] P1--P4a;
array PX[4];
do _j=1 to dim(PX);
  if h.find(key:Q[2*_j-1], key:Q[2*_j])=0 then PX[_j]=Value;
end;
run;

Similar to the first solution, the key table (here: KEY2) is first read into a temporary data structure (here: the hash object h, not an array as in the first solution). Unlike an array, a hash object can deal with missing key values like the missings of Vn and Va in dataset KEY2. The values from dataset SURVEY are passed to the FIND method of the hash object, which retrieves the value of the "data item" Value if matching key values were found. Then the return code of the FIND method is zero and the content of Value is written to the appropriate PX... variable. Otherwise (i.e., if no matching key values were found), the initial missing value of PX... is left unchanged.

join, match, and pull in new value from 2 tables

Re: join, match, and pull in new value from 2 tables

Re: join, match, and pull in new value from 2 tables

Re: join, match, and pull in new value from 2 tables

join, match, and pull in new value from 2 tables

Re: join, match, and pull in new value from 2 tables

Re: join, match, and pull in new value from 2 tables

Re: join, match, and pull in new value from 2 tables

Ready to join fellow brilliant minds for the SAS Hackathon?

Click image to register for webinar

Classroom Training Available!