BookmarkSubscribeRSS Feed
lberghammer
Calcite | Level 5

I have 2 datasets, 1 that is a key and what that is survey data. I am trying to pull the value associated with 2 variables and match with the combination of 2 variables across 6 total pairs.

"key" dataset

row0123
1

0

.11.25

.43

20.32.75

1.29

30.641.5

2.57

401.072.5

4.29

 

Datafile (survey data)

P1P1aP2P2aP3P3aP4P4a
11233412
23312324
310.1334

 

the intersection of the row and column values for the "key" dataset match to the unique combination for P1 and P1a ---P4 and P4a in the "datafile", where the column value in "key" indicates the P"x" lookup value and the row value in "key" indicates the P"Xa" value to create a new variable PX1 for the new value for each record within the dataset. 

result would yield

PX1PX2PX3PX4
(*in key table, over 1 (P1) and down 1 (P1a)) =0.111.54.29.32
1.5.321.52.5
.4300.644.29
3 REPLIES 3
FreelanceReinh
Jade | Level 19

Hallo @lberghammer and welcome to the SAS Support Communities (as a first-time poster)!

 

Let me first create test datasets from your sample data:

data key;
input _row _0-_3;
cards;
1 0 .11  .25 .43
2 0 .32  .75 1.29
3 0 .64  1.5 2.57
4 0 1.07 2.5 4.29
;
 
data survey;
input P1 P1a P2 P2a P3 P3a P4 P4a;
cards;
1 1 2 3 3 4 1 2
2 3 3 1 2 3 2 4
3 1 0 . 1 3 3 4
;

The structure of your real datasets may be different, but for the above datasets the following DATA step works:

data want(drop=_: P1--P4a);
array key[4,0:3] _temporary_;
array _[0:3] _0-_3;
if _n_=1 then do _p=1 to dim1(key);
  set key point=_p;
  do _i=0 to dim(_)-1;
    key[_row,_i]=_[_i];
  end;
end;
set survey;
array Q[*] P1--P4a;
array PX[4];
do _j=1 to dim(PX);
  PX[_j]=key[coalesce(Q[2*_j],1), Q[2*_j-1]];
end;
run;

First ("if _n_=1 ..."), a two-dimensional temporary array (named key) is populated with the values from dataset KEY. Then, using the row and column numbers from dataset SURVEY, variables PX1-PX4 are populated with values retrieved from array key. Note that I obtain PX2=0.43 for the second observation of the survey data, not the 0.32 (typo?) shown in your "result" table. To get all P... variables into dataset WANT, just remove P1--P4a from the DROP= dataset option.

lberghammer
Calcite | Level 5

Yes! 

 

thank you so much this is perfect and it worked!

 

I do have a follow-up, tp account for missing responses/values (for the interim) I converted the 1 missing to 0 i.e. if the "key" dataset were to be

 

 

data key;
input _row . _0-_3;
cards;
1 . 0 .11  .25 .43
2 . 0 .32  .75 1.29
3 . 0 .64  1.5 2.57
4 . 0 1.07 2.5 4.29
;

 

how could you possibly account for that?

 

another way of looking at the same key dataset- say "key2" would be; where Vn and Va are the paired look-up values for P1 and P1a; P2 and P2a --P4 and P4a, to pull in the corresponding "value" PX1 --PX4 (same as before).

VnVaValue
0.0
1..
2..
3..
.1.
.2.
.3.
.4.
010
020
030
040
110.11
120.32
130.64
141.07
210.25
220.75
231.5
242.5
310.43
321.29
332.57
344.29

 

And also, thanks for catching that typo!

FreelanceReinh
Jade | Level 19

@lberghammer wrote:

thank you so much this is perfect and it worked!

 


You're welcome. Glad to hear that it worked.

 

I was already thinking that a vertical structure of the key dataset would be more convenient.

Indeed, with

data key2;
input Vn Va Value;
cards;
0 . 0
1 . .
2 . .
3 . .
. 1 .
. 2 .
. 3 .
. 4 .
0 1 0
0 2 0
0 3 0
0 4 0
1 1 0.11
1 2 0.32
1 3 0.64
1 4 1.07
2 1 0.25
2 2 0.75
2 3 1.5
2 4 2.5
3 1 0.43
3 2 1.29
3 3 2.57
3 4 4.29
;

you can use a hash object containing the data from KEY2 to perform the look-up:

data want(drop=_j V: P1--P4a);
if _n_=1 then do;
  dcl hash h(dataset:'key2');
  h.definekey('Vn','Va');
  h.definedata('Value');
  h.definedone();
  if 0 then set key2;
end;
set survey;
array Q[*] P1--P4a;
array PX[4];
do _j=1 to dim(PX);
  if h.find(key:Q[2*_j-1], key:Q[2*_j])=0 then PX[_j]=Value;
end;
run;

Similar to the first solution, the key table (here: KEY2) is first read into a temporary data structure (here: the hash object h, not an array as in the first solution). Unlike an array, a hash object can deal with missing key values like the missings of Vn and Va in dataset KEY2. The values from dataset SURVEY are passed to the FIND method of the hash object, which retrieves the value of the "data item" Value if matching key values were found. Then the return code of the FIND method is zero and the content of Value is written to the appropriate PX... variable. Otherwise (i.e., if no matching key values were found), the initial missing value of PX... is left unchanged.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 363 views
  • 2 likes
  • 2 in conversation