BookmarkSubscribeRSS Feed
Jonison
Fluorite | Level 6

Hello, all, I am having a problem:

Table A:

ID1ID2Val1Val2
1a132145
2b131156
3c134

134

 

Table B:

ID1ID2Val1Val2
1a200203
4a201204
3c202205

 

Output Table

 

ID1ID2Val1Val2
1a200203
2b131156
3c202205

 

the Table A and B are data source, and if ID1 and ID2 are identical, the val1 and val2 in Table B will replace the Val1 and Val2 in Table1, as the examples showed above, the final results are output table.

 

I am thinking of loop and just worry about the computing time, since the table would be a very big table that may take a while to complete row by row. Any suggestion that can make calculation a bit faster?

 

Thanks

10 REPLIES 10
PeterClemmensen
Tourmaline | Level 20

If the two tables are sorted, then here is an approach with the update statement

 

data a;
input ID1 ID2 $ Val1 Val2;
datalines;
1 a 132 145
2 b 131 156
3 c 134 134
;

data b;
input ID1 ID2 $ Val1 Val2;
datalines;
1 a 200 203
4 a 201 204
3 c 202 205
;

proc sort data=a; by ID1 ID2; run;
proc sort data=b; by ID1 ID2; run;

data want;
    update a(in=ina) b;
    by ID1 ID2;
    if ina;
run;
Jonison
Fluorite | Level 6

many thanks for your promote answer.

Just having a couple of questions,

 

1. what 'in=ina' here for?

2. table a is much bigger than table b, b is kind of data reference, to replace to the variable 1 and 2 in table A whose ID1 and ID2 are matching ID1 and 2 in table. is this approach works in this situation?

 

Thanks again.

 

 

PeterClemmensen
Tourmaline | Level 20

@Jonison Did this solve your problem?

PeterClemmensen
Tourmaline | Level 20

If not, you can use the hash object

 

data want(drop=rc);
    if _N_=1 then do;
        declare hash h(dataset:'b');
        h.definekey('id1', 'id2');
        h.definedata('Val1', 'Val2');
        h.definedone();
    end;

    set a;

    rc=h.find();
run;

Result:

 

ID1   ID2   Val1   Val2
1     a     200    203
2     b     131    156
3     c     202    205
Jonison
Fluorite | Level 6

@PeterClemmensen , many thanks for your kind reply.

 

Yes, it works, and seems the hash object is more efficient.

The situation is a bit more complicated.

 

There are three tables:

A is target table, B and C are reference table.

 

Table A:

 

ID1ID2Val1Val2Val3
1b121518
3dd131619
4c141711

 

 

Table b

ID1ID2Val1Val2Val3
1e232227
3d242126
4cc252825

 

Table c

ID1ID2ID3
1eb
3ddd
4cccc

 

 

Results:

 

ID1ID2Val1Val2Val3
1b232227
3dd242126
4c141711

 

 

Because 

 

in table C, in column ID2 and ID3, e and b are identical ID, same for d and dd

 

so first two columns (ID1 =1 and ID1=3) are replaced by the new value of Val1 val2 and val3.

 

Sorry about this, but the datatable I am dealing with is a bit chaotic.

 

How to solve this using hash object? or other alternatives?

 

Many thanks in advance.

 

PeterClemmensen
Tourmaline | Level 20

"the datatable I am dealing with is a bit chaotic". This should be your first order of concern.

 

Secondly, see if this helps you

 

data a;
input ID1 ID2 $ Val1-Val3;
datalines;
1 b  12 15 18
3 dd 13 16 19
4 c  14 17 11
;
 
data b;
input ID1 ID2 $ Val1-Val3;
datalines;
1 e  23 22 27
3 d  24 21 26
4 cc 25 28 25
;
 
data c;
input ID1 ID2 $ ID3 $;
datalines;
1 e b  
3 d dd 
4 c ccc
;

data want(drop=rc _:);
   if _N_=1 then do;
      declare hash hb(dataset:'b');
      hb.definekey('ID1', 'ID2');
      hb.definedata('Val1', 'Val2', 'Val3');
      hb.definedone();

      declare hash hc(dataset:'c(rename=(ID2=_ID2 ID3=_ID3))');
      hc.definekey('ID1');
      hc.definedata('_ID2', '_ID3');
      hc.definedone();   

   end;

   set a;
   length _ID2 $ 20 _ID3 $ 20;
   
   rc=hc.find();
   rc=hb.find(key:ID1, key:_ID2);
   rc=hb.find(key:ID1, key:_ID3);

run;

 

Result:

 

ID1 ID2  Val1  Val2  Val3 
1   b    23    22    27 
3   dd   24    21    26 
4   c    14    17    11 
Jonison
Fluorite | Level 6

Thanks again, and apology for late respond.

 

The database is GMP database and I cannot change anything, so sorting the data at application end is the best solution I can think of.

 

I tried your approach, but the results come back was not right. I looked at the database again, found there are quite missing value in table 3 and I think this may be the reason.

 

I updated the example:

 

Table A:
    
ID1ID2Val1Val2Val3
1b121518
3dd131619
4c141711
5f5678
6h3456
7ggg78933
Table b    
     
ID1ID3Val1Val2Val3
1e232227
3d242126
4cc252825
5h   
     
Table c    
     
ID1ID2ID3  
1eb  
3ddd  
4c   
5h   
6f   
7ggg   

 

ID 3 in table c has lots of missing value (it is a secondary ID of data).

 

 

results

 

 

ID1
ID2Val1Val2Val3
1b232227
3dd242126
4c141711
5f5678
6h3456
7ggg78933

 

 

would you please provide some suggestions on this?

 

Thanks in advance.

PeterClemmensen
Tourmaline | Level 20

Try this. @Jonison are you able to test this now and get back to me? Would love to solve this for you 🙂

 

data a;
input ID1 ID2 $ Val1-Val3;
datalines;
1 b   12 15 18
3 dd  13 16 19
4 c   14 17 11
5 f   56 7  8
6 h   34 5  6
7 ggg 78 93 3
;
 
data b;
input ID1 ID2 $ Val1-Val3;
infile datalines missover;
datalines;
1 e  23 22 27
3 d  24 21 26
4 cc 25 28 25
5 h          

;
 
data c;
input ID1 ID2 $ ID3 $;
infile datalines missover;
datalines;
1 e b 
3 d dd
4 c   
5 h   
6 f   
7 ggg 
;

data want(drop=rc _:);
   if _N_=1 then do;
      declare hash hb(dataset:'b');
      hb.definekey('ID1', 'ID2');
      hb.definedata('Val1', 'Val2', 'Val3');
      hb.definedone();

      declare hash hc(dataset:'c(rename=(ID2=_ID2 ID3=_ID3))');
      hc.definekey('ID1');
      hc.definedata('_ID2', '_ID3');
      hc.definedone();   

   end;

   set a;
   length _ID2 $ 20 _ID3 $ 20;
   
   rc=hc.find();

   if cmiss(_ID2, _ID3) = 0 then rc=hb.find(key:ID1, key:_ID2);
   if cmiss(_ID2, _ID3) = 0 then rc=hb.find(key:ID1, key:_ID3);

run;

Result

 

id1 id2  val1 val2 val3
1   b    23   22   27
3   dd   24   21   26
4   c    14   17   11
5   f    56   7    8
6   h    34   5    6
7   ggg  78   93   3

 

Jonison
Fluorite | Level 6

thanks for your promote reply. bet we are in different time zone.

 

I tried the code, but the data is not sorted out. the results of ID2 doesnot have missing value (suppose to have missing value). It seems that the sorting are based on ID2 and ID3 whose value are not missing.

 

I changed the data a bit, hope this make it a bit clear. 

 

Table A

ID1
ID2V1V2V3
1b121518
3dd131619
4c141711
5f5678
6 h3456
7ggg78933

Table B

ID1ID2v1v2v3
1e232227
3d242126
4c252825
5h222622

Table C

ID1ID2ID3
1eb
3ddd
4c 
5f 
6h 

 

Results

 

ID1
ID2V1V2V3_ID2_ID3
1b232227eb
3dd242126ddd
4c252825c 
5f5678f 
6 h222622h 
7ggg78933  

 

In table A (ID1 =1 and 3, v1-3 values have been replaced by table B because the mapping in Table C)

 

But also

4c141711

is replaced as

4c252825

 

Same for ID1=6 in table A

 

Thanks and hope this makes it a bit clear.

 

 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 1867 views
  • 0 likes
  • 2 in conversation