Hello, all, I am having a problem:
Table A:
ID1 | ID2 | Val1 | Val2 |
1 | a | 132 | 145 |
2 | b | 131 | 156 |
3 | c | 134 | 134 |
Table B:
ID1 | ID2 | Val1 | Val2 |
1 | a | 200 | 203 |
4 | a | 201 | 204 |
3 | c | 202 | 205 |
Output Table
ID1 | ID2 | Val1 | Val2 |
1 | a | 200 | 203 |
2 | b | 131 | 156 |
3 | c | 202 | 205 |
the Table A and B are data source, and if ID1 and ID2 are identical, the val1 and val2 in Table B will replace the Val1 and Val2 in Table1, as the examples showed above, the final results are output table.
I am thinking of loop and just worry about the computing time, since the table would be a very big table that may take a while to complete row by row. Any suggestion that can make calculation a bit faster?
Thanks
If the two tables are sorted, then here is an approach with the update statement
data a;
input ID1 ID2 $ Val1 Val2;
datalines;
1 a 132 145
2 b 131 156
3 c 134 134
;
data b;
input ID1 ID2 $ Val1 Val2;
datalines;
1 a 200 203
4 a 201 204
3 c 202 205
;
proc sort data=a; by ID1 ID2; run;
proc sort data=b; by ID1 ID2; run;
data want;
update a(in=ina) b;
by ID1 ID2;
if ina;
run;
many thanks for your promote answer.
Just having a couple of questions,
1. what 'in=ina' here for?
2. table a is much bigger than table b, b is kind of data reference, to replace to the variable 1 and 2 in table A whose ID1 and ID2 are matching ID1 and 2 in table. is this approach works in this situation?
Thanks again.
Yes, no problem.
@Jonison Did this solve your problem?
If not, you can use the hash object
data want(drop=rc);
if _N_=1 then do;
declare hash h(dataset:'b');
h.definekey('id1', 'id2');
h.definedata('Val1', 'Val2');
h.definedone();
end;
set a;
rc=h.find();
run;
Result:
ID1 ID2 Val1 Val2 1 a 200 203 2 b 131 156 3 c 202 205
@PeterClemmensen , many thanks for your kind reply.
Yes, it works, and seems the hash object is more efficient.
The situation is a bit more complicated.
There are three tables:
A is target table, B and C are reference table.
Table A:
ID1 | ID2 | Val1 | Val2 | Val3 |
1 | b | 12 | 15 | 18 |
3 | dd | 13 | 16 | 19 |
4 | c | 14 | 17 | 11 |
Table b
ID1 | ID2 | Val1 | Val2 | Val3 |
1 | e | 23 | 22 | 27 |
3 | d | 24 | 21 | 26 |
4 | cc | 25 | 28 | 25 |
Table c
ID1 | ID2 | ID3 |
1 | e | b |
3 | d | dd |
4 | c | ccc |
Results:
ID1 | ID2 | Val1 | Val2 | Val3 |
1 | b | 23 | 22 | 27 |
3 | dd | 24 | 21 | 26 |
4 | c | 14 | 17 | 11 |
Because
in table C, in column ID2 and ID3, e and b are identical ID, same for d and dd
so first two columns (ID1 =1 and ID1=3) are replaced by the new value of Val1 val2 and val3.
Sorry about this, but the datatable I am dealing with is a bit chaotic.
How to solve this using hash object? or other alternatives?
Many thanks in advance.
"the datatable I am dealing with is a bit chaotic". This should be your first order of concern.
Secondly, see if this helps you
data a;
input ID1 ID2 $ Val1-Val3;
datalines;
1 b 12 15 18
3 dd 13 16 19
4 c 14 17 11
;
data b;
input ID1 ID2 $ Val1-Val3;
datalines;
1 e 23 22 27
3 d 24 21 26
4 cc 25 28 25
;
data c;
input ID1 ID2 $ ID3 $;
datalines;
1 e b
3 d dd
4 c ccc
;
data want(drop=rc _:);
if _N_=1 then do;
declare hash hb(dataset:'b');
hb.definekey('ID1', 'ID2');
hb.definedata('Val1', 'Val2', 'Val3');
hb.definedone();
declare hash hc(dataset:'c(rename=(ID2=_ID2 ID3=_ID3))');
hc.definekey('ID1');
hc.definedata('_ID2', '_ID3');
hc.definedone();
end;
set a;
length _ID2 $ 20 _ID3 $ 20;
rc=hc.find();
rc=hb.find(key:ID1, key:_ID2);
rc=hb.find(key:ID1, key:_ID3);
run;
Result:
ID1 ID2 Val1 Val2 Val3 1 b 23 22 27 3 dd 24 21 26 4 c 14 17 11
Thanks again, and apology for late respond.
The database is GMP database and I cannot change anything, so sorting the data at application end is the best solution I can think of.
I tried your approach, but the results come back was not right. I looked at the database again, found there are quite missing value in table 3 and I think this may be the reason.
I updated the example:
Table A: | ||||
ID1 | ID2 | Val1 | Val2 | Val3 |
1 | b | 12 | 15 | 18 |
3 | dd | 13 | 16 | 19 |
4 | c | 14 | 17 | 11 |
5 | f | 56 | 7 | 8 |
6 | h | 34 | 5 | 6 |
7 | ggg | 78 | 93 | 3 |
Table b | ||||
ID1 | ID3 | Val1 | Val2 | Val3 |
1 | e | 23 | 22 | 27 |
3 | d | 24 | 21 | 26 |
4 | cc | 25 | 28 | 25 |
5 | h | |||
Table c | ||||
ID1 | ID2 | ID3 | ||
1 | e | b | ||
3 | d | dd | ||
4 | c | |||
5 | h | |||
6 | f | |||
7 | ggg |
ID 3 in table c has lots of missing value (it is a secondary ID of data).
results
ID1 | ID2 | Val1 | Val2 | Val3 |
1 | b | 23 | 22 | 27 |
3 | dd | 24 | 21 | 26 |
4 | c | 14 | 17 | 11 |
5 | f | 56 | 7 | 8 |
6 | h | 34 | 5 | 6 |
7 | ggg | 78 | 93 | 3 |
would you please provide some suggestions on this?
Thanks in advance.
Try this. @Jonison are you able to test this now and get back to me? Would love to solve this for you 🙂
data a;
input ID1 ID2 $ Val1-Val3;
datalines;
1 b 12 15 18
3 dd 13 16 19
4 c 14 17 11
5 f 56 7 8
6 h 34 5 6
7 ggg 78 93 3
;
data b;
input ID1 ID2 $ Val1-Val3;
infile datalines missover;
datalines;
1 e 23 22 27
3 d 24 21 26
4 cc 25 28 25
5 h
;
data c;
input ID1 ID2 $ ID3 $;
infile datalines missover;
datalines;
1 e b
3 d dd
4 c
5 h
6 f
7 ggg
;
data want(drop=rc _:);
if _N_=1 then do;
declare hash hb(dataset:'b');
hb.definekey('ID1', 'ID2');
hb.definedata('Val1', 'Val2', 'Val3');
hb.definedone();
declare hash hc(dataset:'c(rename=(ID2=_ID2 ID3=_ID3))');
hc.definekey('ID1');
hc.definedata('_ID2', '_ID3');
hc.definedone();
end;
set a;
length _ID2 $ 20 _ID3 $ 20;
rc=hc.find();
if cmiss(_ID2, _ID3) = 0 then rc=hb.find(key:ID1, key:_ID2);
if cmiss(_ID2, _ID3) = 0 then rc=hb.find(key:ID1, key:_ID3);
run;
Result
id1 id2 val1 val2 val3 1 b 23 22 27 3 dd 24 21 26 4 c 14 17 11 5 f 56 7 8 6 h 34 5 6 7 ggg 78 93 3
thanks for your promote reply. bet we are in different time zone.
I tried the code, but the data is not sorted out. the results of ID2 doesnot have missing value (suppose to have missing value). It seems that the sorting are based on ID2 and ID3 whose value are not missing.
I changed the data a bit, hope this make it a bit clear.
Table A
ID1 | ID2 | V1 | V2 | V3 |
1 | b | 12 | 15 | 18 |
3 | dd | 13 | 16 | 19 |
4 | c | 14 | 17 | 11 |
5 | f | 56 | 7 | 8 |
6 | h | 34 | 5 | 6 |
7 | ggg | 78 | 93 | 3 |
Table B
ID1 | ID2 | v1 | v2 | v3 |
1 | e | 23 | 22 | 27 |
3 | d | 24 | 21 | 26 |
4 | c | 25 | 28 | 25 |
5 | h | 22 | 26 | 22 |
Table C
ID1 | ID2 | ID3 |
1 | e | b |
3 | d | dd |
4 | c | |
5 | f | |
6 | h |
Results
ID1 | ID2 | V1 | V2 | V3 | _ID2 | _ID3 |
1 | b | 23 | 22 | 27 | e | b |
3 | dd | 24 | 21 | 26 | d | dd |
4 | c | 25 | 28 | 25 | c | |
5 | f | 56 | 7 | 8 | f | |
6 | h | 22 | 26 | 22 | h | |
7 | ggg | 78 | 93 | 3 |
In table A (ID1 =1 and 3, v1-3 values have been replaced by table B because the mapping in Table C)
But also
4 | c | 14 | 17 | 11 |
is replaced as
4 | c | 25 | 28 | 25 |
Same for ID1=6 in table A
Thanks and hope this makes it a bit clear.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.