replace value based on other variables value

Jonison · Posted 11-06-2019 07:03 AM

Hello, all, I am having a problem:

Table A:

ID1	ID2	Val1	Val2
1	a	132	145
2	b	131	156
3	c	134	134

Table B:

ID1	ID2	Val1	Val2
1	a	200	203
4	a	201	204
3	c	202	205

Output Table

ID1	ID2	Val1	Val2
1	a	200	203
2	b	131	156
3	c	202	205

the Table A and B are data source, and if ID1 and ID2 are identical, the val1 and val2 in Table B will replace the Val1 and Val2 in Table1, as the examples showed above, the final results are output table.

I am thinking of loop and just worry about the computing time, since the table would be a very big table that may take a while to complete row by row. Any suggestion that can make calculation a bit faster?

Thanks

PeterClemmensen · Posted 11-06-2019 07:11 AM

If the two tables are sorted, then here is an approach with the update statement

data a;
input ID1 ID2 $ Val1 Val2;
datalines;
1 a 132 145
2 b 131 156
3 c 134 134
;

data b;
input ID1 ID2 $ Val1 Val2;
datalines;
1 a 200 203
4 a 201 204
3 c 202 205
;

proc sort data=a; by ID1 ID2; run;
proc sort data=b; by ID1 ID2; run;

data want;
    update a(in=ina) b;
    by ID1 ID2;
    if ina;
run;

The DATA to DATA Step Macro
Blog: SASnrd

Jonison · Posted 11-06-2019 09:08 AM

many thanks for your promote answer.

Just having a couple of questions,

1. what 'in=ina' here for?

2. table a is much bigger than table b, b is kind of data reference, to replace to the variable 1 and 2 in table A whose ID1 and ID2 are matching ID1 and 2 in table. is this approach works in this situation?

Thanks again.

PeterClemmensen · Posted 11-06-2019 09:20 AM

Yes, no problem.

The DATA to DATA Step Macro
Blog: SASnrd

PeterClemmensen · Posted 11-07-2019 06:43 AM

@Jonison Did this solve your problem?

The DATA to DATA Step Macro
Blog: SASnrd

PeterClemmensen · Posted 11-06-2019 07:14 AM

If not, you can use the hash object

data want(drop=rc);
    if _N_=1 then do;
        declare hash h(dataset:'b');
        h.definekey('id1', 'id2');
        h.definedata('Val1', 'Val2');
        h.definedone();
    end;

    set a;

    rc=h.find();
run;

Result:

ID1   ID2   Val1   Val2
1     a     200    203
2     b     131    156
3     c     202    205

The DATA to DATA Step Macro
Blog: SASnrd

Jonison · Posted 11-08-2019 06:30 AM

@PeterClemmensen , many thanks for your kind reply.

Yes, it works, and seems the hash object is more efficient.

The situation is a bit more complicated.

There are three tables:

A is target table, B and C are reference table.

Table A:

ID1	ID2	Val1	Val2	Val3
1	b	12	15	18
3	dd	13	16	19
4	c	14	17	11

Table b

ID1	ID2	Val1	Val2	Val3
1	e	23	22	27
3	d	24	21	26
4	cc	25	28	25

Table c

ID1	ID2	ID3
1	e	b
3	d	dd
4	c	ccc

Results:

ID1	ID2	Val1	Val2	Val3
1	b	23	22	27
3	dd	24	21	26
4	c	14	17	11

Because

in table C, in column ID2 and ID3, e and b are identical ID, same for d and dd

so first two columns (ID1 =1 and ID1=3) are replaced by the new value of Val1 val2 and val3.

Sorry about this, but the datatable I am dealing with is a bit chaotic.

How to solve this using hash object? or other alternatives?

Many thanks in advance.

PeterClemmensen · Posted 11-10-2019 09:10 AM

"the datatable I am dealing with is a bit chaotic". This should be your first order of concern.

Secondly, see if this helps you

data a;
input ID1 ID2 $ Val1-Val3;
datalines;
1 b  12 15 18
3 dd 13 16 19
4 c  14 17 11
;
 
data b;
input ID1 ID2 $ Val1-Val3;
datalines;
1 e  23 22 27
3 d  24 21 26
4 cc 25 28 25
;
 
data c;
input ID1 ID2 $ ID3 $;
datalines;
1 e b  
3 d dd 
4 c ccc
;

data want(drop=rc _:);
   if _N_=1 then do;
      declare hash hb(dataset:'b');
      hb.definekey('ID1', 'ID2');
      hb.definedata('Val1', 'Val2', 'Val3');
      hb.definedone();

      declare hash hc(dataset:'c(rename=(ID2=_ID2 ID3=_ID3))');
      hc.definekey('ID1');
      hc.definedata('_ID2', '_ID3');
      hc.definedone();   

   end;

   set a;
   length _ID2 $ 20 _ID3 $ 20;
   
   rc=hc.find();
   rc=hb.find(key:ID1, key:_ID2);
   rc=hb.find(key:ID1, key:_ID3);

run;

Result:

ID1 ID2  Val1  Val2  Val3 
1   b    23    22    27 
3   dd   24    21    26 
4   c    14    17    11

The DATA to DATA Step Macro
Blog: SASnrd

Jonison · Posted 11-18-2019 12:18 PM

Thanks again, and apology for late respond.

The database is GMP database and I cannot change anything, so sorting the data at application end is the best solution I can think of.

I tried your approach, but the results come back was not right. I looked at the database again, found there are quite missing value in table 3 and I think this may be the reason.

I updated the example:

Table A:
ID1	ID2	Val1	Val2	Val3
1	b	12	15	18
3	dd	13	16	19
4	c	14	17	11
5	f	56	7	8
6	h	34	5	6
7	ggg	78	93	3
Table b

ID1	ID3	Val1	Val2	Val3
1	e	23	22	27
3	d	24	21	26
4	cc	25	28	25
5	h

Table c

ID1	ID2	ID3
1	e	b
3	d	dd
4	c
5	h
6	f
7	ggg

ID 3 in table c has lots of missing value (it is a secondary ID of data).

results

ID1	ID2	Val1	Val2	Val3
1	b	23	22	27
3	dd	24	21	26
4	c	14	17	11
5	f	56	7	8
6	h	34	5	6
7	ggg	78	93	3

would you please provide some suggestions on this?

Thanks in advance.

PeterClemmensen · Posted 11-18-2019 12:49 PM

Try this. @Jonison are you able to test this now and get back to me? Would love to solve this for you 🙂

data a;
input ID1 ID2 $ Val1-Val3;
datalines;
1 b   12 15 18
3 dd  13 16 19
4 c   14 17 11
5 f   56 7  8
6 h   34 5  6
7 ggg 78 93 3
;
 
data b;
input ID1 ID2 $ Val1-Val3;
infile datalines missover;
datalines;
1 e  23 22 27
3 d  24 21 26
4 cc 25 28 25
5 h          

;
 
data c;
input ID1 ID2 $ ID3 $;
infile datalines missover;
datalines;
1 e b 
3 d dd
4 c   
5 h   
6 f   
7 ggg 
;

data want(drop=rc _:);
   if _N_=1 then do;
      declare hash hb(dataset:'b');
      hb.definekey('ID1', 'ID2');
      hb.definedata('Val1', 'Val2', 'Val3');
      hb.definedone();

      declare hash hc(dataset:'c(rename=(ID2=_ID2 ID3=_ID3))');
      hc.definekey('ID1');
      hc.definedata('_ID2', '_ID3');
      hc.definedone();   

   end;

   set a;
   length _ID2 $ 20 _ID3 $ 20;
   
   rc=hc.find();

   if cmiss(_ID2, _ID3) = 0 then rc=hb.find(key:ID1, key:_ID2);
   if cmiss(_ID2, _ID3) = 0 then rc=hb.find(key:ID1, key:_ID3);

run;

Result

id1 id2  val1 val2 val3
1   b    23   22   27
3   dd   24   21   26
4   c    14   17   11
5   f    56   7    8
6   h    34   5    6
7   ggg  78   93   3

The DATA to DATA Step Macro
Blog: SASnrd

Jonison · Posted 11-18-2019 06:37 PM

thanks for your promote reply. bet we are in different time zone.

I tried the code, but the data is not sorted out. the results of ID2 doesnot have missing value (suppose to have missing value). It seems that the sorting are based on ID2 and ID3 whose value are not missing.

I changed the data a bit, hope this make it a bit clear.

Table A

ID1	ID2	V1	V2	V3
1	b	12	15	18
3	dd	13	16	19
4	c	14	17	11
5	f	56	7	8
6	h	34	5	6
7	ggg	78	93	3

Table B

ID1	ID2	v1	v2	v3
1	e	23	22	27
3	d	24	21	26
4	c	25	28	25
5	h	22	26	22

Table C

ID1	ID2	ID3
1	e	b
3	d	dd
4	c
5	f
6	h

Results

ID1	ID2	V1	V2	V3	_ID2	_ID3
1	b	23	22	27	e	b
3	dd	24	21	26	d	dd
4	c	25	28	25	c
5	f	56	7	8	f
6	h	22	26	22	h
7	ggg	78	93	3

In table A (ID1 =1 and 3, v1-3 values have been replaced by table B because the mapping in Table C)

But also

4

c

14

17

11

is replaced as

4

c

25

28

25

Same for ID1=6 in table A

Thanks and hope this makes it a bit clear.

replace value based on other variables value

Re: replace value based on other variables value

Re: replace value based on other variables value

Re: replace value based on other variables value

Re: replace value based on other variables value

Re: replace value based on other variables value

Re: replace value based on other variables value

Re: replace value based on other variables value

Re: replace value based on other variables value

Re: replace value based on other variables value

Re: replace value based on other variables value

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away