Re: proc sql update

mariopellegrini · Posted 09-27-2017 08:57 AM

I can not understand why the next proc sql update values missing "var1" at c1 = 2 and c2 = 2 (update key). The resulting value is missing. I expected it to remain unchanged at 20

data A;
input c1 c2 var1 var2;
datalines;
1 1 12 32
1 2 32 21
2 1 12 98
2 2 20 30
2 3 11 11
3 1 21 22
4 1 31 32
2 4 22 22
6 1 77 77

;

data tmp;
input c1 c2 var1 var2;
datalines;
1 1 999 999
1 2 888 999
2 1 999 999
;
 
proc sql;
update  a 
   set var1=(select var1 from tmp as b
		         	where a.c1=b.c1 and a.c2=b.c2)

	where a.c1 in (select c1 from  tmp) and
          a.c2 in (select c2 from  tmp) 
		  ;

quit;

Kurt_Bremser · Posted 09-27-2017 09:16 AM

update in SQL does not behave like update in a data step. Since your subquery produces a missing value, that missing value is set. Use the coalesce() function to keep the original value if the subquery yields no result.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

mariopellegrini · Posted 09-27-2017 09:36 AM

I do not understand, which missing value produces the subquery? at c1 = 1, c2 = 1 --- c1 = 1, c2 = 2 --- c1 = 2, c2 = 1 there are 3 lines corresponding to the values 12,32 - 32,21 - 12 98

Kurt_Bremser · Posted 09-27-2017 09:47 AM

There is no observation in dataset tmp with c1=2 AND c2=2, so the subquery can't find a value for that combination, which is present in observation 4 of dataset A, where the value 20 is overwritten with the missing value from the subquery.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

mariopellegrini · Posted 09-27-2017 10:01 AM

ok, but all other combinations do not have values in the sub query (example for c1 = 3, c2 = 1). In these cases, however, the values of dataset A remain unchanged. Only for c1 = 2, c2 = 2 is updated to missing the value of var1

Kurt_Bremser · Posted 09-27-2017 10:08 AM

Because your where condition at the end of the query:

where a.c1 in (select c1 from  tmp) and
          a.c2 in (select c2 from  tmp)

prevents the update from even touching those. But you do have at least one c1=2 and one c2=2 in tmp, so the combination of c1=2 and c2=2 is updated.

If you want to prevent this from happening, you best create a compound key (ie c3 = c1*100 + c2) in both datasets and join on that.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

mariopellegrini · Posted 09-28-2017 11:14 AM

ok I understand, thank you. but this would slow down the processing if it were large databases. Is not there a different method?

Kurt_Bremser · Posted 09-28-2017 12:04 PM

When I know that I will need a specific derived value later, I set it when such a dataset is created.

Intelligent data makes for intelligent programs.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Classroom Training Available!