Hi,
I have two dataset A and B, A and B have same column names but different data.
I use the following sql code to 'coalesce" data from A and B:
proc sql;
create table c as
select coalesce(a.column1, b.column1) as column 1,
...
...
coalesce(a.column100, b.column100) as column 100
from A as a full join B as b
on a.key = b.key and a.period = b.period;
quit;
My question is: is there any data step equivalent to the proc sql procedure listed above?
I tried
data c;
update a b;
by key, period;
run;
But the problem is that, whenever both table a and table b contains values for say, column1, then the column1 value from the transactional dataset b will be used. My intention is to keep value from the master table a whenever it's not missing and only use value from table b if such value is missing in a(like the way coalesce)
Thanks in advance!
In a data step, you can use the COALESCE function on numeric variables, or the COALESCEC function on character variables. You just have to merge the tables in a DATA step, and rename the variables in one of the tables.
In addition you can use an ARRAY to prevent the need from writing COALESCE 100 times, you code it once and loop through it 100 times.
In a data step, you can use the COALESCE function on numeric variables, or the COALESCEC function on character variables. You just have to merge the tables in a DATA step, and rename the variables in one of the tables.
In addition you can use an ARRAY to prevent the need from writing COALESCE 100 times, you code it once and loop through it 100 times.
thank you so much my friend. I was thinking the same thing and your comment makes me more confident about the solution I was thinking about. By the way, I recognize your ID, you previously helped me (with my another ID). Thank you so much for continuing contributing to the community!
"update b a" works if you want to keep A if populated and only take B if A is not populated.
"update a b" works if you want to keep B if populated and only take A if B is not populated.
Julie,
Actually, that's different than what you posted originally. The originally post looks for the value from A, but if that is missing use the value from B. You would get this (but by a slightly different route) using:
update b a;
You you start with the value from B. But if the value from A exists, use that to replace the value from B. So you would get the value from A (when it exists), but the value from B (when A has just a missing value).
My first suggestion would be to try this assuming the two data sets are sorted correctly:
data c; merge b a ; by key period; run;
The Merge, assuming only ONE of the data sets has multiples of the By variables, will replace variables left to right on a appearance on the Merge statement. So like named variables in A will replace those of B when the By variables match. This will include missing values in A replacing values in B if such occur. If you do not want the missing values to replace them and as @Astounding said neither has duplicates of the By variables you can prevent update of missing values from A by using UPDATE.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.