BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Julie99999
Obsidian | Level 7

Hi,

I have two dataset A and B, A and B have same column names but different data.

 

I use the following sql code to 'coalesce" data from A and B:

 

proc sql;

create table c as

select coalesce(a.column1, b.column1) as column 1,

...

...

coalesce(a.column100, b.column100) as column 100

from A as a full join B as b

on a.key = b.key and a.period = b.period;

quit;

 

My question is: is there any data step equivalent to the proc sql procedure listed above?

 

I tried

data c;

update a b;

by key, period;

run;

 

But the problem is that, whenever both table a and table b contains values for say, column1, then the column1 value from the transactional dataset b will be used. My intention is to keep value from the master table a whenever it's not missing and only use value from table b if such value is missing in a(like the way coalesce)

 

Thanks in advance!

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26

In a data step, you can use the COALESCE function on numeric variables, or the COALESCEC function on character variables. You just have to merge the tables in a DATA step, and rename the variables in one of the tables.

 

In addition you can use an ARRAY to prevent the need from writing COALESCE 100 times, you code it once and loop through it 100 times.

--
Paige Miller

View solution in original post

9 REPLIES 9
PaigeMiller
Diamond | Level 26

In a data step, you can use the COALESCE function on numeric variables, or the COALESCEC function on character variables. You just have to merge the tables in a DATA step, and rename the variables in one of the tables.

 

In addition you can use an ARRAY to prevent the need from writing COALESCE 100 times, you code it once and loop through it 100 times.

--
Paige Miller
Julie99999
Obsidian | Level 7

thank you so much my friend. I was thinking the same thing and your comment makes me more confident about the solution I was thinking about. By the way, I recognize your ID, you previously helped me (with my another ID). Thank you so much for continuing contributing to the community!

Astounding
PROC Star
The UPDATE approach will work, so long as the data sets are limited to one observation per PERIOD KEY combination. Just change the order in the UPDATE statement:

update b a;
Julie99999
Obsidian | Level 7
thank you for your reply. THis is my quesiton actually. Because for UPDATE, if we do update b a; then whenever there is value in a, the previous value in b will be overrided. that's not intended. What if I want to always keep values from b, but only get values from a if such values in b is missing? It seems for "update b a", the result is slightly different.
SASKiwi
PROC Star

@Julie99999 :

"update b a" works if you want to keep A if populated and only take B if A is not populated.

"update a b" works if you want to keep B if populated and only take A if B is not populated.

Julie99999
Obsidian | Level 7
thanks for the succinct take-away!
Astounding
PROC Star

Julie,

 

Actually, that's different than what you posted originally.  The originally post looks for the value from A, but if that is missing use the value from B.  You would get this (but by a slightly different route) using:

 

update b a;

 

You you start with the value from B.  But if the value from A exists, use that to replace the value from B.  So you would get the value from A (when it exists), but the value from B (when A has just a missing value).

Julie99999
Obsidian | Level 7
Astounding, I see your point. That's smart! That would work and solves exactly my question, how come I didn't realize by simply switching to "b a" would solve the problem.
update b a;
thanks for the "A ha" moment you've brought to me. 🙂
ballardw
Super User

My first suggestion would be to try this assuming the two data sets are sorted correctly:

 

data c;
   merge b a ;
   by key period;
run;

 

The Merge, assuming only ONE of the data sets has multiples of the By variables, will replace variables left to right on a appearance on the Merge statement. So like named variables in A will replace those of B when the By variables match. This will include missing values in A replacing values in B if such occur. If you do not want the missing values to replace them and as @Astounding said neither has duplicates of the By variables you can prevent update of missing values from A by using UPDATE.

 

 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 1953 views
  • 7 likes
  • 5 in conversation