Hi,
I have the following problem: I have two datasets test1 and test2, and want to concatenate them to create a third dataset test3.
test1 has only column A present, while test2 has columns A and B. In test3, I want both columns to be there and, if B has a missing value, then B must be a copy of A.
Example:
data test1;
length A $16;
input A $;
datalines;
ABBEY
ABBOTSFORD
KALAMAZOO
OUAGADOUGOU
;
run;
data test2;
length A $16 B $16;
input A $ B $;
datalines;
OKEFENOKEE OKEFENOKEE
ALBUQUERQUE ALBUQUERQUE
;
run;
data test3;
set test1 test2;
if missing(B) then B=A;
run;
The table test3 should (IMHO) look like this:
ABBEY | ABBEY |
ABBOTSFORD | ABBOTSFORD |
KALAMAZOO | KALAMAZOO |
OUAGADOUGOU | OUAGADOUGOU |
OKEFENOKEE | OKEFENOKEE |
ALBUQUERQUE | ALBUQUERQUE |
Instead, it looks like this:
ABBEY | ABBEY |
ABBOTSFORD | ABBEY |
KALAMAZOO | ABBEY |
OUAGADOUGOU | ABBEY |
OKEFENOKEE | OKEFENOKEE |
ALBUQUERQUE | ALBUQUERQUE |
I am using Base SAS 9.4.01 M7, I have verified that this behaviour exists since M2 at least.
Did I get anything wrong?
What happens is that the B variable, which is read (later) from another input dataset, is not reset to missing when rereading the A dataset.
The easy way out in this case is to check what table you are reading from:
data test3;
set test1(in=in1) test2;
if in1 then B=A;
run;
Alternatively, you can explicitly set the B variable missing before the SET statement:
data test3;
length A B $16;
B='';
set test1 test2;
if missing(B) then B=A;
run;
What happens is that the B variable, which is read (later) from another input dataset, is not reset to missing when rereading the A dataset.
The easy way out in this case is to check what table you are reading from:
data test3;
set test1(in=in1) test2;
if in1 then B=A;
run;
Alternatively, you can explicitly set the B variable missing before the SET statement:
data test3;
length A B $16;
B='';
set test1 test2;
if missing(B) then B=A;
run;
Ok, thanks for the explanation. So in practice it's like I had wrote RETAIN for column B.
Is this the expected behaviour or a bug?
Because, however I accept your workaround, I don't think I want to use it (or remember I must use it) every time I have to do something as simple as a variable assignment!
Using an IN= variable is always the preferred way to identify from which dataset data is read.
And the behavior is not a bug, it is well documented and is like this since the beginning of SAS. Variables coming from any incoming dataset are automatically retained. Otherwise a one-to-many MERGE would not work.
@gabonzo wrote:
Ok, thanks for the explanation. So in practice it's like I had wrote RETAIN for column B.
Is this the expected behaviour or a bug?
Because, however I accept your workaround, I don't think I want to use it (or remember I must use it) every time I have to do something as simple as a variable assignment!
All variables that are coming from input datasets are "retained". Actually they are just not set to missing at the start of a new iteration. Without this 1 to Many MERGE would not work.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.