Hi,
I have the following problem: I have two datasets test1 and test2, and want to concatenate them to create a third dataset test3.
test1 has only column A present, while test2 has columns A and B. In test3, I want both columns to be there and, if B has a missing value, then B must be a copy of A.
Example:
data test1;
length A $16;
input A $;
datalines;
ABBEY
ABBOTSFORD
KALAMAZOO
OUAGADOUGOU
;
run;
data test2;
length A $16 B $16;
input A $ B $;
datalines;
OKEFENOKEE OKEFENOKEE
ALBUQUERQUE ALBUQUERQUE
;
run;
data test3;
set test1 test2;
if missing(B) then B=A;
run;
The table test3 should (IMHO) look like this:
| ABBEY | ABBEY |
| ABBOTSFORD | ABBOTSFORD |
| KALAMAZOO | KALAMAZOO |
| OUAGADOUGOU | OUAGADOUGOU |
| OKEFENOKEE | OKEFENOKEE |
| ALBUQUERQUE | ALBUQUERQUE |
Instead, it looks like this:
| ABBEY | ABBEY |
| ABBOTSFORD | ABBEY |
| KALAMAZOO | ABBEY |
| OUAGADOUGOU | ABBEY |
| OKEFENOKEE | OKEFENOKEE |
| ALBUQUERQUE | ALBUQUERQUE |
I am using Base SAS 9.4.01 M7, I have verified that this behaviour exists since M2 at least.
Did I get anything wrong?
What happens is that the B variable, which is read (later) from another input dataset, is not reset to missing when rereading the A dataset.
The easy way out in this case is to check what table you are reading from:
data test3;
set test1(in=in1) test2;
if in1 then B=A;
run;
Alternatively, you can explicitly set the B variable missing before the SET statement:
data test3;
length A B $16;
B='';
set test1 test2;
if missing(B) then B=A;
run;
What happens is that the B variable, which is read (later) from another input dataset, is not reset to missing when rereading the A dataset.
The easy way out in this case is to check what table you are reading from:
data test3;
set test1(in=in1) test2;
if in1 then B=A;
run;
Alternatively, you can explicitly set the B variable missing before the SET statement:
data test3;
length A B $16;
B='';
set test1 test2;
if missing(B) then B=A;
run;
Ok, thanks for the explanation. So in practice it's like I had wrote RETAIN for column B.
Is this the expected behaviour or a bug?
Because, however I accept your workaround, I don't think I want to use it (or remember I must use it) every time I have to do something as simple as a variable assignment!
Using an IN= variable is always the preferred way to identify from which dataset data is read.
And the behavior is not a bug, it is well documented and is like this since the beginning of SAS. Variables coming from any incoming dataset are automatically retained. Otherwise a one-to-many MERGE would not work.
@gabonzo wrote:
Ok, thanks for the explanation. So in practice it's like I had wrote RETAIN for column B.
Is this the expected behaviour or a bug?
Because, however I accept your workaround, I don't think I want to use it (or remember I must use it) every time I have to do something as simple as a variable assignment!
All variables that are coming from input datasets are "retained". Actually they are just not set to missing at the start of a new iteration. Without this 1 to Many MERGE would not work.
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.