- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
I have the following problem: I have two datasets test1 and test2, and want to concatenate them to create a third dataset test3.
test1 has only column A present, while test2 has columns A and B. In test3, I want both columns to be there and, if B has a missing value, then B must be a copy of A.
Example:
data test1;
length A $16;
input A $;
datalines;
ABBEY
ABBOTSFORD
KALAMAZOO
OUAGADOUGOU
;
run;
data test2;
length A $16 B $16;
input A $ B $;
datalines;
OKEFENOKEE OKEFENOKEE
ALBUQUERQUE ALBUQUERQUE
;
run;
data test3;
set test1 test2;
if missing(B) then B=A;
run;
The table test3 should (IMHO) look like this:
ABBEY | ABBEY |
ABBOTSFORD | ABBOTSFORD |
KALAMAZOO | KALAMAZOO |
OUAGADOUGOU | OUAGADOUGOU |
OKEFENOKEE | OKEFENOKEE |
ALBUQUERQUE | ALBUQUERQUE |
Instead, it looks like this:
ABBEY | ABBEY |
ABBOTSFORD | ABBEY |
KALAMAZOO | ABBEY |
OUAGADOUGOU | ABBEY |
OKEFENOKEE | OKEFENOKEE |
ALBUQUERQUE | ALBUQUERQUE |
I am using Base SAS 9.4.01 M7, I have verified that this behaviour exists since M2 at least.
Did I get anything wrong?
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
What happens is that the B variable, which is read (later) from another input dataset, is not reset to missing when rereading the A dataset.
The easy way out in this case is to check what table you are reading from:
data test3;
set test1(in=in1) test2;
if in1 then B=A;
run;
Alternatively, you can explicitly set the B variable missing before the SET statement:
data test3;
length A B $16;
B='';
set test1 test2;
if missing(B) then B=A;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
What happens is that the B variable, which is read (later) from another input dataset, is not reset to missing when rereading the A dataset.
The easy way out in this case is to check what table you are reading from:
data test3;
set test1(in=in1) test2;
if in1 then B=A;
run;
Alternatively, you can explicitly set the B variable missing before the SET statement:
data test3;
length A B $16;
B='';
set test1 test2;
if missing(B) then B=A;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Ok, thanks for the explanation. So in practice it's like I had wrote RETAIN for column B.
Is this the expected behaviour or a bug?
Because, however I accept your workaround, I don't think I want to use it (or remember I must use it) every time I have to do something as simple as a variable assignment!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Using an IN= variable is always the preferred way to identify from which dataset data is read.
And the behavior is not a bug, it is well documented and is like this since the beginning of SAS. Variables coming from any incoming dataset are automatically retained. Otherwise a one-to-many MERGE would not work.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@gabonzo wrote:
Ok, thanks for the explanation. So in practice it's like I had wrote RETAIN for column B.
Is this the expected behaviour or a bug?
Because, however I accept your workaround, I don't think I want to use it (or remember I must use it) every time I have to do something as simple as a variable assignment!
All variables that are coming from input datasets are "retained". Actually they are just not set to missing at the start of a new iteration. Without this 1 to Many MERGE would not work.