I have two datasets that I would like to concatenate. They have the same information, with the same formatting, but in variables with different names. I tried to combine them using the technique given in the example code below. DATA TEST1;
INPUT ID VALUE1;
DATALINES;
1 0.5
2 0.6
3 0.7
4 .
;
RUN;
DATA TEST2;
INPUT ID VALUE2;
DATALINES;
5 0.4
6 0.5
7 0.3
8 .
;
RUN;
DATA TEST3;
SET TEST1 TEST2;
IF MISSING(VALUE1) THEN VALUE1 = VALUE2;
RUN;
PROC PRINT DATA = TEST3 NOOBS; RUN; This is my output: ID VALUE1 VALUE2 1 0.5 . 2 0.6 . 3 0.7 . 4 . . 5 0.4 0.4 6 0.4 0.5 7 0.4 0.3 8 0.4 . I tried a few tweaks to dataset TEST2 and it looks like SAS is simply assigning the first non-missing VALUE2 value to all of the rows after that non-missing value appears. For example, if ID 5 has VALUE2 = 0.5, then ID 5, 6, 7, and 8 will all be assigned VALUE1 = 0.5. If ID 5 has VALUE2 = . and ID 6 has VALUE2 = 0.5, then ID 5 will be assigned VALUE1 = . and ID's 6, 7, and 8 will be assigned VALUE2 = 0.5. (To be clear, my desired result is that VALUE1 is assigned the same value as VALUE2 for each of ID 5, 6, 7, 8, so that ID 5 has VALUE1 = 0.4, ID 6 has VALUE1 = 0.5, etc.) For my original program, I was able to get the outcome I wanted by using the RENAME data set option instead of an if statement. But I want to understand why my code is working this way. It seems totally contrary to my understanding of the DATA step. How is SAS even able to assign the same value to all of the subsequent rows? Without a RETAIN statement, shouldn't everything except _N_ and _ERROR_ be reassigned to missing values before the next record is read? It seems impossible, which means I must be misunderstanding something fundamental about the DATA step.
... View more