I just read the first few pages of the SAS paper on the program data vector (PDV) and reread all of these responses five times. My understanding of the dataset creation is as follows:

1. The DATA step initializes and creates a list of all variables from both test1 and test2.
2. In the PDV, all variables are initialized to missing (according to the SAS document), and all variables are retained until the next round of the BY statement.
3. The first iteration reads in values for each variable from test1. It pulls in values for all matching variables, but because test1 does not have the identifier variable, identifier keeps its value of missing.
4. My IF statement is processed; because identifier is missing, it combines spot 2, 3, 4.
5. The next iteration of the DATA step is executed. Because identifier is not in the first dataset, it retains the value from the previous iteration due to the automatic retain in step 2 above.
6. This loop continues until identifier has a value, in this case from the test2 dataset.

The paper I am referencing (page 3): http://support.sas.com/resources/papers/proceedings13/125-2013.pdf

Do I understand the process correctly?
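One way to check the steps above would be to dump the PDV to the log on every iteration. The following is only a minimal sketch under assumptions: the BY variable `id`, the variable `value`, the output dataset name `combined`, and the sample rows are hypothetical stand-ins, and it assumes the datasets are combined with SET plus BY (if the real step uses MERGE, that statement could be swapped in). Only test1, test2, and identifier come from the description above.

```sas
/* Hypothetical sample data: test1 has no identifier variable */
data test1;
  input id value;
  datalines;
1 10
1 20
2 30
;
run;

/* Hypothetical sample data: test2 carries the identifier variable */
data test2;
  input id identifier $;
  datalines;
1 A
2 B
;
run;

/* Assumed combining step: interleave by id (both inputs are sorted by id). */
/* PUT _ALL_ writes every PDV variable, including _N_, FIRST.id and LAST.id, */
/* to the log once per iteration.                                            */
data combined;
  set test1 test2;
  by id;
  put _all_;
run;
```

Reading the log line by line should show when identifier is missing, when it carries a value over from an earlier iteration, and when it is reset, which is what steps 2 through 6 are trying to describe.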