Help using Base SAS procedures

DATA step: merge

Reply
Occasional Contributor
Posts: 13

DATA step: merge

Hi,

I have a question regarding the merge statement in the DATA step, as in the example below.

The data1 has variables of policy, location and age_1, and the data2 has variables of policy, location and age_2. The data3 is created by merging these two data sets. The new variable age is set to the value 'X' if both age_1 and age_2 have missing values in the merged data3. Surprisingly, after the DATA step, the values for both age_1 and age_2 became missing. Where is the problem? Please advise.

DATA work.data3;
MERGE work.data1 work.data2;
BY policy location;
IF MISSING(age_1) AND MISSING(age_2) THEN age='X';
RUN;
Super Contributor
Super Contributor
Posts: 3,174

Re: DATA step: merge

Posted in reply to Kevin_Graduate
You need to generate some self-initiated diagnostics in your SAS log to see why your SAS DATA step behaved as you explain. Add this statement and I believe that you will have the answer to your question:

PUTLOG _ALL_;

Scott Barry
SBBWorks, Inc.
Occasional Contributor
Posts: 13

Re: DATA step: merge

i just like to know if I can compare variables in two different datasets during merge. If this is wrong, then I should use another data step after the merge?

anyone can suggest?
Super Contributor
Posts: 578

Re: DATA step: merge

Posted in reply to Kevin_Graduate
I think you need to use the IN option on the datasets. If age is a number, then the age='X' won't work.

DATA work.data3;
MERGE work.data1(in=d1) work.data2(in=d2);
BY policy location;
IF not d1 and not d2 THEN age='X';
RUN;
Occasional Contributor
Posts: 13

Re: DATA step: merge

Thanks for your reply, DBailey. But my question is actually:

Is it right to compare two variables a, b, if they are in two different datasets of data1, and data2? So,

DATA test;
MERGE data1 data2;
BY ID_1;
IF a+b > 10; /*Note: a in data1, b in data2*/

RUN;

My question is: does the merge happens before the IF a+b>10 statement? If this is true, then the code does not cause problem. But if the merge happens after that, then this causes problem, as variable a does not exist at all the data2.
Super Contributor
Posts: 578

Re: DATA step: merge

Posted in reply to Kevin_Graduate
If I remember by program data vector correctly, the PDV is filled in as the input data sets are read. The code you have included should not cause a problem. You might have an issue with missing values which might propogate to your answer.
Frequent Contributor
Posts: 131

Re: DATA step: merge

Thanks for the answer, Dbailey.
Super Contributor
Super Contributor
Posts: 365

Re: DATA step: merge

Posted in reply to Kevin_Graduate
Hello Kevin,

This code:
DATA test;
MERGE data1 data2;
BY ID_1;
IF a+b > 10; /*Note: a in data1, b in data2*/
RUN;

works correctly. Merge happens before IF will execute. However, your logic in the initial code has a problem:
IF MISSING(age_1) AND MISSING(AGE_2) THEN age='X';

You do not specify what to do if age_1 is missing and age_2 is not. This creates missing values for age. The same is true when age_1 is not missing but age_2 is missing.

This is a test program:
[pre]
data data1;
ID_1=1; a=5; output;
ID_1=1; a=.; output;
ID_1=2; a=3; output;
ID_1=2; a=0; output;
run;
data data2;
ID_1=1; b=6; output;
ID_1=1; b=.; output;
ID_1=2; b=8; output;
ID_1=2; b=1; output;
run;
DATA test;
MERGE data1 data2;
IF MISSING(a) AND MISSING(b) THEN age='X';
BY ID_1;
RUN;
[/pre]
Dataset test:
[pre]
ID_1 a b age
1 5 6
1 . . X
2 3 8
2 0 1
[/pre]
Sincerely,
SPR
Super Contributor
Super Contributor
Posts: 3,174

Re: DATA step: merge

As I mentioned, the PUTLOG command would have given you the answer to your latest curios question about SAS MERGE processing.

Scott Barry
SBBWorks, Inc.
Frequent Contributor
Posts: 131

Re: DATA step: merge

Thanks for sharing the great knowledge.

I am reading, practicing and learning. Super software.
Ask a Question
Discussion stats
  • 9 replies
  • 235 views
  • 0 likes
  • 5 in conversation