BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
GN0001
Barite | Level 11
data one;
 input ID $ Name $;
datalines;
A01 Sue
A02 Tom
A05 Key
A10 Jim
;
Run;

data two;
 input ID$ Age  sex $;
datalines;
A01 58 F
A02 20 M
A04 47 F
A10 11 M
;
Run;

Data all;
Merge one(in=in1) Two(in=in2);
By ID;
If in1=1 Then Output One;
if in2=1 then output two;
if (in1=0 and in2=1) then output inboth;
if (in1=0 and in2=1) then output NoMatch1;
if (in1=1 and in2=0) then output NoMatch2;
if (in1=1 OR in2=1) then output allrecs;
if (in1+in2=1) then output nomatch;
Run;

I have found this code in Support.sas.com and I tried to run it, either the sets is empty or data all is not shown in the output.

Regards,

blueblue

Blue Blue
1 ACCEPTED SOLUTION

Accepted Solutions
SASKiwi
PROC Star

@GN0001  - You don't use aliases on MERGE statements you use the IN = option. Only SQL uses aliases. SAS is mistaking them for dataset names.

At the risk of repeating myself, you really, really, really should learn to read and understand the SAS documentation

I appreciate that the documentation is sometimes hard to follow for beginners but it is definitely worth the effort to understand it yourself. 

View solution in original post

21 REPLIES 21
Kurt_Bremser
Super User

The last data step is syntactically incorrect and will not compile. Study the log and take corrective action.

You also have a semantic problem here:

if (in1=0 and in2=1) then output inboth;
if (in1=0 and in2=1) then output NoMatch1;

as the conditions are identical.

Note that the IN= variables are Boolean and can only have values of 0 (false) or 1 (true). The above condition can therefore be written as

if not in1 and in2

In Boolean logic, NOT is interpreted first, then AND, and finally OR.

ballardw
Super User

Do you look at the log?

Your second data step throws many errors:

48   Data all;
49   Merge one(in=in1) Two(in=in2);
50   By ID;
51   If in1=1 Then Output One;
                          ---
                          455
52   if in2=1 then output two;
                          ---
                          455
53   if (in1=0 and in2=1) then output inboth;
                                      ------
                                      455
54   if (in1=0 and in2=1) then output NoMatch1;
                                      --------
                                      455
55   if (in1=1 and in2=0) then output NoMatch2;
                                      --------
                                      455
56   if (in1=1 OR in2=1) then output allrecs;
                                     -------
                                     455
57   if (in1+in2=1) then output nomatch;
                                -------
                                455
ERROR 455-185: Data set was not specified on the DATA statement.

58   Run;

NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.ALL may be incomplete.  When this step was stopped there were 0
         observations and 4 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

To use the name of a data set as the target of an OUTPUT statement the name must be on the DATA statement.

Since NONE of the data sets used as a target of the OUTPUT appear on the DATA statement the data step does not even execute.

 

If you expect to write to multiple data sets the data statement would look like:

Data allrecs one two inboth nomatch1 nomatch2 nomatch ;

Also when any explicit output statement is used the automatic write at the end of the data step is disabled as SAS has  been told by you that you "know" when and which data set to write to.

 

We won't go into the bit about the potential problems with writing out to the same data sets on a set/merge statement at this time.

Sajid01
Meteorite | Level 14

Hello @GN0001 

1. Dataset all will not have any data because, you are using output and there is no output all in your code.

2.As pointed out by @ballardw  your data statement should be

Data allrecs one two inboth nomatch1 nomatch2 nomatch ;

3.The original datasets one and two will be overwritten by the last data step.

GN0001
Barite | Level 11

Hello,

If I want to name my dataset as all, then what would be the final? I want output to identify where the record is coming from:

data one;
 input ID $ Name $;
datalines;
A01 Sue
A02 Tom
A05 Key
A10 Jim
;
Run;

data two;
 input ID$ Age  sex $;
datalines;
A01 58 F
A02 20 M
A04 47 F
A10 11 M
;
Run;
Proc sort data=one;
by ID;
Run;

Proc sort data=two;
by ID;
Run;


Data All;
Merge one (in=one) Two (in=two);
By ID;
If in1=1 Then Output One;
if in2=1 then output two;
if (in1=1 and in2=1) then output inboth;
if (in1=1 and in2=0) then output NoMatch1;
if (in1=0 and in2=1) then output NoMatch2;
if (in1=1 OR in2=1) then output allrecs;
if (in1+in2=1) then output nomatch;
Run;

Respectfully,

blueblue

Blue Blue
GN0001
Barite | Level 11

Hello team,

I changed the code to this:

data one;
 input ID $ Name $;
datalines;
A01 Sue
A02 Tom
A05 Key
A10 Jim
;
Run;

data two;
 input ID$ Age  sex $;
datalines;
A01 58 F
A02 20 M
A04 47 F
A10 11 M
;
Run;
Proc sort data=one;
by ID;
Run;

Proc sort data=two;
by ID;
Run;


Data one two inboth Nomatch1 nomatch2 allrecs nomatch;

Merge one (in=One) Two (in=two);
By ID;
If one=1 Then Output One;
if two=1 then output two;
if (one=1 and two=1) then output inboth;
if (one=1 and two=0) then output NoMatch1;
if (two=0 and one=1) then output NoMatch2;
if (one=1 OR two=1) then output allrecs;
if (one + two =1) then output nomatch;
Run;

it runs and gives the result. I want to capture all the cases in one dataset. Each record should be identified to each case belong to. This result can't be used for further step.

Regards,

blueblue

Blue Blue
Sajid01
Meteorite | Level 14

In order to capture all observations in one row and also capture the case, then please modify the last step as follows.
The Status variable represents the case.

Data all;
Retain ID Name Age Sex Status;
Length Status $ 15;
Merge one (in=One) Two (in=two);
By ID;
If one=1 Then  Status="One";
if two=1 then Status="Two";
if (one=1 and two=1) then Status= "Inboth";
if (one=1 and two=0) then Status= "NoMatch1";
if (two=0 and one=1) then Status= "NoMatch2";
if (one=1 OR two=1) then Status= "Allrecs";
if (one + two =1) then Status= "Nomatch";
Run;

 

Tom
Super User Tom
Super User

@Sajid01 has found the main problem with the original program.  If you want an indicator of which inputs contributed to this observation you need to store that information into a variable, not use the OUTPUT statement.  The OUTPUT statement is for writing observations to datasets.

 

But the logic of the flags needs to be cleaned up.  Since there are two input datasets there are only three possible combinations of the IN= variables (if both were false then there would be no observation).

 

Here is simplified logic to capture the three possible combinations.

data all;
  retain ID Name Age Sex Status;
  length Status $15;
  merge one (in=One) Two (in=two);
  by ID;
  if not two then Status="One only";
  else if not one then Status="Two only";
  else Status= "Both";
run;

 

ballardw
Super User

@GN0001 wrote:

Hello team,

I changed the code to this:

data one;
 input ID $ Name $;
datalines;
A01 Sue
A02 Tom
A05 Key
A10 Jim
;
Run;

data two;
 input ID$ Age  sex $;
datalines;
A01 58 F
A02 20 M
A04 47 F
A10 11 M
;
Run;
Proc sort data=one;
by ID;
Run;

Proc sort data=two;
by ID;
Run;


Data one two inboth Nomatch1 nomatch2 allrecs nomatch;

Merge one (in=One) Two (in=two);
By ID;
If one=1 Then Output One;
if two=1 then output two;
if (one=1 and two=1) then output inboth;
if (one=1 and two=0) then output NoMatch1;
if (two=0 and one=1) then output NoMatch2;
if (one=1 OR two=1) then output allrecs;
if (one + two =1) then output nomatch;
Run;

it runs and gives the result. I want to capture all the cases in one dataset. Each record should be identified to each case belong to. This result can't be used for further step.

Regards,

blueblue


Since your data sets are small you need to show us what you expect for the output. "This result can't be used for further step" doesn't tell us what you expect to do with a further step or what is wrong with the output.

GN0001
Barite | Level 11

Hello,

I want all the cases captured in one data set.

I thought I can append the different cases.

What do you think?

Regards,

blueblue

Blue Blue
Tom
Super User Tom
Super User

@GN0001 wrote:

Hello,

I want all the cases captured in one data set.

I thought I can append the different cases.

What do you think?

Regards,

blueblue


If you MERGE two datasets all cases will be in the output, unless you do something to exclude them.  The other possibility is that you want to do what is called an N-to-M join or combination where both N and M are larger than one.  That is some of the key values are repeated in both of the source datasets.  In that case the MERGE statement will just match them observations by observation and the result will be max(N,M) observations for those values of the BY variables.  If instead you used SQL to do an full join then you would match each observation from one dataset with every observation from the other dataset.  So you would get N*M observations instead.

 

For your small example show what you want to produce to clarify what you want.

GN0001
Barite | Level 11

Thanks for the response.

data all;
set one two inboth Nomatch1 nomatch2 allrecs nomatch;
Run

 I want to append all the results into one set. If this is doable? I don't need a full join.

 

Regards,

blueblue

Blue Blue
GN0001
Barite | Level 11

Hello team,

A while ago, I posted a question, which I couldn't get what I needed per level of my knowledge.

I will place a code for other users:

 

data out.mydata;
merge thisdata(in=a) thatdata(in=b);
by hkey;
format find $15.;
if a and b then Find = 'Both';
if a and not b then Find = 'not_in_thisdata'
if a then output;
run;

Regards,

blueblue

 

Blue Blue
SASKiwi
PROC Star

So what is your question regarding this program? If you comment out - if a then output - all data from both MERGE datasets will be output.

GN0001
Barite | Level 11

Hello saskiwi;

Are you saying that 

if a and b then Find = 'Both';
if a and not b then Find = 'not_in_thisdata'

would be in the result?

 

Regards,

blueblue 

Blue Blue

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 21 replies
  • 1859 views
  • 12 likes
  • 6 in conversation