data one; input ID $ Name $; datalines; A01 Sue A02 Tom A05 Key A10 Jim ; Run; data two; input ID$ Age sex $; datalines; A01 58 F A02 20 M A04 47 F A10 11 M ; Run; Data all; Merge one(in=in1) Two(in=in2); By ID; If in1=1 Then Output One; if in2=1 then output two; if (in1=0 and in2=1) then output inboth; if (in1=0 and in2=1) then output NoMatch1; if (in1=1 and in2=0) then output NoMatch2; if (in1=1 OR in2=1) then output allrecs; if (in1+in2=1) then output nomatch; Run;
I have found this code in Support.sas.com and I tried to run it, either the sets is empty or data all is not shown in the output.
Regards,
blueblue
@GN0001 - You don't use aliases on MERGE statements you use the IN = option. Only SQL uses aliases. SAS is mistaking them for dataset names.
At the risk of repeating myself, you really, really, really should learn to read and understand the SAS documentation
I appreciate that the documentation is sometimes hard to follow for beginners but it is definitely worth the effort to understand it yourself.
The last data step is syntactically incorrect and will not compile. Study the log and take corrective action.
You also have a semantic problem here:
if (in1=0 and in2=1) then output inboth;
if (in1=0 and in2=1) then output NoMatch1;
as the conditions are identical.
Note that the IN= variables are Boolean and can only have values of 0 (false) or 1 (true). The above condition can therefore be written as
if not in1 and in2
In Boolean logic, NOT is interpreted first, then AND, and finally OR.
Do you look at the log?
Your second data step throws many errors:
48 Data all; 49 Merge one(in=in1) Two(in=in2); 50 By ID; 51 If in1=1 Then Output One; --- 455 52 if in2=1 then output two; --- 455 53 if (in1=0 and in2=1) then output inboth; ------ 455 54 if (in1=0 and in2=1) then output NoMatch1; -------- 455 55 if (in1=1 and in2=0) then output NoMatch2; -------- 455 56 if (in1=1 OR in2=1) then output allrecs; ------- 455 57 if (in1+in2=1) then output nomatch; ------- 455 ERROR 455-185: Data set was not specified on the DATA statement. 58 Run; NOTE: The SAS System stopped processing this step because of errors. WARNING: The data set WORK.ALL may be incomplete. When this step was stopped there were 0 observations and 4 variables. NOTE: DATA statement used (Total process time): real time 0.00 seconds cpu time 0.00 seconds
To use the name of a data set as the target of an OUTPUT statement the name must be on the DATA statement.
Since NONE of the data sets used as a target of the OUTPUT appear on the DATA statement the data step does not even execute.
If you expect to write to multiple data sets the data statement would look like:
Data allrecs one two inboth nomatch1 nomatch2 nomatch ;
Also when any explicit output statement is used the automatic write at the end of the data step is disabled as SAS has been told by you that you "know" when and which data set to write to.
We won't go into the bit about the potential problems with writing out to the same data sets on a set/merge statement at this time.
Hello @GN0001
1. Dataset all will not have any data because, you are using output and there is no output all in your code.
2.As pointed out by @ballardw your data statement should be
Data allrecs one two inboth nomatch1 nomatch2 nomatch ;
3.The original datasets one and two will be overwritten by the last data step.
Hello,
If I want to name my dataset as all, then what would be the final? I want output to identify where the record is coming from:
data one; input ID $ Name $; datalines; A01 Sue A02 Tom A05 Key A10 Jim ; Run; data two; input ID$ Age sex $; datalines; A01 58 F A02 20 M A04 47 F A10 11 M ; Run; Proc sort data=one; by ID; Run; Proc sort data=two; by ID; Run; Data All; Merge one (in=one) Two (in=two); By ID; If in1=1 Then Output One; if in2=1 then output two; if (in1=1 and in2=1) then output inboth; if (in1=1 and in2=0) then output NoMatch1; if (in1=0 and in2=1) then output NoMatch2; if (in1=1 OR in2=1) then output allrecs; if (in1+in2=1) then output nomatch; Run;
Respectfully,
blueblue
Hello team,
I changed the code to this:
data one; input ID $ Name $; datalines; A01 Sue A02 Tom A05 Key A10 Jim ; Run; data two; input ID$ Age sex $; datalines; A01 58 F A02 20 M A04 47 F A10 11 M ; Run; Proc sort data=one; by ID; Run; Proc sort data=two; by ID; Run; Data one two inboth Nomatch1 nomatch2 allrecs nomatch; Merge one (in=One) Two (in=two); By ID; If one=1 Then Output One; if two=1 then output two; if (one=1 and two=1) then output inboth; if (one=1 and two=0) then output NoMatch1; if (two=0 and one=1) then output NoMatch2; if (one=1 OR two=1) then output allrecs; if (one + two =1) then output nomatch; Run;
it runs and gives the result. I want to capture all the cases in one dataset. Each record should be identified to each case belong to. This result can't be used for further step.
Regards,
blueblue
In order to capture all observations in one row and also capture the case, then please modify the last step as follows.
The Status variable represents the case.
Data all;
Retain ID Name Age Sex Status;
Length Status $ 15;
Merge one (in=One) Two (in=two);
By ID;
If one=1 Then Status="One";
if two=1 then Status="Two";
if (one=1 and two=1) then Status= "Inboth";
if (one=1 and two=0) then Status= "NoMatch1";
if (two=0 and one=1) then Status= "NoMatch2";
if (one=1 OR two=1) then Status= "Allrecs";
if (one + two =1) then Status= "Nomatch";
Run;
@Sajid01 has found the main problem with the original program. If you want an indicator of which inputs contributed to this observation you need to store that information into a variable, not use the OUTPUT statement. The OUTPUT statement is for writing observations to datasets.
But the logic of the flags needs to be cleaned up. Since there are two input datasets there are only three possible combinations of the IN= variables (if both were false then there would be no observation).
Here is simplified logic to capture the three possible combinations.
data all;
retain ID Name Age Sex Status;
length Status $15;
merge one (in=One) Two (in=two);
by ID;
if not two then Status="One only";
else if not one then Status="Two only";
else Status= "Both";
run;
@GN0001 wrote:
Hello team,
I changed the code to this:
data one; input ID $ Name $; datalines; A01 Sue A02 Tom A05 Key A10 Jim ; Run; data two; input ID$ Age sex $; datalines; A01 58 F A02 20 M A04 47 F A10 11 M ; Run; Proc sort data=one; by ID; Run; Proc sort data=two; by ID; Run; Data one two inboth Nomatch1 nomatch2 allrecs nomatch; Merge one (in=One) Two (in=two); By ID; If one=1 Then Output One; if two=1 then output two; if (one=1 and two=1) then output inboth; if (one=1 and two=0) then output NoMatch1; if (two=0 and one=1) then output NoMatch2; if (one=1 OR two=1) then output allrecs; if (one + two =1) then output nomatch; Run;it runs and gives the result. I want to capture all the cases in one dataset. Each record should be identified to each case belong to. This result can't be used for further step.
Regards,
blueblue
Since your data sets are small you need to show us what you expect for the output. "This result can't be used for further step" doesn't tell us what you expect to do with a further step or what is wrong with the output.
Hello,
I want all the cases captured in one data set.
I thought I can append the different cases.
What do you think?
Regards,
blueblue
@GN0001 wrote:
Hello,
I want all the cases captured in one data set.
I thought I can append the different cases.
What do you think?
Regards,
blueblue
If you MERGE two datasets all cases will be in the output, unless you do something to exclude them. The other possibility is that you want to do what is called an N-to-M join or combination where both N and M are larger than one. That is some of the key values are repeated in both of the source datasets. In that case the MERGE statement will just match them observations by observation and the result will be max(N,M) observations for those values of the BY variables. If instead you used SQL to do an full join then you would match each observation from one dataset with every observation from the other dataset. So you would get N*M observations instead.
For your small example show what you want to produce to clarify what you want.
Thanks for the response.
data all;
set one two inboth Nomatch1 nomatch2 allrecs nomatch;
Run
I want to append all the results into one set. If this is doable? I don't need a full join.
Regards,
blueblue
Hello team,
A while ago, I posted a question, which I couldn't get what I needed per level of my knowledge.
I will place a code for other users:
data out.mydata; merge thisdata(in=a) thatdata(in=b); by hkey; format find $15.; if a and b then Find = 'Both'; if a and not b then Find = 'not_in_thisdata' if a then output; run;
Regards,
blueblue
So what is your question regarding this program? If you comment out - if a then output - all data from both MERGE datasets will be output.
Hello saskiwi;
Are you saying that
if a and b then Find = 'Both'; if a and not b then Find = 'not_in_thisdata'
would be in the result?
Regards,
blueblue
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.