- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I have done a proc compare across 2 datasets. Can anyone advise if there is a method to output the 58 cases from Observation Summary onto an dataset?
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
To output the 58 cases from the Observation Summary of a PROC COMPARE
into a dataset, you can use the OUT=
option with the OUTNOEQUAL
and OUTBASE
(or OUTCOMPARE
, depending on which dataset's mismatches you want to capture) options in your PROC COMPARE
statement. Here’s an example:
proc compare base=dataset1 compare=dataset2 out=compared_diff outnoequal outbase;
run;
outnoequal
ensures that only the observations that differ between the datasets are included in the output dataset.outbase
includes the observations from the base dataset in the output dataset.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I suppose you are comparing observations with an ID Variable.
... and your _1_ dataset has 58 observations (IDs) that are not found in your _2_ dataset.
Not sure if PROC COMPARE can output these 58 observations in a dataset.
But you can always use PROC SQL or a data step to accomplish this of course.
Koen
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello @sasheadache,
In this particular case you can select the last 58 observations from the base dataset:
data want;
set BAU_0000_1_CASH_COLLECTED(firstobs=52452);
run;
In the more general case, where you don't know which of the two datasets has more observations and where you don't want to hardcode the first observation number to read from that dataset, you can use something like this:
data want;
if 0 then do;
set BAU_0000_1_CASH_COLLECTED nobs=n1;
set BAU_0000_2_CASH_COLLECTED nobs=n2;
end;
if n1>n2 then merge BAU_0000_1_CASH_COLLECTED(in=in1)
BAU_0000_2_CASH_COLLECTED(in=in2 drop=_all_);
else merge BAU_0000_1_CASH_COLLECTED(in=in1 drop=_all_)
BAU_0000_2_CASH_COLLECTED(in=in2);
if in1 ne in2;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Here's a minor tweak on @FreelanceReinh 's suggestion. This code avoids reading through the matching observations before accepting the "extras". Instead, it uses "firstobs=" to jump directly to the extra obs in the longer dataset:
data _null_;
if 0 then set BAU_0000_1_CASH_COLLECTED nobs=n1;
if 0 then set BAU_0000_2_CASH_COLLECTED nobs=n2;
if n1^=n2;
fobs=min(n1,n2)+1;
call execute ('data want;');
call execute (catx(' ','set',ifc(n1>n2,'BAU_0000_1_CASH_COLLECTED','BAU_0000_2_CASH_COLLECTED'),'(firstobs=',fobs,');'));
call execute ('run;');
run;
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Please always include code when asking a question about "can I do this". What you have done may well influence the options needed to get a working solution.
Here is one example that creates and output data set of observations in one data set not in the other. It starts with a data set that you should have available so you can actually run the code an see the results. The first set copies a data set, the second copies the same data set except for 4 observations. The OUTBASE requests the observations from the BASE=dataset (the larger one in this case) the OUTNOEQUAL suppresses the observations that have an equal observation in Compare= dataset. Result has the 4 observations in the base set not in the compare.
data class1; set sashelp.class; run; data class2; set sashelp.class (obs=15); run; proc compare base=class1 compare=class2 out=result outbase outnoequal; run;
This example will not work exactly with your data set because you have so many not equal values that either your data sets are not sorted the same (generally a very good idea with Proc Compare) OR you should severely restrict what is being compared. Possibly need an ID variable(s) and perhaps VAR and WITH statements.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
To output the 58 cases from the Observation Summary of a PROC COMPARE
into a dataset, you can use the OUT=
option with the OUTNOEQUAL
and OUTBASE
(or OUTCOMPARE
, depending on which dataset's mismatches you want to capture) options in your PROC COMPARE
statement. Here’s an example:
proc compare base=dataset1 compare=dataset2 out=compared_diff outnoequal outbase;
run;
outnoequal
ensures that only the observations that differ between the datasets are included in the output dataset.outbase
includes the observations from the base dataset in the output dataset.