SAS Programming

coolbeans · Posted 03-03-2025 01:25 AM

I have two datasets. My original with 150,000 observations and a second one I made from this dataset with exclusions and has a total of 145,000 observations. I'm trying to figure out how to write a code where I can just subtracts these two datasets to find the total amount excluded / missing. I know the total excluded is 5,000 from simple math but I'm a bit stuck on how we can do this in a SAS code.

Ksharp · Posted 03-03-2025 01:57 AM

As long as these two datasets have the same variable name ,type,order , you could get it by using SQL.

data A;
set sashelp.class;
run;

data B;
set sashelp.class;
if _n_ in (1:4) then delete;
run;

proc sql;
create table want as
select * from A
except
select * from B ;
quit;

View solution in original post

Kurt_Bremser · Posted 03-03-2025 01:46 AM

Is there a key to identify matching observations?

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

coolbeans · Posted 03-03-2025 01:50 AM

Matching observations I guess would just be the unique ID?

Kurt_Bremser · Posted 03-03-2025 04:14 AM

Please provide example data to illustrate your issue. I can't make code suggestions without knowing your data in structure and content.

Post example data as working DATA steps with DATALINES in a code box.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Ksharp · Posted 03-03-2025 01:57 AM

As long as these two datasets have the same variable name ,type,order , you could get it by using SQL.

data A;
set sashelp.class;
run;

data B;
set sashelp.class;
if _n_ in (1:4) then delete;
run;

proc sql;
create table want as
select * from A
except
select * from B ;
quit;

coolbeans · Posted 03-03-2025 11:17 AM

This worked! Ty so much !

ChrisNZ · Posted 03-03-2025 05:46 AM

I know the total excluded is 5,000 from simple math but I'm a bit stuck on how we can do this in a SAS code.

You don't know that. Maybe there are duplicates, or unique observations in both data sets.

Never make assumptions about your data quality, and always check for all possible defects.

Assuming you won't rely on the observation order, here is one way:

data A;
  X=1; output;
  X=1; output;  
  X=2; output;
  X=2; output;
run;

data B;
  X=2; output;
  X=3; output;
run;

proc sql;
  create table WANT as 
  select 'In A only' as SRC, * 
  from (select * from A
        except all
        select * from B )
  union all
  select 'In B only' as SRC, * 
  from (select * from B
        except all
        select * from A );
quit;

This results in:

SRC	X
In A only	1
In A only	1
In A only	2 *
In B only	3

* The all keyword in except does not suppress duplicate rows, hence why the second X=2 observation is flagged by the code above.

Same thing for keeping both X=1 observations, thanks to union all.

See https://documentation.sas.com/doc/en/sqlproc/3.2/n0vo2lglyrnexwn14emi8m0jqvrj.htm

Depending on what you want, you could also use a variation of

proc sql;
  create table WANT as
    select * 
    from A 
  union all
    select * 
    from B 
  except 
   (select * from A 
    intersect 
    select * from B );
quit;

High-Performance SAS Coding - Third Edition

ChrisNZ · Posted 03-03-2025 06:02 AM

If you want to use the observation order and don't care about duplicates, here is another way to flag non-common observations:

proc sort data=A out=ASORTED;
  by _ALL_;
run;
proc sort data=B out=BSORTED;
  by _ALL_;
run;
data WANT;
  merge ASORTED(in=A)
        BSORTED(in=B);
  by _ALL_;
  if       A & ^B then src='In A only';
  else if ^A &  B then src='In B only';
  else delete;
run;

High-Performance SAS Coding - Third Edition

coolbeans · Posted 03-03-2025 11:07 AM

An example of the data would be

DATA exclusions;

set original;

if CVD = . THN DELETE;

RUN;

The original dataset would have this kind of data ;

DATA original;

input ID BLD STROKE CVD ;datalines ;

1 A 1 .

2 B 0 1

3 O 0 0

4 A 1 . ;

RUN;

The excluded dataset would result in this data ;

DATA excluded ;

input ID BLD STROKE CVD ;datalines ;

2 B 0 1

3 O 0 0 ;

RUN;

So my goal is to figure out by code the difference in observations which should be 2.

SAS Programming

Finding observation difference in two separate datasets

Re: Finding observation difference in two separate datasets

Re: Finding observation difference in two separate datasets

Re: Finding observation difference in two separate datasets

Re: Finding observation difference in two separate datasets

Re: Finding observation difference in two separate datasets

Re: Finding observation difference in two separate datasets

Re: Finding observation difference in two separate datasets

Re: Finding observation difference in two separate datasets

Re: Finding observation difference in two separate datasets

Follow Us

What is...

SAS Programming

Join us for our biggest event of the year!

SAS Training: Just a Click Away

Follow Us

What is...