SAS Programming

DATA Step, Macro, Functions and more
BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
coolbeans
Fluorite | Level 6

I have two datasets. My original with 150,000 observations and a second one I made from this dataset with exclusions and has a total of 145,000 observations. I'm trying to figure out how to write a code where I can just subtracts these two datasets to  find the total amount excluded / missing. I know the total excluded is 5,000 from simple math but I'm a bit stuck on how we can do this in a  SAS code. 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

As long as these two datasets have the same variable name ,type,order , you could get it by using SQL.

data A;
set sashelp.class;
run;

data B;
set sashelp.class;
if _n_ in (1:4) then delete;
run;

proc sql;
create table want as
select * from A
except
select * from B ;
quit;

View solution in original post

8 REPLIES 8
coolbeans
Fluorite | Level 6

Matching observations I guess would just be the unique ID?

Kurt_Bremser
Super User

Please provide example data to illustrate your issue. I can't make code suggestions without knowing your data in structure and content.

Post example data as working DATA steps with DATALINES in a code box.

Ksharp
Super User

As long as these two datasets have the same variable name ,type,order , you could get it by using SQL.

data A;
set sashelp.class;
run;

data B;
set sashelp.class;
if _n_ in (1:4) then delete;
run;

proc sql;
create table want as
select * from A
except
select * from B ;
quit;
coolbeans
Fluorite | Level 6

This worked! Ty so much !

ChrisNZ
Tourmaline | Level 20

I know the total excluded is 5,000 from simple math but I'm a bit stuck on how we can do this in a  SAS code.

You don't know that. Maybe there are duplicates, or unique observations in both data sets.

Never make assumptions about your data quality, and always check for all possible defects.

Assuming you won't rely on the observation order, here is one way:

data A;
  X=1; output;
  X=1; output;  
  X=2; output;
  X=2; output;
run;

data B;
  X=2; output;
  X=3; output;
run;

proc sql;
  create table WANT as 
  select 'In A only' as SRC, * 
  from (select * from A
        except all
        select * from B )
  union all
  select 'In B only' as SRC, * 
  from (select * from B
        except all
        select * from A );
quit;

This results in:

SRC X
In A only 1
In A only 1
In A only 2 *
In B only 3

* The all keyword in except does not suppress duplicate rows, hence why the second X=2 observation is flagged by the code above.

Same thing for keeping both X=1 observations, thanks to union all.

See https://documentation.sas.com/doc/en/sqlproc/3.2/n0vo2lglyrnexwn14emi8m0jqvrj.htm

 

Depending on what you want, you could also use a variation of

proc sql;
  create table WANT as
    select * 
    from A 
  union all
    select * 
    from B 
  except 
   (select * from A 
    intersect 
    select * from B );
quit;

 

 

ChrisNZ
Tourmaline | Level 20

If you want to use the observation order and don't care about duplicates, here is another way to flag non-common  observations:

proc sort data=A out=ASORTED;
  by _ALL_;
run;
proc sort data=B out=BSORTED;
  by _ALL_;
run;
data WANT;
  merge ASORTED(in=A)
        BSORTED(in=B);
  by _ALL_;
  if       A & ^B then src='In A only';
  else if ^A &  B then src='In B only';
  else delete;
run;

 

coolbeans
Fluorite | Level 6

An example of the data would be 

 

DATA exclusions;

set original;

if CVD = . THN DELETE;

RUN;

 

The original dataset would have this kind of data ;

DATA original;

input ID BLD STROKE  CVD ;datalines ;

1 A 1 .

2 B 0 1

3 O 0 0

4 A 1 . ;

RUN;

 

The excluded dataset would result in this data ;

DATA excluded ;

input ID BLD STROKE  CVD ;datalines ;

2 B 0 1

3 O 0 0 ;

RUN;

 

So my goal is to figure out by code the difference in observations which should be 2.

sas-innovate-white.png

Join us for our biggest event of the year!

Four days of inspiring keynotes, product reveals, hands-on learning opportunities, deep-dive demos, and peer-led breakouts. Don't miss out, May 6-9, in Orlando, Florida.

 

View the full agenda.

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 814 views
  • 2 likes
  • 4 in conversation