## Whole Dataset, Subset of Dataset, Which Observations Missing?

Solved
Regular Contributor
Posts: 238

# Whole Dataset, Subset of Dataset, Which Observations Missing?

Complete dataset A has 100 observations.

Subset dataset B has 87 observations.

Which observations of A are not in B?

Either create a new dataset "Missing Observations", or write the answer to log, and save log to the current library.

Is a prior sort of both datasets necessary?  Bare in mind that strict observation-to-observation comparison will not line up, as some (13) observations are missing from dataset B.

Thanks!

Nicholas Kormanik

Accepted Solutions
Solution
‎02-07-2013 08:25 PM
Super Contributor
Posts: 1,636

## Re: Whole Dataset, Subset of Dataset, Which Observations Missing?

Hi,

You have to sort both datasets if you use data step:

example:

data a;

input id @@;

cards;

1 2 3 4 5 6 7 8

;

data b;

input id @@;

cards;

2 4 6 8

;

proc sort data=a;by id;run;

proc sort data=b;by id;run;

data missing;

merge a(in=aa) b(in=bb);

by id;

if aa and not bb;

run;

proc print;run;

All Replies
Super User
Posts: 23,754

## Re: Whole Dataset, Subset of Dataset, Which Observations Missing?

Look up SQL Exists Condition

Base SAS(R) 9.2 Procedures Guide

Solution
‎02-07-2013 08:25 PM
Super Contributor
Posts: 1,636

## Re: Whole Dataset, Subset of Dataset, Which Observations Missing?

Hi,

You have to sort both datasets if you use data step:

example:

data a;

input id @@;

cards;

1 2 3 4 5 6 7 8

;

data b;

input id @@;

cards;

2 4 6 8

;

proc sort data=a;by id;run;

proc sort data=b;by id;run;

data missing;

merge a(in=aa) b(in=bb);

by id;

if aa and not bb;

run;

proc print;run;

Regular Contributor
Posts: 238

## Re: Whole Dataset, Subset of Dataset, Which Observations Missing?

Linlin,

Where does the aa and bb come from?

Here is my code so far (assume both sorted):

data missing;

merge complete(in=aa) subset(in=bb);

by combo;

if aa and not bb;

run;

In the above case, are we only comparing the one variable -- combo -- and not worrying about whether the others are the same or not?  If so, that is fine.

Regular Contributor
Posts: 238

## Re: Whole Dataset, Subset of Dataset, Which Observations Missing?

Got it to work following Linlin's suggestion.  Thanks much, Linlin.  The "aa" and "bb" can be anything, doesn't matter, just markers of sorts.  Here's the code that worked:

proc sort data=nicholas.complete;

by combo;

run;

proc sort data=nicholas.subset;

by combo;

run;

data nicholas.missing;

merge nicholas.complete(in=aa) nicholas.subset(in=bb);

by combo;

if aa and not bb;

run;

Frequent Contributor
Posts: 81

## Re: Whole Dataset, Subset of Dataset, Which Observations Missing?

Hi,

I have used SQL merging here and it does not require prior sorting . Please find below the code for your reference.

==============================================

data a;

input id @@;

cards;

1 2 3 4 5 6 7 8

;

data b;

input id @@;

cards;

2 4 6 8

;

proc sql;

create table missing as select * from a where a.id not in (select id from b);

quit;

proc print data=missing;

run;

===================================================

/Daman

Regular Contributor
Posts: 238

## Re: Whole Dataset, Subset of Dataset, Which Observations Missing?

Damanaulakh88 and Reeza,

Thanks for trying to help with SQL, but I'm not following how to do it.  I need the actual code.

Files:

Complete

Subset

First main variable used for comparing:

Combo

Output file with difference:

Missing

Regular Contributor
Posts: 238

## Re: Whole Dataset, Subset of Dataset, Which Observations Missing?

Is this the way?

proc sql;

create table missing as select * from subset where subset.combo not in (select combo from complete);

quit;

Frequent Contributor
Posts: 81

## Re: Whole Dataset, Subset of Dataset, Which Observations Missing?

Hi,

Below is the right code to find out the missing ids:-

proc sql;

create table missing as select * from complete where complete.id not in (select id from subset);

quit;

/Daman

Regular Contributor
Posts: 238

## Re: Whole Dataset, Subset of Dataset, Which Observations Missing?

Still get an error when I run your code.  Might be the "id" part?

Here's the code I ran:

proc sql;

create table nicholas.missing

as select *

from nicholas.complete

where nicholas.complete.id not in

(select id from nicholas.subset);

quit;

Frequent Contributor
Posts: 81

## Re: Whole Dataset, Subset of Dataset, Which Observations Missing?

I am able to run this code at my end very fine.

proc sql;

create table nicholas.missing

as select *

from nicholas.complete

where nicholas.complete.id not in

(select id from nicholas.subset);

quit;

Here:-

Nicolas is the libref

Complete and Subset are the datasets.

Id is the variable on which you want to reomve your data.

If this is the data at your end then above code is fine.

And also the query you mentioned "Is a prior sort of both datasets necessary?" . So its answer is "NO" . We can use the PROC SQL merging method as described above,

/Daman

Regular Contributor
Posts: 238

## Re: Whole Dataset, Subset of Dataset, Which Observations Missing?

So I switch "combo" (the actual variable name ) for "id"

Still comes up with an error:

2    proc sql;

3    create table nicholas.missing

4    as select *

5    from nicholas.complete

6    where nicholas.complete.combo not in

-

22

76

ERROR 22-322: Syntax error, expecting one of the following: ;, !, !!, &, *, **, +, -, /, <,

<=, <>, =, >, >=, ?, AND, BETWEEN, CONTAINS, EQ, EQT, EXCEPT, GE, GET, GROUP,

GT, GTT, HAVING, IN, INTERSECT, IS, LE, LET, LIKE, LT, LTT, NE, NET, NOT, NOTIN,

OR, ORDER, OUTER, UNION, ^, ^=, |, ||, ~, ~=.

ERROR 76-322: Syntax error, statement will be ignored.

Super Contributor
Posts: 1,636

## Re: Whole Dataset, Subset of Dataset, Which Observations Missing?

try the code below:

proc sql;

create table nicholas.missing

as select *

from nicholas.complete as a

where a.combo not in

(select combo from nicholas.subset);

quit;

Regular Contributor
Posts: 238

## Re: Whole Dataset, Subset of Dataset, Which Observations Missing?

Once again, Linlin performs her magic.  Big hugs.

As before (your other solution above), though, would you please say what's up with "a".  Is that, too, simply some marker, and can be replaced by anything??

This is the SQL code that works:

proc sql;

create table nicholas.missing

as select *

from nicholas.complete as a

where a.combo not in

(select combo from nicholas.subset);

quit;

Super Contributor
Posts: 1,636

## Re: Whole Dataset, Subset of Dataset, Which Observations Missing?

Hi,

"a" is just an alias of dataset nicholas.comlete. You don't have to use "a". You can replace it with anything.

Best of luck!

🔒 This topic is solved and locked.