BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
NKormanik
Barite | Level 11

Complete dataset A has 100 observations.

Subset dataset B has 87 observations.

Which observations of A are not in B?

Either create a new dataset "Missing Observations", or write the answer to log, and save log to the current library.

Is a prior sort of both datasets necessary?  Bare in mind that strict observation-to-observation comparison will not line up, as some (13) observations are missing from dataset B.

Thanks!

Nicholas Kormanik

1 ACCEPTED SOLUTION

Accepted Solutions
Linlin
Lapis Lazuli | Level 10

Hi,

You have to sort both datasets if you use data step:

example:

data a;

input id @@;

cards;

1 2 3 4 5 6 7 8

;

data b;

input id @@;

cards;

2 4 6 8

;

proc sort data=a;by id;run;

proc sort data=b;by id;run;

data missing;

   merge a(in=aa) b(in=bb);

   by id;

   if aa and not bb;

  run;

  proc print;run;

View solution in original post

14 REPLIES 14
Linlin
Lapis Lazuli | Level 10

Hi,

You have to sort both datasets if you use data step:

example:

data a;

input id @@;

cards;

1 2 3 4 5 6 7 8

;

data b;

input id @@;

cards;

2 4 6 8

;

proc sort data=a;by id;run;

proc sort data=b;by id;run;

data missing;

   merge a(in=aa) b(in=bb);

   by id;

   if aa and not bb;

  run;

  proc print;run;

NKormanik
Barite | Level 11

Linlin,

Where does the aa and bb come from?

Here is my code so far (assume both sorted):

data missing;

merge complete(in=aa) subset(in=bb);

by combo;

if aa and not bb;

run;

In the above case, are we only comparing the one variable -- combo -- and not worrying about whether the others are the same or not?  If so, that is fine.

NKormanik
Barite | Level 11

Got it to work following Linlin's suggestion.  Thanks much, Linlin.  The "aa" and "bb" can be anything, doesn't matter, just markers of sorts.  Here's the code that worked:

proc sort data=nicholas.complete;

by combo;

run;

proc sort data=nicholas.subset;

by combo;

run;

data nicholas.missing;

merge nicholas.complete(in=aa) nicholas.subset(in=bb);

by combo;

if aa and not bb;

run;

damanaulakh88
Obsidian | Level 7

Hi,

I have used SQL merging here and it does not require prior sorting . Please find below the code for your reference.

==============================================

data a;

input id @@;

cards;

1 2 3 4 5 6 7 8

;

data b;

input id @@;

cards;

2 4 6 8

;

proc sql;

create table missing as select * from a where a.id not in (select id from b);

quit;

proc print data=missing;

run;

===================================================

/Daman

NKormanik
Barite | Level 11

Damanaulakh88 and Reeza,


Thanks for trying to help with SQL, but I'm not following how to do it.  I need the actual code.


Files:

Complete

Subset


First main variable used for comparing:

Combo


Output file with difference:

Missing

NKormanik
Barite | Level 11

Is this the way?

proc sql;

create table missing as select * from subset where subset.combo not in (select combo from complete);

quit;

damanaulakh88
Obsidian | Level 7

Hi,

Below is the right code to find out the missing ids:-

proc sql;

create table missing as select * from complete where complete.id not in (select id from subset);

quit;


/Daman

NKormanik
Barite | Level 11

Still get an error when I run your code.  Might be the "id" part?

Here's the code I ran:

proc sql;

create table nicholas.missing

as select *

from nicholas.complete

where nicholas.complete.id not in

(select id from nicholas.subset);

quit;

damanaulakh88
Obsidian | Level 7

I am able to run this code at my end very fine.

proc sql;

create table nicholas.missing

as select *

from nicholas.complete

where nicholas.complete.id not in

(select id from nicholas.subset);

quit;

Here:-

Nicolas is the libref

Complete and Subset are the datasets.

Id is the variable on which you want to reomve your data.

If this is the data at your end then above code is fine.

And also the query you mentioned "Is a prior sort of both datasets necessary?" . So its answer is "NO" . We can use the PROC SQL merging method as described above,

/Daman

NKormanik
Barite | Level 11

So I switch "combo" (the actual variable name ) for "id"

Still comes up with an error:

2    proc sql;

3    create table nicholas.missing

4    as select *

5    from nicholas.complete

6    where nicholas.complete.combo not in

                            -

                            22

                            76

ERROR 22-322: Syntax error, expecting one of the following: ;, !, !!, &, *, **, +, -, /, <,

              <=, <>, =, >, >=, ?, AND, BETWEEN, CONTAINS, EQ, EQT, EXCEPT, GE, GET, GROUP,

              GT, GTT, HAVING, IN, INTERSECT, IS, LE, LET, LIKE, LT, LTT, NE, NET, NOT, NOTIN,

              OR, ORDER, OUTER, UNION, ^, ^=, |, ||, ~, ~=.

ERROR 76-322: Syntax error, statement will be ignored.

Linlin
Lapis Lazuli | Level 10

try the code below:

proc sql;

create table nicholas.missing

as select *

from nicholas.complete as a

where a.combo not in

(select combo from nicholas.subset);

quit;

NKormanik
Barite | Level 11

Once again, Linlin performs her magic.  Big hugs.

As before (your other solution above), though, would you please say what's up with "a".  Is that, too, simply some marker, and can be replaced by anything??

This is the SQL code that works:

proc sql;

create table nicholas.missing

as select *

from nicholas.complete as a

where a.combo not in

(select combo from nicholas.subset);

quit;

Linlin
Lapis Lazuli | Level 10

Hi,

"a" is just an alias of dataset nicholas.comlete. You don't have to use "a". You can replace it with anything.

Best of luck!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 14 replies
  • 1531 views
  • 8 likes
  • 4 in conversation