DATA Step, Macro, Functions and more

Matching observations from one data to see if any are missing in another dataset

Reply
Occasional Contributor
Posts: 11

Matching observations from one data to see if any are missing in another dataset

Hello everyone,

 

I have two datasets and I need to see if observations from one dataset are missing from another dataset. For example,

 

data have;
   input Name :$upcase4. id Net Reg Loss;
   cards;home 344 200  50    .
home 543  80   .    5
TDR  453  75   .    3
TDR  543   .   5    .
;
;;; run;
data have;
input Name :$upcase4. id Net Reg Loss;
cards;
home
344 200 50 . home 543 80 . 5
;
;;;
run;

 I want to see if any ids are missing in the second dataset compared with the first dataset. For example, id 453 is missing but 543 and 344 are not.

 

Thanks in advance Smiley Happy

 

Kind regards,

Jack

Moderator
Posts: 238

Re: Matching observations from one data to see if any are missing in another dataset

Assuming the tables are called First & Second:

 

 

proc sql;
  /* Get IDs in First but not in Second */
  select id from First except select id from Second;
  /* Get IDs in Second but not in First */
  select id from Second except select id from First;
  /* Get IDs in both */
  select id from Second intersect select id from First;
quit;

Does this help?

Super User
Posts: 10,483

Re: Matching observations from one data to see if any are missing in another dataset


AndrewHowell wrote:

Assuming the tables are called First & Second:

 

 

proc sql;
  /* Get IDs in First but not in Second */
  select id from First except select id from Second;
  /* Get IDs in Second but not in First */
  select id from Second except select id from First;
  /* Get IDs in both */
  select id from Second intersect select id from First;
quit;

Does this help?


May want select distinct id if there are duplicates.

Super User
Posts: 6,932

Re: Matching observations from one data to see if any are missing in another dataset

proc sort data=have1;
by id;
run;

proc sort data=have2;
by id;
run;

data
  miss_from_have2
  miss_from_have1
  both
;
merge
  have1 (in=a keep=id)
  have2 (in=b keep=id)
;
by id;
if a and b
then output both;
else if a
then output miss_from_have2;
else output miss_from_have1;
run;

might perform better on large datasets.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Valued Guide
Posts: 505

Re: Matching observations from one data to see if any are missing in another dataset

Elegant SQL intersection and exception of sets of data

Matching observations from one data to see if any are missing in another dataset

see
https://goo.gl/xCjIsU
https://communities.sas.com/t5/Base-SAS-Programming/Matching-observations-from-one-data-to-see-if-any-are-missing-in/m-p/343684

HAVE
=====

Up to 40 obs WORK.HAV1ST total obs=2

Obs    NAME    ID

 1     home    ___
 2     home    XXX


Up to 40 obs WORK.HAV2ND total obs=4

Obs    NAME    ID

 1     home    ___
 2     home    XXX
 3     TDR     @@@
 4     TDR     XXX

WANT
=====

   SQLOBS=0 hav1st except hav2nd

   SQLOBS=1 hav2nd except hav1st

   ID
   --------
   @@@

   SQLOBS=2 hav2nd intersect hav1st

   ID
   --------
   XXX
   ___

FULL SOLUTION
===============


%utlnopts; /* turn options off so so only the puts statements are in the log */
proc sql;

  /* ID @@@ Get IDs in hav1st but not in hav2nd */
  select id from hav1st except select id from hav2nd;
  %put &=sqlobs hav1st except hav2nd;

  /* Get IDs in hav2nd but not in hav1st */
  select id from hav2nd except select id from hav1st;
  %put &=sqlobs hav2nd except hav1st;

  /* Get IDs in both */
  select id from hav2nd intersect select id from hav1st;
  %put &=sqlobs hav2nd intersect hav1st;

quit;

proc sql;
  select id from hav1st where id not in (select id from hav2nd);
  %put &=sqlobs hav1st except hav2nd;
  select id from hav2nd where id not in (select id from hav1st);
  %put &=sqlobs hav2nd except hav1st;
  select id from hav1st where id in (select id from hav2nd);
  %put &=sqlobs hav2nd intersect hav1st;
quit;

Valued Guide
Posts: 797

Re: Matching observations from one data to see if any are missing in another dataset

This can also be a good task for beginning to use hash objects:

 

data first_not_second;
  set first;
  if _n_=1 then do;
    declare hash second_ids (dataset:'second(keep=id)');
      second_ids.definekey(all:'Y');
      second_ids.definedone();
  end;
  if second_ids.find()^=0;
run;
Ask a Question
Discussion stats
  • 5 replies
  • 166 views
  • 0 likes
  • 6 in conversation