BookmarkSubscribeRSS Feed
jackmelbourne
Fluorite | Level 6

Hello everyone,

 

I have two datasets and I need to see if observations from one dataset are missing from another dataset. For example,

 

data have;
   input Name :$upcase4. id Net Reg Loss;
   cards;home 344 200  50    .
home 543  80   .    5
TDR  453  75   .    3
TDR  543   .   5    .
;
;;; run;
data have;
input Name :$upcase4. id Net Reg Loss;
cards;
home
344 200 50 . home 543 80 . 5
;
;;;
run;

 I want to see if any ids are missing in the second dataset compared with the first dataset. For example, id 453 is missing but 543 and 344 are not.

 

Thanks in advance 🙂

 

Kind regards,

Jack

5 REPLIES 5
AndrewHowell
Moderator

Assuming the tables are called First & Second:

 

 

proc sql;
  /* Get IDs in First but not in Second */
  select id from First except select id from Second;
  /* Get IDs in Second but not in First */
  select id from Second except select id from First;
  /* Get IDs in both */
  select id from Second intersect select id from First;
quit;

Does this help?

ballardw
Super User

@AndrewHowell wrote:

Assuming the tables are called First & Second:

 

 

proc sql;
  /* Get IDs in First but not in Second */
  select id from First except select id from Second;
  /* Get IDs in Second but not in First */
  select id from Second except select id from First;
  /* Get IDs in both */
  select id from Second intersect select id from First;
quit;

Does this help?


May want select distinct id if there are duplicates.

Kurt_Bremser
Super User
proc sort data=have1;
by id;
run;

proc sort data=have2;
by id;
run;

data
  miss_from_have2
  miss_from_have1
  both
;
merge
  have1 (in=a keep=id)
  have2 (in=b keep=id)
;
by id;
if a and b
then output both;
else if a
then output miss_from_have2;
else output miss_from_have1;
run;

might perform better on large datasets.

rogerjdeangelis
Barite | Level 11
Elegant SQL intersection and exception of sets of data

Matching observations from one data to see if any are missing in another dataset

see
https://goo.gl/xCjIsU
https://communities.sas.com/t5/Base-SAS-Programming/Matching-observations-from-one-data-to-see-if-any-are-missing-in/m-p/343684

HAVE
=====

Up to 40 obs WORK.HAV1ST total obs=2

Obs    NAME    ID

 1     home    ___
 2     home    XXX


Up to 40 obs WORK.HAV2ND total obs=4

Obs    NAME    ID

 1     home    ___
 2     home    XXX
 3     TDR     @@@
 4     TDR     XXX

WANT
=====

   SQLOBS=0 hav1st except hav2nd

   SQLOBS=1 hav2nd except hav1st

   ID
   --------
   @@@

   SQLOBS=2 hav2nd intersect hav1st

   ID
   --------
   XXX
   ___

FULL SOLUTION
===============


%utlnopts; /* turn options off so so only the puts statements are in the log */
proc sql;

  /* ID @@@ Get IDs in hav1st but not in hav2nd */
  select id from hav1st except select id from hav2nd;
  %put &=sqlobs hav1st except hav2nd;

  /* Get IDs in hav2nd but not in hav1st */
  select id from hav2nd except select id from hav1st;
  %put &=sqlobs hav2nd except hav1st;

  /* Get IDs in both */
  select id from hav2nd intersect select id from hav1st;
  %put &=sqlobs hav2nd intersect hav1st;

quit;

proc sql;
  select id from hav1st where id not in (select id from hav2nd);
  %put &=sqlobs hav1st except hav2nd;
  select id from hav2nd where id not in (select id from hav1st);
  %put &=sqlobs hav2nd except hav1st;
  select id from hav1st where id in (select id from hav2nd);
  %put &=sqlobs hav2nd intersect hav1st;
quit;

mkeintz
PROC Star

This can also be a good task for beginning to use hash objects:

 

data first_not_second;
  set first;
  if _n_=1 then do;
    declare hash second_ids (dataset:'second(keep=id)');
      second_ids.definekey(all:'Y');
      second_ids.definedone();
  end;
  if second_ids.find()^=0;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 1271 views
  • 0 likes
  • 6 in conversation