Solved: UNION in Proc SQL returned as UNION ALL

Report Inappropriate Content · Posted 08-14-2019 06:28 PM

I performed a UNION on two data sets using Proc SQL. Instead of removing duplicates, it performed a UNION ALL and kept duplicates. The Proc SQL documentation states that UNION should only return distinct values. Why isn't it working as documented?

proc sql;
create table full_customers as 
select * from spring_customers
UNION
select * from fall_customers;
QUIT;

spring_customers and fall_customers are just tables I got from previous data steps.

data spring_customers;
set spring;
run;

data fall_customers;
set fall;
run;

ballardw · Posted 08-14-2019 06:40 PM

Are there variables in the data sets that would have different values, such as may be a date of service? That would make the records different. You may need to subset variables from each table depending on what you want.

Consider:

data work.example1;
   id = 3;
   date='01Jan2019'd;
run;

data work.example2;
   id = 3;
   date='01Aug2019'd;
run;


proc sql;
   create table work.full as
   select * from work.example1
   union
   select * from work.example2
   ;
quit;

proc sql;
   create table work.id as
   select id from work.example1
   union
   select id from work.example2
   ;
quit;

The second proc sql only returns one record because the only variables associated with the Union are the ID values, which are duplicated. The first returns both records because the value of the DATE variable varies between the two records and using the Select * says to consider all the variables.

You may also have to consider the CORR option if the order of variables varies between the two:

data work.example3;
   date='01Aug2019'd;
   id = 3;
run;

proc sql;
   create table work.full2 as
   select * from work.example2
   union
   select * from work.example3
   ;
quit;
proc sql;
   create table work.full3 as
   select * from work.example2
   union corr
   select * from work.example3
   ;
quit;

View solution in original post

ballardw · Posted 08-14-2019 06:40 PM

Are there variables in the data sets that would have different values, such as may be a date of service? That would make the records different. You may need to subset variables from each table depending on what you want.

Consider:

data work.example1;
   id = 3;
   date='01Jan2019'd;
run;

data work.example2;
   id = 3;
   date='01Aug2019'd;
run;


proc sql;
   create table work.full as
   select * from work.example1
   union
   select * from work.example2
   ;
quit;

proc sql;
   create table work.id as
   select id from work.example1
   union
   select id from work.example2
   ;
quit;

The second proc sql only returns one record because the only variables associated with the Union are the ID values, which are duplicated. The first returns both records because the value of the DATE variable varies between the two records and using the Select * says to consider all the variables.

You may also have to consider the CORR option if the order of variables varies between the two:

data work.example3;
   date='01Aug2019'd;
   id = 3;
run;

proc sql;
   create table work.full2 as
   select * from work.example2
   union
   select * from work.example3
   ;
quit;
proc sql;
   create table work.full3 as
   select * from work.example2
   union corr
   select * from work.example3
   ;
quit;

UNION in Proc SQL returned as UNION ALL

Re: UNION in Proc SQL returned as UNION ALL

Re: UNION in Proc SQL returned as UNION ALL

UNION in Proc SQL returned as UNION ALL

Re: UNION in Proc SQL returned as UNION ALL

Re: UNION in Proc SQL returned as UNION ALL

Click image to register for webinar

Classroom Training Available!