BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.

I performed a UNION on two data sets using Proc SQL. Instead of removing duplicates, it performed a UNION ALL and kept duplicates. The Proc SQL documentation states that UNION should only return distinct values. Why isn't it working as documented?

 

 

proc sql;
create table full_customers as 
select * from spring_customers
UNION
select * from fall_customers;
QUIT;

spring_customers and fall_customers are just tables I got from previous data steps.

data spring_customers;
set spring;
run;

data fall_customers;
set fall;
run;

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

Are there variables in the data sets that would have different values, such as may be a date of service? That would make the records different. You may need to subset variables from each table depending on what you want.

 

Consider:

data work.example1;
   id = 3;
   date='01Jan2019'd;
run;

data work.example2;
   id = 3;
   date='01Aug2019'd;
run;


proc sql;
   create table work.full as
   select * from work.example1
   union
   select * from work.example2
   ;
quit;

proc sql;
   create table work.id as
   select id from work.example1
   union
   select id from work.example2
   ;
quit;

The second proc sql only returns one record because the only variables associated with the Union are the ID values, which are duplicated. The first returns both records because the value of the DATE variable varies between the two records and using the Select * says to consider all the variables.

 

You may also have to consider the CORR option if the order of variables varies between the two:

data work.example3;
   date='01Aug2019'd;
   id = 3;
run;

proc sql;
   create table work.full2 as
   select * from work.example2
   union
   select * from work.example3
   ;
quit;
proc sql;
   create table work.full3 as
   select * from work.example2
   union corr
   select * from work.example3
   ;
quit;

View solution in original post

1 REPLY 1
ballardw
Super User

Are there variables in the data sets that would have different values, such as may be a date of service? That would make the records different. You may need to subset variables from each table depending on what you want.

 

Consider:

data work.example1;
   id = 3;
   date='01Jan2019'd;
run;

data work.example2;
   id = 3;
   date='01Aug2019'd;
run;


proc sql;
   create table work.full as
   select * from work.example1
   union
   select * from work.example2
   ;
quit;

proc sql;
   create table work.id as
   select id from work.example1
   union
   select id from work.example2
   ;
quit;

The second proc sql only returns one record because the only variables associated with the Union are the ID values, which are duplicated. The first returns both records because the value of the DATE variable varies between the two records and using the Select * says to consider all the variables.

 

You may also have to consider the CORR option if the order of variables varies between the two:

data work.example3;
   date='01Aug2019'd;
   id = 3;
run;

proc sql;
   create table work.full2 as
   select * from work.example2
   union
   select * from work.example3
   ;
quit;
proc sql;
   create table work.full3 as
   select * from work.example2
   union corr
   select * from work.example3
   ;
quit;

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 1 reply
  • 8983 views
  • 0 likes
  • 2 in conversation