BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ModernaMattias
Calcite | Level 5

Hi,

I need to find differences between two datasets. I need to leave out one variable but need it from dataset2 in the dataset created..

 

So find the difference between rows in all columns except one column that will always be different and then keep that column in the created table with the data from one of the compared tables. The column I dont want to compare contains the date the dataset was created and I need to keep the date from one of the datasets.

 

This is the code I have that works fine when comparing all columns. This code is applied on many different tables with different amount of columns. The column I want to ignore in the compare but keep in the result will always have the same name, DATE:

 

data data_today;
input employee $ salary $ date $;
datalines;
Smith 40000 20220419
Cool 00004 20220419
Me 000055 20220419 ; run; data data_yesterday; input employee $ salary $ date $; datalines; Smith 40000 20220418 Cool 00003 20220418
Me 000055 20220419 ; run; proc sql; create table Help as select * from data_today except select * from data_yesterday; quit;

Result wanted:
Cool 00004 20220419

The result should only contain 'Cool' because she has a different salary. Smith or Me should not be included because it's only the date that is different:

 

 

This is applied on many different tables so I have to use select * , but the date column will always have the same name.

 

Thanks!

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

Use the DROP dataset option to exclude the dates and then a NATURAL JOIN to bring the date back from data_today. This works for any number of variables, as long as you don't have duplicates in data_today.

proc sql;
create table Help as
select * from
(select * from data_today(drop=date)
 except
 select * from data_yesterday(drop=date))
natural join data_today;
quit; 

 

PG

View solution in original post

3 REPLIES 3
mklangley
Lapis Lazuli | Level 10

Is this what you're looking for?

proc sql;
    create table want as
    select t.*
    from data_today t
    left join data_yesterday y
        on t.employee = y.employee
        and t.salary = y.salary
    where y.employee is null
    ;
quit;

Also, are all the dates in your data_yesterday table all supposed to be 20220418? I presume so; however, your provided data has one row with 20220419.

PGStats
Opal | Level 21

Use the DROP dataset option to exclude the dates and then a NATURAL JOIN to bring the date back from data_today. This works for any number of variables, as long as you don't have duplicates in data_today.

proc sql;
create table Help as
select * from
(select * from data_today(drop=date)
 except
 select * from data_yesterday(drop=date))
natural join data_today;
quit; 

 

PG
ModernaMattias
Calcite | Level 5
Thanks for all your help! Really appreciate it!

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 2089 views
  • 1 like
  • 3 in conversation