Hi,
I need to find differences between two datasets. I need to leave out one variable but need it from dataset2 in the dataset created..
So find the difference between rows in all columns except one column that will always be different and then keep that column in the created table with the data from one of the compared tables. The column I dont want to compare contains the date the dataset was created and I need to keep the date from one of the datasets.
This is the code I have that works fine when comparing all columns. This code is applied on many different tables with different amount of columns. The column I want to ignore in the compare but keep in the result will always have the same name, DATE:
data data_today; input employee $ salary $ date $; datalines; Smith 40000 20220419 Cool 00004 20220419
Me 000055 20220419 ; run; data data_yesterday; input employee $ salary $ date $; datalines; Smith 40000 20220418 Cool 00003 20220418
Me 000055 20220419 ; run; proc sql; create table Help as select * from data_today except select * from data_yesterday; quit;
Result wanted:
Cool 00004 20220419
The result should only contain 'Cool' because she has a different salary. Smith or Me should not be included because it's only the date that is different:
This is applied on many different tables so I have to use select * , but the date column will always have the same name.
Thanks!
Use the DROP dataset option to exclude the dates and then a NATURAL JOIN to bring the date back from data_today. This works for any number of variables, as long as you don't have duplicates in data_today.
proc sql;
create table Help as
select * from
(select * from data_today(drop=date)
except
select * from data_yesterday(drop=date))
natural join data_today;
quit;
Is this what you're looking for?
proc sql;
create table want as
select t.*
from data_today t
left join data_yesterday y
on t.employee = y.employee
and t.salary = y.salary
where y.employee is null
;
quit;
Also, are all the dates in your data_yesterday table all supposed to be 20220418? I presume so; however, your provided data has one row with 20220419.
Use the DROP dataset option to exclude the dates and then a NATURAL JOIN to bring the date back from data_today. This works for any number of variables, as long as you don't have duplicates in data_today.
proc sql;
create table Help as
select * from
(select * from data_today(drop=date)
except
select * from data_yesterday(drop=date))
natural join data_today;
quit;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.