Hi All
I need to check that all the records in a earlier extracted dataset appear identically in the later extracted larger dataset.
The first dataset has a unique recordid for each obs. I need to check that these recordids are in the second larger extract (of the same data) and then compare each variable value to see if any of the data values have changed.
The second dataset is a later snapshot of the first and recordids could be missing, new ones added or data values changed. I just want to find out if my original data is missing or changed in anyway in the second dataset.
Each obs has a unique recordid
Any suggestions on how I can achieve this please?
Many thanks
Steve
Create a second data set that only has data that is in the first set and use that to compare to your first set.
*Get a subset of your second data set;
proc sql;
create table second_temp as
select * from second
where unique_id in (select unique_id from first)
order by unique_id;
quit;
proc compare base=first compare=second_temp;
id unique_id;
run;
Create a second data set that only has data that is in the first set and use that to compare to your first set.
*Get a subset of your second data set;
proc sql;
create table second_temp as
select * from second
where unique_id in (select unique_id from first)
order by unique_id;
quit;
proc compare base=first compare=second_temp;
id unique_id;
run;
Actually that is what PROC COMPARE will do. It will also tell you how many were added or deleted.
If you want to get the added or deleted records use PROC SQL query as Reeza has posted.
Or you can use simle data step merges.
data additions;
merge new old(in=inold keep=id);
by id;
if not inold;
run;
data deletions;
merge old new(in=innew keep=id);
by id;
if not innew;
run;
I prefer to Hash Table.
Ksharp
Thanks Guys
I've gone with Reezer's suggestion.
The problem I am getting is that sometimes a recordid has been inactivated during the time the second dataset has been created. So when I run the sql statement the output dataset has less records than the first dataset. Proc compare then fails as the obs are not aligned in both datasets.
Is there a way I can get proc compare to check by recordid and not postion in dataset?
Thanks
Steve
Use an ID statement.
Thanks Tom
Reeza did have that in his original reply....for some reason i thought this would still throw an error. Tried it and it works but is there a way i can get a list of the records which where not found in the compare dataset? I see the output says how many but not which ones?
Thanks
Steve
LISTBASEOBS seems to be the one. Is that correct?
I usually just put in LISTALL as I can never remember all of the different options.
This will list inserted and deleted rows and also variables that appear in only one of the two datasets.
I am sure this would help.
Proc compare compares each column from table 1 to table 2 in a row. For example if there is a rundate in table1 and rundate in table2 need not be same even you started to run the code same time to get table1 and table2.
For comparing two tables I am sure this is going to be usefulcode.
/* first outer join table1 and table2*/
Proc SQL;
Create table table1andtable2outerjoin as
Select a.id as id_1,
a.name as name_1,
b.id as id_2,
b.name as name_2
From table1 as a
Full Outer Join
table2 as b
on a.id = b.id and
a.name=b.name;
Quit;
/*to see rows only in table2 not in table1 you can add more columns for comparison*/
Data tablenew;
Source = 'two';
Length Source $5;
Set table1andtable2outerjoin ;
if missing(id_1) and missing(name_1)
then output;
Format Source $5.;
Informat Source $5.;
run;
/*to see rows only in table1not in table2 you can add more columns for comparison*/
Data tableold;
Source = 'one';
Length Source $5;
Set table1andtable2outerjoin;
if missing(id_2) and missing(name_2) then output;
Format Source $5.;
Informat Source $5.;
run;
/*to see rows side by side*/
Data tablenewold;
Set tablenew tableold;
run;
Hi Slolay
I agree with Reeza suggestion. It'll achieve what you need.
Augusto.
Consider the EXCEPT operator in PROC SQL.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.