Help using Base SAS procedures

Proc Compare Datasets Which Increase In Size

Accepted Solution Solved
Reply
Contributor
Posts: 51
Accepted Solution

Proc Compare Datasets Which Increase In Size

Hi All

I need to check that all the records in a earlier extracted dataset appear identically in the later extracted larger dataset.

The first dataset has a unique recordid for each obs.  I  need to check that these recordids are in the second larger extract (of the same data) and then compare each variable value to see if any of the data values have changed.

The second dataset is a later snapshot of the first and recordids could be missing, new ones added or data values changed. I just want to find out if my original data is missing or changed in anyway in the second dataset.

Each obs has a unique recordid

Any suggestions on how I can achieve this please?

Many thanks

Steve


Accepted Solutions
Solution
‎03-08-2012 01:50 PM
Super User
Posts: 17,819

Re: Proc Compare Datasets Which Increase In Size

Create a second data set that only has data that is in the first set and use that to compare to your first set.

*Get a subset of your second data set;

proc sql;

create table second_temp as

select * from second

where unique_id in (select unique_id from first)

order by unique_id;

quit;

proc compare base=first compare=second_temp;

id unique_id;

run;

View solution in original post


All Replies
Solution
‎03-08-2012 01:50 PM
Super User
Posts: 17,819

Re: Proc Compare Datasets Which Increase In Size

Create a second data set that only has data that is in the first set and use that to compare to your first set.

*Get a subset of your second data set;

proc sql;

create table second_temp as

select * from second

where unique_id in (select unique_id from first)

order by unique_id;

quit;

proc compare base=first compare=second_temp;

id unique_id;

run;

Super User
Super User
Posts: 6,500

Re: Proc Compare Datasets Which Increase In Size

Actually that is what PROC COMPARE will do.  It will also tell you how many were added or deleted.

If you want to get the added or deleted records use PROC SQL query as Reeza has posted.

Or you can use simle data step merges.

data additions;

  merge new old(in=inold keep=id);

   by id;

   if not inold;

run;

data deletions;

   merge old new(in=innew keep=id);

   by id;

   if not innew;

run;

Super User
Posts: 9,676

Re: Proc Compare Datasets Which Increase In Size

I prefer to Hash Table.

Ksharp

Contributor
Posts: 51

Proc Compare Datasets Which Increase In Size

Thanks Guys

I've gone with Reezer's suggestion.

The problem I am getting is that sometimes a recordid has been inactivated during the time the second dataset has been created. So when I run the sql statement the output dataset has less records than the first dataset. Proc compare then fails as the obs are not aligned in both datasets.

Is there a way I can get proc compare to check by recordid and not postion in dataset?

Thanks
Steve

Super User
Super User
Posts: 6,500

Proc Compare Datasets Which Increase In Size

Use an ID statement.

Contributor
Posts: 51

Proc Compare Datasets Which Increase In Size

Thanks Tom

Reeza did have that in his original reply....for some reason i thought this would still throw an error. Tried it and it works but is there a way i can get a list of the records which where not found in the compare dataset? I see the output says how many but not which ones?

Thanks

Steve

Contributor
Posts: 51

Proc Compare Datasets Which Increase In Size

LISTBASEOBS seems to be the one.  Is that correct?

Super User
Super User
Posts: 6,500

Proc Compare Datasets Which Increase In Size

I usually just put in LISTALL as I can never remember all of the different options.

This will list inserted and deleted rows and also variables that appear in only one of the two datasets.

Frequent Contributor
Posts: 110

Proc Compare Datasets Which Increase In Size

I am sure this would help.

Proc compare compares each column from table 1 to table 2 in a row. For example if there is a rundate in table1 and rundate in table2 need not be same even you started to run the code same time to get table1 and table2.

For comparing two tables I am sure this is going to be usefulcode.

/* first outer join table1 and table2*/

Proc SQL;
Create table table1andtable2outerjoin as
Select a.id as id_1,
a.name as name_1,

b.id as id_2,
b.name as name_2

From table1 as a
  Full Outer Join
  table2 as b
  on a.id = b.id and
     a.name=b.name;
  Quit;

/*to see rows only in table2 not in table1 you can add more columns for comparison*/

  Data tablenew;
  Source = 'two';
  Length Source $5;
  Set table1andtable2outerjoin ;
  if missing(id_1) and missing(name_1)

then output;

  Format Source $5.;
  Informat Source $5.;
       run;

/*to see rows only in table1not in table2 you can add more columns for comparison*/


Data tableold;
Source = 'one';
  Length Source $5;
  Set table1andtable2outerjoin;
  if missing(id_2) and missing(name_2)  then output;
 
  Format Source $5.;
  Informat Source $5.;
  run;

/*to see rows side by side*/

  Data tablenewold;
  Set tablenew tableold;
  run;

Frequent Contributor
Posts: 80

Proc Compare Datasets Which Increase In Size

Hi Slolay

I agree with Reeza suggestion. It'll achieve what you need.

Augusto.

Regular Contributor
Posts: 184

Re: Proc Compare Datasets Which Increase In Size

Consider the EXCEPT operator in PROC SQL.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 11 replies
  • 659 views
  • 0 likes
  • 7 in conversation