BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
slolay
Fluorite | Level 6

Hi All

I need to check that all the records in a earlier extracted dataset appear identically in the later extracted larger dataset.

The first dataset has a unique recordid for each obs.  I  need to check that these recordids are in the second larger extract (of the same data) and then compare each variable value to see if any of the data values have changed.

The second dataset is a later snapshot of the first and recordids could be missing, new ones added or data values changed. I just want to find out if my original data is missing or changed in anyway in the second dataset.

Each obs has a unique recordid

Any suggestions on how I can achieve this please?

Many thanks

Steve

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

Create a second data set that only has data that is in the first set and use that to compare to your first set.

*Get a subset of your second data set;

proc sql;

create table second_temp as

select * from second

where unique_id in (select unique_id from first)

order by unique_id;

quit;

proc compare base=first compare=second_temp;

id unique_id;

run;

View solution in original post

11 REPLIES 11
Reeza
Super User

Create a second data set that only has data that is in the first set and use that to compare to your first set.

*Get a subset of your second data set;

proc sql;

create table second_temp as

select * from second

where unique_id in (select unique_id from first)

order by unique_id;

quit;

proc compare base=first compare=second_temp;

id unique_id;

run;

Tom
Super User Tom
Super User

Actually that is what PROC COMPARE will do.  It will also tell you how many were added or deleted.

If you want to get the added or deleted records use PROC SQL query as Reeza has posted.

Or you can use simle data step merges.

data additions;

  merge new old(in=inold keep=id);

   by id;

   if not inold;

run;

data deletions;

   merge old new(in=innew keep=id);

   by id;

   if not innew;

run;

Ksharp
Super User

I prefer to Hash Table.

Ksharp

slolay
Fluorite | Level 6

Thanks Guys

I've gone with Reezer's suggestion.

The problem I am getting is that sometimes a recordid has been inactivated during the time the second dataset has been created. So when I run the sql statement the output dataset has less records than the first dataset. Proc compare then fails as the obs are not aligned in both datasets.

Is there a way I can get proc compare to check by recordid and not postion in dataset?

Thanks
Steve

Tom
Super User Tom
Super User

Use an ID statement.

slolay
Fluorite | Level 6

Thanks Tom

Reeza did have that in his original reply....for some reason i thought this would still throw an error. Tried it and it works but is there a way i can get a list of the records which where not found in the compare dataset? I see the output says how many but not which ones?

Thanks

Steve

slolay
Fluorite | Level 6

LISTBASEOBS seems to be the one.  Is that correct?

Tom
Super User Tom
Super User

I usually just put in LISTALL as I can never remember all of the different options.

This will list inserted and deleted rows and also variables that appear in only one of the two datasets.

sassharp
Calcite | Level 5

I am sure this would help.

Proc compare compares each column from table 1 to table 2 in a row. For example if there is a rundate in table1 and rundate in table2 need not be same even you started to run the code same time to get table1 and table2.

For comparing two tables I am sure this is going to be usefulcode.

/* first outer join table1 and table2*/

Proc SQL;
Create table table1andtable2outerjoin as
Select a.id as id_1,
a.name as name_1,

b.id as id_2,
b.name as name_2

From table1 as a
  Full Outer Join
  table2 as b
  on a.id = b.id and
     a.name=b.name;
  Quit;

/*to see rows only in table2 not in table1 you can add more columns for comparison*/

  Data tablenew;
  Source = 'two';
  Length Source $5;
  Set table1andtable2outerjoin ;
  if missing(id_1) and missing(name_1)

then output;

  Format Source $5.;
  Informat Source $5.;
       run;

/*to see rows only in table1not in table2 you can add more columns for comparison*/


Data tableold;
Source = 'one';
  Length Source $5;
  Set table1andtable2outerjoin;
  if missing(id_2) and missing(name_2)  then output;
 
  Format Source $5.;
  Informat Source $5.;
  run;

/*to see rows side by side*/

  Data tablenewold;
  Set tablenew tableold;
  run;

Augusto
Obsidian | Level 7

Hi Slolay

I agree with Reeza suggestion. It'll achieve what you need.

Augusto.

Howles
Quartz | Level 8

Consider the EXCEPT operator in PROC SQL.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 1623 views
  • 0 likes
  • 7 in conversation