Help using Base SAS procedures

Compare two data sets

Accepted Solution Solved
Reply
Regular Contributor
Posts: 216
Accepted Solution

Compare two data sets

Hi

I have two data sets that come from the same code, but run at two different points in time. One of the variables, Jur2006, has more 2000 observations marked as 'true' in the prior run than the current run. I would expect a few hundred different, but 2000 is too many.

To see what the difference is relative to the observations, I would like to compare the two data sets and identify the observations marked as 'true' in the prior run, but no longer marked as 'true' in the current run and output those observations in a 3rd data set. Is there a way to do this with PROC COMPARE or using a simple merge (which I am trying)?

Paul


Accepted Solutions
Solution
‎02-19-2014 03:25 PM
Super User
Posts: 10,532

Re: Compare two data sets

Sounds like a job for Proc SQL.

Proc Sql;

     Create table mismatch as

     select a.*

     from ( select * from FirstDataSet where Jur2006='True') as a

             natural join

            (select * from SecondDataSet where Jur2006='False') as a

     where a.Jur2006 ne b.Jur2006;

quit;

You'll need to change the names of the datasets and how the variable is indicated to be true or false. The output dataset barring ill fortune should have the records from the first run where the value changed.

WARNING: if enough values are repeated such that what might be considered a combination of identification variables have the same pattern for multiple records you're going to have some fun.

View solution in original post


All Replies
Solution
‎02-19-2014 03:25 PM
Super User
Posts: 10,532

Re: Compare two data sets

Sounds like a job for Proc SQL.

Proc Sql;

     Create table mismatch as

     select a.*

     from ( select * from FirstDataSet where Jur2006='True') as a

             natural join

            (select * from SecondDataSet where Jur2006='False') as a

     where a.Jur2006 ne b.Jur2006;

quit;

You'll need to change the names of the datasets and how the variable is indicated to be true or false. The output dataset barring ill fortune should have the records from the first run where the value changed.

WARNING: if enough values are repeated such that what might be considered a combination of identification variables have the same pattern for multiple records you're going to have some fun.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 1 reply
  • 171 views
  • 0 likes
  • 2 in conversation