Desktop productivity for business analysts and programmers

COMPARE TWO DATA SETS

Accepted Solution Solved
Reply
Contributor
Posts: 52
Accepted Solution

COMPARE TWO DATA SETS

/*

I NEED TO COMPARE TWO OUTPUTS BASED ON CONTRACT AND PLAN FROM YEAR TO YEAR.

IF THERE'S A CHANGE SHOW THE DIFFERENCE IN THE LAST COLUMN.

IF THERE'S AN ADDITION SHOW THE NEWLY ADDED COLUMN AND THE DIFFERENCE (LOOK AT PLAN 035).

IF THERE'S A DELETION SHOW THE REMOVAL AND THE DIFFERENCE (LOOK AT PLAN 801).

*/

DATA YEAR2012;

     INPUT CONTRACT $ PLAN $ STATE $ COUNTY $ ENROLLMENT 12.;

     FORMAT ENROLLMENT 12.;

     DATALINES;

H9999 801 NY SUFFOLK 90

H9999 810 NY SUFFOLK 0

H9999 808 NY SUFFOLK 0

H9999 032 NY SUFFOLK 614

H9999 814 NY SUFFOLK 1324

H9999 021 NY SUFFOLK 1211

H9999 029 NY SUFFOLK 92

H9999 033 NY SUFFOLK 396

H9999 815 NY SUFFOLK 81

;

RUN;

DATA YEAR2013;

     INPUT CONTRACT $ PLAN $ STATE $ COUNTY $ ENROLLMENT 12.;

     FORMAT ENROLLMENT 12.;

     DATALINES;

H9999 810 NY SUFFOLK 50

H9999 808 NY SUFFOLK 1000

H9999 032 NY SUFFOLK 600

H9999 814 NY SUFFOLK 1300

H9999 021 NY SUFFOLK 1355

H9999 029 NY SUFFOLK 192

H9999 033 NY SUFFOLK 400

H9999 815 NY SUFFOLK 90

H9999 035 NY SUFFOLK 500

;

RUN;

/* RESULTS i'M LOOKING FOR

YEAR 2012                                      YEAR 2013

CONTRACT PLAN  STATE  COUNTY    ENROLLMENT     CONTRACT  PLAN   STATE   COUNTY    ENROLLMENT   DIFFERENCE

H9999    801   NY     SUFFOLK   90                                                             -90

H9999    810   NY     SUFFOLK   0              H9999     810    NY      SUFFOLK   50           50

H9999    808   NY     SUFFOLK   0              H9999     808    NY      SUFFOLK   1000         1000

H9999    032   NY     SUFFOLK   614            H9999     032    NY      SUFFOLK   600          -14

H9999    814   NY     SUFFOLK   1324           H9999     814    NY      SUFFOLK   1300         -24

H9999    021   NY     SUFFOLK   1211           H9999     021    NY      SUFFOLK   1355         144

H9999    029   NY     SUFFOLK   92             H9999     029    NY      SUFFOLK   192          100

H9999    033   NY     SUFFOLK   396            H9999     033    NY      SUFFOLK   400          4

H9999    815   NY     SUFFOLK   81             H9999     815    NY      SUFFOLK   90           9

                                               H9999     035    NY      SUFFOLK   500          500 

I NEED TO COMPARE TWO OUTPUTS BASED ON CONTRACT AND PLAN FROM YEAR TO YEAR.

IF THERE'S A CHANGE SHOW THE DIFFERENCE IN THE LAST COLUMN.

IF THERE'S AN ADDITION SHOW THE NEWLY ADDED COLUMN AND THE DIFFERENCE (LOOK AT PLAN 035).

IF THERE'S A DELETION SHOW THE REMOVAL AND THE DIFFERENCE (LOOK AT PLAN 801).

Your help is greatly appreciated on this one...

Signed New Jack  :smileyconfused:


Accepted Solutions
Solution
‎12-31-2013 04:30 PM
Trusted Advisor
Posts: 1,056

Re: COMPARE TWO DATA SETS

Hi, New Jack and Happy New Year

To make it a little clearer, I've renamed your columns, tagging them by the year so that in YEAR2012 CONTRACT is named Y12_CONTRACT, PLAN is Y12_PLAN, etc, and Y13_CONTRACT, Y13_PLAN, etc. in YEAR2013.

Then:

1. Create a new Query Builder task on YEAR2012. Add all of the columns to your "Select Data" tab.

2. Click on "Add Tables", and add YEAR2013 to the query. It will have trouble with the join; on the Join window, connect Y12_CONTRACT to Y13_CONTRACT, and Y12_PLAN to Y13_PLAN. Now, VERY IMPORTANT, change both joins to "Full Outer Joins".

3. Close the Join window, and add all of the columns from YEAR2012 to your "Select Data" tab

4. Run the query. You should see your rows laid out as you describe. The "DIFFERENCE" is just math between Y12_ENROLLMENT and Y13_ENROLLMENT.

Tom

View solution in original post


All Replies
Solution
‎12-31-2013 04:30 PM
Trusted Advisor
Posts: 1,056

Re: COMPARE TWO DATA SETS

Hi, New Jack and Happy New Year

To make it a little clearer, I've renamed your columns, tagging them by the year so that in YEAR2012 CONTRACT is named Y12_CONTRACT, PLAN is Y12_PLAN, etc, and Y13_CONTRACT, Y13_PLAN, etc. in YEAR2013.

Then:

1. Create a new Query Builder task on YEAR2012. Add all of the columns to your "Select Data" tab.

2. Click on "Add Tables", and add YEAR2013 to the query. It will have trouble with the join; on the Join window, connect Y12_CONTRACT to Y13_CONTRACT, and Y12_PLAN to Y13_PLAN. Now, VERY IMPORTANT, change both joins to "Full Outer Joins".

3. Close the Join window, and add all of the columns from YEAR2012 to your "Select Data" tab

4. Run the query. You should see your rows laid out as you describe. The "DIFFERENCE" is just math between Y12_ENROLLMENT and Y13_ENROLLMENT.

Tom

Contributor
Posts: 52

Re: COMPARE TWO DATA SETS

I'll try it again

Happy New Year...

the last time I ran the same pull there was redundant data...

Trusted Advisor
Posts: 1,056

Re: COMPARE TWO DATA SETS

This will only work if you don't have any duplicates!

Tom

Contributor
Posts: 52

Re: COMPARE TWO DATA SETS

That's the problem - each table has matching valueswith the exception of the addition of a new plan or the numbers increase or decrease

Super User
Posts: 1,111

Re: COMPARE TWO DATA SETS

Hi,

Happy New year.

Please try the below code, it is producing your desired output.

DATA YEAR2012;

  INPUT CONTRACT $ PLAN $ STATE $ COUNTY $ ENROLLMENT 12.;

  FORMAT ENROLLMENT 12.;

  DATALINES;

H9999 801 NY SUFFOLK 90

H9999 810 NY SUFFOLK 0

H9999 808 NY SUFFOLK 0

H9999 032 NY SUFFOLK 614

H9999 814 NY SUFFOLK 1324

H9999 021 NY SUFFOLK 1211

H9999 029 NY SUFFOLK 92

H9999 033 NY SUFFOLK 396

H9999 815 NY SUFFOLK 81

;

RUN;

proc sort data=YEAR2012(rename=(ENROLLMENT=yr12_ENROLLMENT CONTRACT=yr12_CONTRACT));

  by PLAN STATE COUNTY;

run;

DATA YEAR2013;

  INPUT CONTRACT $ PLAN $ STATE $ COUNTY $ ENROLLMENT 12.;

  FORMAT ENROLLMENT 12.;

  DATALINES;

H9999 810 NY SUFFOLK 50

H9999 808 NY SUFFOLK 1000

H9999 032 NY SUFFOLK 600

H9999 814 NY SUFFOLK 1300

H9999 021 NY SUFFOLK 1355

H9999 029 NY SUFFOLK 192

H9999 033 NY SUFFOLK 400

H9999 815 NY SUFFOLK 90

H9999 035 NY SUFFOLK 500

;

RUN;

proc sort data=YEAR2013(rename=(ENROLLMENT=yr13_ENROLLMENT CONTRACT=yr13_CONTRACT));

  by PLAN STATE COUNTY;

run;

data yr12_13;

  merge YEAR2012(in=a) YEAR2013(in=b);

  by PLAN STATE COUNTY;

  if yr13_ENROLLMENT ne . and yr12_ENROLLMENT ne . then DIFFERENCE=yr13_ENROLLMENT-yr12_ENROLLMENT;

  if yr13_ENROLLMENT eq . and yr12_ENROLLMENT ne . then DIFFERENCE=0-yr12_ENROLLMENT;

  if yr13_ENROLLMENT ne . and yr12_ENROLLMENT eq . then DIFFERENCE=yr13_ENROLLMENT-0;

run;

alternatively by proc sql

PROC SQL;

   CREATE TABLE yr12_13 AS

   SELECT t1.yr12_CONTRACT,

          t1.PLAN,

          t1.STATE,

          t1.COUNTY,

          t1.yr12_ENROLLMENT,

          t2.yr13_CONTRACT,

          t2.PLAN AS PLAN1,

          t2.STATE AS STATE1,

          t2.COUNTY AS COUNTY1,

          t2.yr13_ENROLLMENT,

          /* DIFFERENCE */

            (coalesce(t2.yr13_ENROLLMENT,0)-coalesce(t1.yr12_ENROLLMENT,0)) AS DIFFERENCE

      FROM WORK.YEAR2012 t1

           FULL JOIN WORK.YEAR2013 t2 ON (t1.PLAN = t2.PLAN) AND (t1.STATE = t2.STATE) AND (t1.COUNTY = t2.COUNTY);

QUIT;

Thanks,

Jag

Thanks,
Jag
Trusted Advisor
Posts: 1,056

Re: COMPARE TWO DATA SETS

No, what I mean is that for each value of contract and plan, it must be unique. So for example, in your data, you have contract / plan "H9999 810" in both 2012 and 2013. That's fine, but if you had another record in either file with exactly those values, it won't work.

To me, intuitively, it would make no sense for you to have duplicates, so I assumed you didn't.

Tom

Contributor
Posts: 52

Re: COMPARE TWO DATA SETS

I'll log on later and check it out.

Perhaps my mistake was not linking contract and plan as a full outer join.

Thanks again...

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 341 views
  • 6 likes
  • 3 in conversation