DATA Step, Macro, Functions and more

Difference in columns

Reply
Super Contributor
Posts: 673

Difference in columns

There are two tables MBR_2010Q3 and MBR_2010Q4 they have the same data structre like Country,City,State,Earnings. Now How to get the difference between Q3 and Q4 earnings grouped on country,city,state.There are 5000 records in Q3 dataset and 9000 records in Q4.

PROC Star
Posts: 7,492

Difference in columns

You could always merge the two files together, changing the name of the earnings variable in one of the files, and then simply create a difference_variable and summarize it with proc means or proc summary.

Respected Advisor
Posts: 3,799

Re: Difference in columns

Do you difference then summarize or summarize then difference.

PROC COMPARE is an option you might consider.

data class0;

   set sashelp.class;

   array _n

  • _numeric_;

       do _n_ = 1 to dim(_n);

          _n[_n_] = _n[_n_] + rannor(12345);

          end;

       run;

    proc compare base=sashelp.class compare=class0 outdif out=dif;

       id name;

       var _numeric_;

       run;

    Proc print;

       run;

    Super User
    Posts: 10,046

    Re: Difference in columns

    Did you mean the difference of these two variables's value?

    You can use proc means to get the earning for each group,then merge them into a table to get what you want.

    Ksharp

    Valued Guide
    Posts: 2,177

    Re: Difference in columns

    as I feel out-of-work

    and this sounds like home-work

    here is some work done at home

    1. code to generate test data,
    2. summarise totals and averages for each quarter,
    3. a compare/calculate data step which prepares a data set for
    4. proc print

    results are attached (in html format from sas9.3)

    ************************************

    MBR_2010Q3 and MBR_2010Q4 

    have the same data structre like

    Country,City,State,Earnings.

    get the difference between Q3 and Q4 earnings

    grouped on country,city,state.

    ;

    *1 generate test data;

    data MBR_2010Q3 MBR_2010Q4;

       keep   Country City State Earnings;

       length country City $20

           ;

       do country = 'Here', 'There', 'some Where Else';

        ref1+1;

           do City = 'Hope', 'glory', 'industrial', 'services';

          ref2+1;

               do state= 353, 355;

                   do t=1 to 100;

                      *to bulk up the test data;

               ref3+1;

               earnings = ranuni(ref1+ref2+ref3)*1000;

                       if ranuni(1) >.1 then output MBR_2010Q3;

               earnings = ranuni(ref1+ref2+ref3*7)*1000;

                       if ranuni(1) <.9 then output MBR_2010Q4;

                  end;

              end;

          ref3=0;

          end;

       ref2=0;

       end;

       stop;

    run;

    *2 generate totals and averages;

    proc means noprint data= MBR_2010Q3 nway;

       class  Country City State;

       var    earnings;

       output sum= total_earnings3 mean=ave_earnings3

              out= q3_stats;

    run;

    *  for the other quarter too;

    proc means noprint data= MBR_2010Q4 nway;

       class  Country City State;

       var    earnings;

       output sum= total_earnings4 mean=ave_earnings4

              out= q4_stats;

    run;

    *3 compare (merge) to measure growth;

    data cf;

       merge q3_stats q4_stats;

       by  Country City State;

       if n( ave_earnings3, ave_earnings4 )=2 then

          do;

             ave_delta =   ave_earnings4 / ave_earnings3 -1;

             sum_delta = total_earnings4/total_earnings3 -1;

             growth    = total_earnings4-total_earnings3;

          end;

       format    ave_delta sum_delta percent8.2

                ave_earnings3 ave_earnings4            comma12.2

                total_earnings3 total_earnings4 growth comma12.;

       label     ave_earnings3     ='average earnings 2010Q3'

                ave_earnings4     ='average earnings 2010Q4'

                ave_delta         ='growth in average earnings'

                total_earnings3   ='aggregate earnings 2010Q3'

                total_earnings4   ='aggregate earnings 2010Q4'

                growth         ='growth of aggregate earnings 2010Q3 to Q4'

                sum_delta ='growth rate of aggregate earnings 2010Q3 to Q4'

       ;

    run;

    *4 report measures;

    proc print label;

       title      .h=5 'earnings growth';

       footnote .h=1 "demo by peterC at %sysfunc( datetime(),twmdy)";

       id    Country City State;

       var   ave_earnings3 ave_earnings4 ave_delta

            total_earnings3 total_earnings4

            growth sum_delta;

    run;

    title; footnote;

    Attachment
    Super Contributor
    Posts: 673

    Re: Difference in columns

    Hi Peter,

    Thanks for the reply. I'm wondering if this can be done in proc sql something like this:

      

    proc sql;
    create table diff as
      select a.*,b.earnings as earnings_q4, a.earnings-b.earnings as diff
       from Mbr_2010q3 as a full join Mbr_2010q4 as b
       on a.country=b.country and
       a.city=b.city and
       a.state=b.state;

    quit;

    PROC Star
    Posts: 7,492

    Re: Difference in columns

    I didn't test your code but it in essence is doing what all of the suggestions have been: merge the files and obtain the difference between the earnings.  Except that you are just creating the merged file.  Why stop there?  You could easily add the summary function to get just one record containing the sums for each group (of course you would have to add in the distinct and group by clauses).

    Super Contributor
    Posts: 673

    Re: Difference in columns

    Art,

    It doesnt give me the expected result as the join results in 191637 records.

    PROC Star
    Posts: 7,492

    Re: Difference in columns

    Doesn't the following approximate what you are trying to do?

    /*create some test datasets*/

    data mbr_2010q3 (keep=country city state earnings);

      set sashelp.class (rename=(

        sex=country height=earnings));

      if substr(name,1,1) lt 'N' then city=1;

      else city=2;

      if age lt 15 then state=1;

      else state=2;

      earnings=earnings+weight;

    run;

    data mbr_2010q4 (keep=country city state earnings);

      set sashelp.class (rename=(

        sex=country height=earnings));

      if substr(name,1,1) lt 'N' then city=1;

      else city=2;

      if age lt 15 then state=1;

      else state=2;

    run;

    proc sql;

      create table diff as

        select a.country,a.city,

          sum(a.earnings)-sum(b.earnings) as diff

            from Mbr_2010q3 as a full join Mbr_2010q4 as b

              on a.country=b.country and

                   a.city=b.city and

                   a.state=b.state

                     group by a.country,a.city

      ;

    quit;

    Ask a Question
    Discussion stats
    • 8 replies
    • 263 views
    • 0 likes
    • 5 in conversation