Help using Base SAS procedures

Cumulative Sums based on different by group vars to create a Ratio from one dataset to be attached to another set given that cum sums will be created based on the cut off DATE in the second dataset

Accepted Solution Solved
Reply
Contributor
Posts: 68
Accepted Solution

Cumulative Sums based on different by group vars to create a Ratio from one dataset to be attached to another set given that cum sums will be created based on the cut off DATE in the second dataset

Dear all,

I would very much appreciate any help with the following issue.

I have a data set A that contains six vars.

DATE_QTR (e.g. YYYYQQ)   COMP    ID    Country   Region    Income

2005Q2                                  ABC    345     AUS      FER     5,000

2005Q2                                  ABC    345     AUS      VHR     3,000

2005Q3                                  ABC    345     AUS      FER     1,500

2005Q3                                  ABC    345     AUS      VHR     2,000

..................................................................................................

where the GROUP BY vars are:

DATE_QTR (e.g. YYYYQQ) 

COMP  

ID

Country 

Region   

and a second dataset  B  that contains the same Group By variables except the last where instead of INCOME there is TOTAL_REVENUE.

I would like to create in dataset B a RATIO = (SUM_INCOME) / (SUM_ TOTAL_REVENUE)  where

     SUM_INCOME is based on the vars    COMP,    ID,    Country,   Region 

&   SUM_ TOTAL_REVENUE is based only on the var COMP.

Now I would like for each row of data set A to attach RATIO and create a new variable  INC_ALLOCATE = INCOME * RATIO with the following restriction

both  SUM_INCOME & SUM_ TOTAL_REVENUE should be evaluated up to the value of DATE_QTR of the respective row in data set A.

For example for the first row of data set A  DATE_QTR has the value 2005Q2

I would like SUM_INCOME (dataset B)  to be the sum of all TOTAL_REVENUE UP to and including 2005Q2 (for the GROUP BY variables COMP,    ID,    Country,   Region  )

Likewise     SUM_ TOTAL_REVENUE  (dataset B) to be the sum of all TOTAL_REVENUE UP to and including 2005Q2 (for the GROUP BY variable COMP  )

Kind regards

Nikos


Accepted Solutions
Solution
‎01-06-2012 06:53 PM
PROC Star
Posts: 7,485

Cumulative Sums based on different by group vars to create a Ratio from one dataset to be attached to another set given that cum sums will be created based on the cut off DATE in the second dataset

For what you want I think that a series of datasteps should suffice.  However, there was at least one error in your example data (regarding quarter specified) and there may be an error in my code.  However, I think this comes close enough to give you some direction:

data a;

  informat date_qtr yyq6.;

  format date_qtr yyq6.;

  input DATE_QTR COMP $   ID    Country  $ Region  $  Income comma8.;

  cards;

2005Q2 ABC    345     AUS      FER   5,000

2005Q2 ABC    345     AUS      VHR   3,000

2005Q3 ABC    345     AUS      FER   1,500

2005Q3 ABC    345     AUS      VHR   2,000

2005Q4 ABC    345     AUS      FER   2,500

2005Q4 ABC    345     AUS      VHR   2,800

2006Q1 ABC    345     AUS      FER   1,750

2006Q1 ABC    345     AUS      VHR   2,220

;

data b;

  informat date_qtr yyq6.;

  format date_qtr yyq6.;

  input DATE_QTR COMP $   ID    Country  $ Region  $  Revenue comma8.;

  cards;

2004Q2 ABC    345     AUS      FER  5,000

2004Q2 ABC    345     AUS      VHR  5,000

2004Q3 ABC    345     AUS      FER  5,000

2004Q3 ABC    345     AUS      VHR  5,000

2004Q4 ABC    345     AUS      FER  5,000

2004Q4 ABC    345     AUS      VHR  5,000

2005Q1 ABC    345     AUS      FER  5,000

2005Q1 ABC    345     AUS      VHR  5,000

2005Q2 ABC    345     AUS      FER  5,000

2005Q2 ABC    345     AUS      VHR  5,000

2005Q3 ABC    345     AUS      FER  5,000

2005Q3 ABC    345     AUS      VHR  5,000

2005Q4 ABC    345     AUS      FER  5,000

2005Q4 ABC    345     AUS      VHR  5,000

2006Q1 ABC    345     AUS      FER  5,000

2006Q1 ABC    345     AUS      VHR  5,000

;

proc sort data=a;

  by COMP ID Country Region DATE_QTR;

run;

proc sort data=b;

  by COMP ID Country Region DATE_QTR;

run;

data want;

  merge a (in=ina) b (in=inb);

  by COMP ID Country Region DATE_QTR;

  if ina or inb;

  if ina then infilea=1;

run;

data want;

  set want;

  by COMP ID Country Region DATE_QTR;

  if first.Country then xSum_Total_Revenue=0;

  if first.Region then Sum_Revenue=0;

  Sum_Revenue+Revenue;

  xSum_Total_Revenue+Revenue;

run;

proc sort data=want;

  by COMP ID Country DATE_QTR descending Region;

run;

data want (drop=xSmiley Happy;

  set want  (where=(infilea eq 1));

  by COMP ID Country DATE_QTR descending Region;

  retain Sum_Total_Revenue;

  if first.DATE_QTR then Sum_Total_Revenue=xSum_Total_Revenue;

run;

View solution in original post


All Replies
Contributor
Posts: 38

Cumulative Sums based on different by group vars to create a Ratio from one dataset to be attached to another set given that cum sums will be created based on the cut off DATE in the second dataset

data one;

input DATE_QTR $6. COMP $ ID $ Country $ Region $ Income ;

cards;

2005Q2 ABC 345 AUS FER 5000

2005Q2 ABC 345 AUS VHR 3000

2005Q3 ABC 345 AUS FER 1500

2005Q3 ABC 345 AUS VHR 2000

;

run;

data two;

input DATE_QTR $6. COMP $ ID $ Country $ Region $ revenue ;

cards;

2005Q2 ABC 345 AUS FER 30000

2005Q2 ABC 345 AUS VHR 10000

2005Q3 ABC 345 AUS FER 13000

2005Q3 ABC 345 AUS VHR 18000

;

run;

 

proc sql;

create table one_one as

select *

,sum(income) as sum_income

from one

group by comp, id, country, region

;

quit;

 

proc sql;

create table two_two as

select *

,sum(revenue) as sum_revenue

from two

group by comp

;

quit;

proc sql;

create table ratio as

select *

from one_one

left join

two_two

on

one_one.date_qtr = two_two.date_qtr and

one_one.comp = two_two.comp and

one_one.id = two_two.id and

one_one.country = two_two.country and

one_one.region = two_two.region

;

quit;

data ratio;

set ratio;

ratio = (sum_income)/(sum_revenue);

INC_ALLOCATE = INCOME * RATIO;

run;

Hi

I did not understand the last part of your quesiton.

"

Now I would like for each row of data set A to attach RATIO and create a new variable  INC_ALLOCATE = INCOME * RATIO with the following restriction

both  SUM_INCOME & SUM_ TOTAL_REVENUE should be evaluated up to the value of DATE_QTR of the respective row in data set A.

For example for the first row of data set A  DATE_QTR has the value 2005Q2

I would like SUM_INCOME (dataset B)  to be the sum of all TOTAL_REVENUE UP to and including 2005Q2 (for the GROUP BY variables COMP,    ID,    Country,   Region  )

Likewise     SUM_ TOTAL_REVENUE  (dataset B) to be the sum of all TOTAL_REVENUE UP to and including 2005Q2 (for the GROUP BY variable COMP  )

"

If you can be more clear on it I will try to work on it.

I hope it helps.

Good luck

Contributor
Posts: 68

Cumulative Sums based on different by group vars to create a Ratio from one dataset to be attached to another set given that cum sums will be created based on the cut off DATE in the second dataset

Posted in reply to ArpitSharma

Dear Arpit,

Thank you very much for your help.

I apologize I was not clear enough.

Please see the attachment, since I could not paste the image to the Reply within thw SAS User Group.

I also mistated SUM_INCOME should Read SUM_REVENUE (data set B).

Now as per the attachment, I want SAS to attach RATIO for each observation in DATA set A based on the SUMs from DATA B since the beginning of the BY GROUP vars  up to the DATA A observation Quarter .

Therefore for each new observation Quarter for DATA A I need an updated set of SUM vars from DATA B to include the new Quarters REVENUE amounts.

I think that CALL EXECUTE would be appropriate although I am not aware of the mechanics since these are dynsamically updated SUMS

Thank you in advance.

Kind regards

Nikos2012-01-06_141209.png

Contributor
Posts: 38

Cumulative Sums based on different by group vars to create a Ratio from one dataset to be attached to another set given that cum sums will be created based on the cut off DATE in the second dataset

Hey

Can you do me a favour.

One please create sample dataset A (input dataset)

                 create sample dataset  B (input dataset)

and

base on these two input dataset please create a sample dataset C

The final dataset C should be as you want.

This will be like a small dataset.

Please do it in excel and paste the table. It will be easy for me to understand.

Plesae put like 15-20 observations in each dataset A and B

and based on this input create C.

I will try to figure out a way for it to be done in sas.

Thanks

Solution
‎01-06-2012 06:53 PM
PROC Star
Posts: 7,485

Cumulative Sums based on different by group vars to create a Ratio from one dataset to be attached to another set given that cum sums will be created based on the cut off DATE in the second dataset

For what you want I think that a series of datasteps should suffice.  However, there was at least one error in your example data (regarding quarter specified) and there may be an error in my code.  However, I think this comes close enough to give you some direction:

data a;

  informat date_qtr yyq6.;

  format date_qtr yyq6.;

  input DATE_QTR COMP $   ID    Country  $ Region  $  Income comma8.;

  cards;

2005Q2 ABC    345     AUS      FER   5,000

2005Q2 ABC    345     AUS      VHR   3,000

2005Q3 ABC    345     AUS      FER   1,500

2005Q3 ABC    345     AUS      VHR   2,000

2005Q4 ABC    345     AUS      FER   2,500

2005Q4 ABC    345     AUS      VHR   2,800

2006Q1 ABC    345     AUS      FER   1,750

2006Q1 ABC    345     AUS      VHR   2,220

;

data b;

  informat date_qtr yyq6.;

  format date_qtr yyq6.;

  input DATE_QTR COMP $   ID    Country  $ Region  $  Revenue comma8.;

  cards;

2004Q2 ABC    345     AUS      FER  5,000

2004Q2 ABC    345     AUS      VHR  5,000

2004Q3 ABC    345     AUS      FER  5,000

2004Q3 ABC    345     AUS      VHR  5,000

2004Q4 ABC    345     AUS      FER  5,000

2004Q4 ABC    345     AUS      VHR  5,000

2005Q1 ABC    345     AUS      FER  5,000

2005Q1 ABC    345     AUS      VHR  5,000

2005Q2 ABC    345     AUS      FER  5,000

2005Q2 ABC    345     AUS      VHR  5,000

2005Q3 ABC    345     AUS      FER  5,000

2005Q3 ABC    345     AUS      VHR  5,000

2005Q4 ABC    345     AUS      FER  5,000

2005Q4 ABC    345     AUS      VHR  5,000

2006Q1 ABC    345     AUS      FER  5,000

2006Q1 ABC    345     AUS      VHR  5,000

;

proc sort data=a;

  by COMP ID Country Region DATE_QTR;

run;

proc sort data=b;

  by COMP ID Country Region DATE_QTR;

run;

data want;

  merge a (in=ina) b (in=inb);

  by COMP ID Country Region DATE_QTR;

  if ina or inb;

  if ina then infilea=1;

run;

data want;

  set want;

  by COMP ID Country Region DATE_QTR;

  if first.Country then xSum_Total_Revenue=0;

  if first.Region then Sum_Revenue=0;

  Sum_Revenue+Revenue;

  xSum_Total_Revenue+Revenue;

run;

proc sort data=want;

  by COMP ID Country DATE_QTR descending Region;

run;

data want (drop=xSmiley Happy;

  set want  (where=(infilea eq 1));

  by COMP ID Country DATE_QTR descending Region;

  retain Sum_Total_Revenue;

  if first.DATE_QTR then Sum_Total_Revenue=xSum_Total_Revenue;

run;

Respected Advisor
Posts: 4,173

Re: Cumulative Sums based on different by group vars to create a Ratio from one dataset to be attached to another set given that cum sums will be created based on the cut off DATE in the second dataset

I believe below code does more or less what you're after:

data A;
input DATE_QTR $6. COMP $ ID $ Country $ Region $ Income ;
cards;
2005Q2 ABC 345 AUS FER 5000
2005Q2 ABC 345 AUS VHR 3000
2005Q3 ABC 345 AUS FER 1500
2005Q3 ABC 345 AUS VHR 2000
;
run;

data B;
  length DATE_QTR $6 COMP $8 ID $8 Country $8 Region $8 Revenue 8;
  comp='ABC'; id='345';country='AUS'; Revenue=5000;
  do DATE_QTR='2004Q2','2004Q3','2004Q4','2005Q1','2005Q2','2005Q3';
    do Region='FER','VHR';
      output;
    end;
  end;
run;

proc sql;
  create view B_sorted as
    select *
    from b
    order by COMP, ID, Country, Region
  ;
quit;

proc freq data=B_sorted noprint;
  by COMP ID Country Region;
  tables DATE_QTR / list nopercent missing out=CumQtrReg(drop=percent count cum_pct rename=(cum_freq=Sum_Revenue)) outcum ;
  weight Revenue;
run;

proc freq data=B noprint;
  tables DATE_QTR / list nopercent missing out=CumQtr(drop=percent count cum_pct rename=(cum_freq=Sum_Total_Revenue)) outcum ;
  weight Revenue;
run;

proc sql;
/*  create table want as*/
    select
        t1.*
      , t2.Sum_Revenue as Sum_Revenue label='Sum_Revenue'
      , t3.Sum_Total_Revenue as Sum_Total_Revenue label='Sum_Total_Revenue'
      , t2.Sum_Revenue / t3.Sum_Total_Revenue as Ratio
      , t1.Income * t2.Sum_Revenue / t3.Sum_Total_Revenue as INC_ALLOCATE
    from A t1, CumQtrReg t2, CumQtr t3
    where t1.DATE_QTR=t2.DATE_QTR and t1.DATE_QTR=t3.DATE_QTR
      and t1.COMP=t2.COMP and t1.ID=t2.ID and t1.Country=t2.Country and t1.Region=t2.Region
    order by t1.DATE_QTR, t1.Region
  ;
quit;

Contributor
Posts: 68

Re: Cumulative Sums based on different by group vars to create a Ratio from one dataset to be attached to another set given that cum sums will be created based on the cut off DATE in the second dataset

Dear all,

Patrick's answer is a  "Correct answer", altough, I could not put multiple appropriate legends.

Thank you

Nikos

Respected Advisor
Posts: 4,173

Re: Cumulative Sums based on different by group vars to create a Ratio from one dataset to be attached to another set given that cum sums will be created based on the cut off DATE in the second dataset

I feel that when posting the same question to different forums like SAS-L it would be kind of polite to cross reference such postings.

https://groups.google.com/group/comp.soft-sys.sas/browse_thread/thread/947b66d6ff2367cd?hl=en#

I find it always annoying having put in the time for an answer only to find out later that a good solution has already been provided somewhere else.

PROC Star
Posts: 7,485

Re: Cumulative Sums based on different by group vars to create a Ratio from one dataset to be attached to another set given that cum sums will be created based on the cut off DATE in the second dataset

Patrick,  For me, personally, taking the time to propose solutions has always been equally beneficial to both the person doing the posting and the one(s) receiving it.

However, what I would hope people do when cross-posting is to cross post a summary of good solutions received from the various sites.

Contributor
Posts: 68

Re: Cumulative Sums based on different by group vars to create a Ratio from one dataset to be attached to another set given that cum sums will be created based on the cut off DATE in the second dataset

Dear all,

I would like to thank you for your answers. They helped me a lot.

As far as I know  there is no answer yet to the other site.

In any case I will cross post to the other site your solutions.

Best regards

Nikos

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 2450 views
  • 6 likes
  • 4 in conversation