Dear all,
I would very much appreciate any help with the following issue.
I have a data set A that contains six vars.
DATE_QTR (e.g. YYYYQQ) COMP ID Country Region Income
2005Q2 ABC 345 AUS FER 5,000
2005Q2 ABC 345 AUS VHR 3,000
2005Q3 ABC 345 AUS FER 1,500
2005Q3 ABC 345 AUS VHR 2,000
..................................................................................................
where the GROUP BY vars are:
DATE_QTR (e.g. YYYYQQ)
COMP
ID
Country
Region
and a second dataset B that contains the same Group By variables except the last where instead of INCOME there is TOTAL_REVENUE.
I would like to create in dataset B a RATIO = (SUM_INCOME) / (SUM_ TOTAL_REVENUE) where
SUM_INCOME is based on the vars COMP, ID, Country, Region
& SUM_ TOTAL_REVENUE is based only on the var COMP.
Now I would like for each row of data set A to attach RATIO and create a new variable INC_ALLOCATE = INCOME * RATIO with the following restriction
both SUM_INCOME & SUM_ TOTAL_REVENUE should be evaluated up to the value of DATE_QTR of the respective row in data set A.
For example for the first row of data set A DATE_QTR has the value 2005Q2
I would like SUM_INCOME (dataset B) to be the sum of all TOTAL_REVENUE UP to and including 2005Q2 (for the GROUP BY variables COMP, ID, Country, Region )
Likewise SUM_ TOTAL_REVENUE (dataset B) to be the sum of all TOTAL_REVENUE UP to and including 2005Q2 (for the GROUP BY variable COMP )
Kind regards
Nikos
For what you want I think that a series of datasteps should suffice. However, there was at least one error in your example data (regarding quarter specified) and there may be an error in my code. However, I think this comes close enough to give you some direction:
data a;
informat date_qtr yyq6.;
format date_qtr yyq6.;
input DATE_QTR COMP $ ID Country $ Region $ Income comma8.;
cards;
2005Q2 ABC 345 AUS FER 5,000
2005Q2 ABC 345 AUS VHR 3,000
2005Q3 ABC 345 AUS FER 1,500
2005Q3 ABC 345 AUS VHR 2,000
2005Q4 ABC 345 AUS FER 2,500
2005Q4 ABC 345 AUS VHR 2,800
2006Q1 ABC 345 AUS FER 1,750
2006Q1 ABC 345 AUS VHR 2,220
;
data b;
informat date_qtr yyq6.;
format date_qtr yyq6.;
input DATE_QTR COMP $ ID Country $ Region $ Revenue comma8.;
cards;
2004Q2 ABC 345 AUS FER 5,000
2004Q2 ABC 345 AUS VHR 5,000
2004Q3 ABC 345 AUS FER 5,000
2004Q3 ABC 345 AUS VHR 5,000
2004Q4 ABC 345 AUS FER 5,000
2004Q4 ABC 345 AUS VHR 5,000
2005Q1 ABC 345 AUS FER 5,000
2005Q1 ABC 345 AUS VHR 5,000
2005Q2 ABC 345 AUS FER 5,000
2005Q2 ABC 345 AUS VHR 5,000
2005Q3 ABC 345 AUS FER 5,000
2005Q3 ABC 345 AUS VHR 5,000
2005Q4 ABC 345 AUS FER 5,000
2005Q4 ABC 345 AUS VHR 5,000
2006Q1 ABC 345 AUS FER 5,000
2006Q1 ABC 345 AUS VHR 5,000
;
proc sort data=a;
by COMP ID Country Region DATE_QTR;
run;
proc sort data=b;
by COMP ID Country Region DATE_QTR;
run;
data want;
merge a (in=ina) b (in=inb);
by COMP ID Country Region DATE_QTR;
if ina or inb;
if ina then infilea=1;
run;
data want;
set want;
by COMP ID Country Region DATE_QTR;
if first.Country then xSum_Total_Revenue=0;
if first.Region then Sum_Revenue=0;
Sum_Revenue+Revenue;
xSum_Total_Revenue+Revenue;
run;
proc sort data=want;
by COMP ID Country DATE_QTR descending Region;
run;
data want (drop=x:);
set want (where=(infilea eq 1));
by COMP ID Country DATE_QTR descending Region;
retain Sum_Total_Revenue;
if first.DATE_QTR then Sum_Total_Revenue=xSum_Total_Revenue;
run;
data one;
input DATE_QTR $6. COMP $ ID $ Country $ Region $ Income ;
cards;
2005Q2 ABC 345 AUS FER 5000
2005Q2 ABC 345 AUS VHR 3000
2005Q3 ABC 345 AUS FER 1500
2005Q3 ABC 345 AUS VHR 2000
;
run;
data two;
input DATE_QTR $6. COMP $ ID $ Country $ Region $ revenue ;
cards;
2005Q2 ABC 345 AUS FER 30000
2005Q2 ABC 345 AUS VHR 10000
2005Q3 ABC 345 AUS FER 13000
2005Q3 ABC 345 AUS VHR 18000
;
run;
proc sql;
create table one_one as
select *
,sum(income) as sum_income
from one
group by comp, id, country, region
;
quit;
proc sql;
create table two_two as
select *
,sum(revenue) as sum_revenue
from two
group by comp
;
quit;
proc sql;
create table ratio as
select *
from one_one
left join
two_two
on
one_one.date_qtr = two_two.date_qtr and
one_one.comp = two_two.comp and
one_one.id = two_two.id and
one_one.country = two_two.country and
one_one.region = two_two.region
;
quit;
data ratio;
set ratio;
ratio = (sum_income)/(sum_revenue);
INC_ALLOCATE = INCOME * RATIO;
run;
Hi
I did not understand the last part of your quesiton.
"
Now I would like for each row of data set A to attach RATIO and create a new variable INC_ALLOCATE = INCOME * RATIO with the following restriction
both SUM_INCOME & SUM_ TOTAL_REVENUE should be evaluated up to the value of DATE_QTR of the respective row in data set A.
For example for the first row of data set A DATE_QTR has the value 2005Q2
I would like SUM_INCOME (dataset B) to be the sum of all TOTAL_REVENUE UP to and including 2005Q2 (for the GROUP BY variables COMP, ID, Country, Region )
Likewise SUM_ TOTAL_REVENUE (dataset B) to be the sum of all TOTAL_REVENUE UP to and including 2005Q2 (for the GROUP BY variable COMP )
"
If you can be more clear on it I will try to work on it.
I hope it helps.
Good luck
Dear Arpit,
Thank you very much for your help.
I apologize I was not clear enough.
Please see the attachment, since I could not paste the image to the Reply within thw SAS User Group.
I also mistated SUM_INCOME should Read SUM_REVENUE (data set B).
Now as per the attachment, I want SAS to attach RATIO for each observation in DATA set A based on the SUMs from DATA B since the beginning of the BY GROUP vars up to the DATA A observation Quarter .
Therefore for each new observation Quarter for DATA A I need an updated set of SUM vars from DATA B to include the new Quarters REVENUE amounts.
I think that CALL EXECUTE would be appropriate although I am not aware of the mechanics since these are dynsamically updated SUMS
Thank you in advance.
Kind regards
Nikos
Hey
Can you do me a favour.
One please create sample dataset A (input dataset)
create sample dataset B (input dataset)
and
base on these two input dataset please create a sample dataset C
The final dataset C should be as you want.
This will be like a small dataset.
Please do it in excel and paste the table. It will be easy for me to understand.
Plesae put like 15-20 observations in each dataset A and B
and based on this input create C.
I will try to figure out a way for it to be done in sas.
Thanks
For what you want I think that a series of datasteps should suffice. However, there was at least one error in your example data (regarding quarter specified) and there may be an error in my code. However, I think this comes close enough to give you some direction:
data a;
informat date_qtr yyq6.;
format date_qtr yyq6.;
input DATE_QTR COMP $ ID Country $ Region $ Income comma8.;
cards;
2005Q2 ABC 345 AUS FER 5,000
2005Q2 ABC 345 AUS VHR 3,000
2005Q3 ABC 345 AUS FER 1,500
2005Q3 ABC 345 AUS VHR 2,000
2005Q4 ABC 345 AUS FER 2,500
2005Q4 ABC 345 AUS VHR 2,800
2006Q1 ABC 345 AUS FER 1,750
2006Q1 ABC 345 AUS VHR 2,220
;
data b;
informat date_qtr yyq6.;
format date_qtr yyq6.;
input DATE_QTR COMP $ ID Country $ Region $ Revenue comma8.;
cards;
2004Q2 ABC 345 AUS FER 5,000
2004Q2 ABC 345 AUS VHR 5,000
2004Q3 ABC 345 AUS FER 5,000
2004Q3 ABC 345 AUS VHR 5,000
2004Q4 ABC 345 AUS FER 5,000
2004Q4 ABC 345 AUS VHR 5,000
2005Q1 ABC 345 AUS FER 5,000
2005Q1 ABC 345 AUS VHR 5,000
2005Q2 ABC 345 AUS FER 5,000
2005Q2 ABC 345 AUS VHR 5,000
2005Q3 ABC 345 AUS FER 5,000
2005Q3 ABC 345 AUS VHR 5,000
2005Q4 ABC 345 AUS FER 5,000
2005Q4 ABC 345 AUS VHR 5,000
2006Q1 ABC 345 AUS FER 5,000
2006Q1 ABC 345 AUS VHR 5,000
;
proc sort data=a;
by COMP ID Country Region DATE_QTR;
run;
proc sort data=b;
by COMP ID Country Region DATE_QTR;
run;
data want;
merge a (in=ina) b (in=inb);
by COMP ID Country Region DATE_QTR;
if ina or inb;
if ina then infilea=1;
run;
data want;
set want;
by COMP ID Country Region DATE_QTR;
if first.Country then xSum_Total_Revenue=0;
if first.Region then Sum_Revenue=0;
Sum_Revenue+Revenue;
xSum_Total_Revenue+Revenue;
run;
proc sort data=want;
by COMP ID Country DATE_QTR descending Region;
run;
data want (drop=x:);
set want (where=(infilea eq 1));
by COMP ID Country DATE_QTR descending Region;
retain Sum_Total_Revenue;
if first.DATE_QTR then Sum_Total_Revenue=xSum_Total_Revenue;
run;
I believe below code does more or less what you're after:
data A;
input DATE_QTR $6. COMP $ ID $ Country $ Region $ Income ;
cards;
2005Q2 ABC 345 AUS FER 5000
2005Q2 ABC 345 AUS VHR 3000
2005Q3 ABC 345 AUS FER 1500
2005Q3 ABC 345 AUS VHR 2000
;
run;
data B;
length DATE_QTR $6 COMP $8 ID $8 Country $8 Region $8 Revenue 8;
comp='ABC'; id='345';country='AUS'; Revenue=5000;
do DATE_QTR='2004Q2','2004Q3','2004Q4','2005Q1','2005Q2','2005Q3';
do Region='FER','VHR';
output;
end;
end;
run;
proc sql;
create view B_sorted as
select *
from b
order by COMP, ID, Country, Region
;
quit;
proc freq data=B_sorted noprint;
by COMP ID Country Region;
tables DATE_QTR / list nopercent missing out=CumQtrReg(drop=percent count cum_pct rename=(cum_freq=Sum_Revenue)) outcum ;
weight Revenue;
run;
proc freq data=B noprint;
tables DATE_QTR / list nopercent missing out=CumQtr(drop=percent count cum_pct rename=(cum_freq=Sum_Total_Revenue)) outcum ;
weight Revenue;
run;
proc sql;
/* create table want as*/
select
t1.*
, t2.Sum_Revenue as Sum_Revenue label='Sum_Revenue'
, t3.Sum_Total_Revenue as Sum_Total_Revenue label='Sum_Total_Revenue'
, t2.Sum_Revenue / t3.Sum_Total_Revenue as Ratio
, t1.Income * t2.Sum_Revenue / t3.Sum_Total_Revenue as INC_ALLOCATE
from A t1, CumQtrReg t2, CumQtr t3
where t1.DATE_QTR=t2.DATE_QTR and t1.DATE_QTR=t3.DATE_QTR
and t1.COMP=t2.COMP and t1.ID=t2.ID and t1.Country=t2.Country and t1.Region=t2.Region
order by t1.DATE_QTR, t1.Region
;
quit;
Dear all,
Patrick's answer is a "Correct answer", altough, I could not put multiple appropriate legends.
Thank you
Nikos
I feel that when posting the same question to different forums like SAS-L it would be kind of polite to cross reference such postings.
https://groups.google.com/group/comp.soft-sys.sas/browse_thread/thread/947b66d6ff2367cd?hl=en#
I find it always annoying having put in the time for an answer only to find out later that a good solution has already been provided somewhere else.
Patrick, For me, personally, taking the time to propose solutions has always been equally beneficial to both the person doing the posting and the one(s) receiving it.
However, what I would hope people do when cross-posting is to cross post a summary of good solutions received from the various sites.
Dear all,
I would like to thank you for your answers. They helped me a lot.
As far as I know there is no answer yet to the other site.
In any case I will cross post to the other site your solutions.
Best regards
Nikos
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.