I have a dataset which represents the volume of sales over three years:
data test;
input one two three average;
datalines;
10 20 30 .
20 30 40 .
10 30 50 .
10 10 10 .
;
run;
I'm looking for a way to find the midpoint of the three years by sales volume (the average sale point).
The updated dataset would read:
data test;
input one two three average;
datalines;
10 20 30 2
20 30 40 1.5
10 30 50 2.1
10 10 10 1.5
;
run;
So essentially I'm looking for the point within the three years at which the halfway point of the sales occurred.
Any help is appreciated.
EDIT: here is what I've been trying with PROC MEANS and a weight.
It doesn't give me the average point of the three years:
proc means data=test noprint;
var one two three;
var one+two+three=total;
var (one+two+three)/3=Average;
var Average/weight=Average_Year;
output out=testa2
sum(Total) =
mean(Total) = ;
run;
@89974114 wrote:
I'm looking for a way to find the middle point of the three years, the average sale point
the updated dataset would read
data test;
input one two three average;
datalines;
10 20 30 2
20 30 40 1.5
10 30 50 2.1
10 10 10 1.5
;
run;
I have to admit I am not following this, I don't see how you have computed the average value.
So the first value would be 10+20+30 = 60; midway is 30, which is 10+20, so 2 years.
I'm looking for the average point throughout the three years based on the volume of sales.
The second would be 20+30+40 = 90, and 90/2 = 45.
45-20 = 25, then 25/30 = 5/6 of a year, so (my mistake) the second line should be 1 + 5/6 years.
So I'm thinking: if each year's volume is given a weight based on the total volume, how far through the three years is the middle point?
so the first value would be 10+20+30=60, then midway is 30 which is 10+20 so 2 years
Still not following this at all.
the second would be 20+30+40=90 / 2 = 45 ,
45-20 = 25 then 25/30 = 5/6th
So in the first example, there is no subtraction happening, but in this example there is a subtraction in the math?
Maybe think of it like a cumulative frequency curve. As you add the three observations together, you move from 0 to 100% of the total. Think of the x-axis as the three years and the y-axis as 0 to 100%; you are looking for the point on the x-axis at which the curve reaches 50% on the y-axis.
So, I guess you want something like
data test;
input one two three;
midpoint = sum(one, two, three) / 2;
if midpoint < one then halfSalesPoint = midpoint / one;
else if midpoint < one + two then halfSalesPoint = 1 + (midpoint - one) / two;
else halfSalesPoint = 2 + (midpoint - one - two) / three;
drop midpoint;
datalines;
10 20 30
20 30 40
10 30 50
10 10 10
;
proc print data=test; run;
I can't fault that, it gives the right answers, much appreciated. I am wondering whether there is a more efficient solution, though, as I'll be running this data step on millions of rows.
@89974114 wrote:
I can't fault that, it gives the right answers, much appreciated. I am wondering whether there is a more efficient solution, though, as I'll be running this data step on millions of rows.
What is inefficient about the accepted solution?
There must be a function in PROC MEANS, or another procedure, that lets you find the weighted average with respect to the observation length.
@89974114 wrote:
There must be a function in PROC MEANS, or another procedure, that lets you find the weighted average with respect to the observation length.
If the real concern is that your actual problem involves more than 3 variables, whose names are really ordinal data values, then you might want an array to hold the listed variables and compare half of the total to an iteratively built cumulative total.
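A rough sketch of that array idea, in case it helps. It assumes the period variables are listed in chronological order and hold nonmissing, non-negative values; the names want, v, half, cum and i are only illustrative, and the output variable name is borrowed from the data step posted earlier in the thread.

```sas
/* Sketch only: generalises the three-variable step to any number of period columns. */
data want;
  set test;                      /* assumes TEST has numeric columns one, two, three */
  array v{*} one two three;      /* list the period variables in chronological order */
  half = sum(of v{*}) / 2;       /* half of the row total */
  cum = 0;                       /* running total of the periods already passed */
  halfSalesPoint = .;            /* stays missing if the row total is zero */
  do i = 1 to dim(v);
    if half <= cum + v{i} then do;
      /* halfway point falls in period i: whole periods passed plus a fraction of this one */
      if v{i} > 0 then halfSalesPoint = (i - 1) + (half - cum) / v{i};
      leave;
    end;
    cum = cum + v{i};
  end;
  drop half cum i;
run;
```

To handle more periods, only the variable list on the ARRAY statement should need to change. It is still a single pass over the data, so I would not expect it to be noticeably slower than the hard-coded version.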
I just don't see how this problem can be expressed as a weighted average. To me, the solution to your problem amounts to finding the inverse of a piecewise linear function.
This little datastep should run very fast. I doubt that any proc can do much better.
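To spell out the piecewise linear view, here is one way to write it down. The symbols are my own notation, not something from the posts above: s_k is the sales volume in year k, C_k the cumulative total after k years, and F(t) the cumulative sales at time t, assuming sales accrue evenly within each year.

```latex
\[
C_0 = 0, \qquad C_k = \sum_{i=1}^{k} s_i, \qquad
F(t) = C_{k-1} + \bigl(t - (k-1)\bigr)\, s_k \quad \text{for } k-1 \le t \le k .
\]
% Solve F(t^*) = C_3 / 2, where k is the first year with C_k >= C_3 / 2 (and s_k > 0):
\[
t^{*} = (k-1) + \frac{C_3/2 - C_{k-1}}{s_k}
\]
% Example, row 20 30 40: C = (20, 50, 90), C_3/2 = 45, so k = 2 and
\[
t^{*} = 1 + \frac{45 - 20}{30} = 1 + \tfrac{5}{6} \approx 1.83
\]
```

The if/else chain in the data step is just this inverse evaluated branch by branch.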
I was being unreasonably pedantic yesterday, my apologies.