This topic is solved and locked.
89974114
Quartz | Level 8

I have a dataset which represents the volume of sales over three years:

data test;
input one two three average;
datalines;
10 20 30 .
20 30 40 .
10 30 50 .
10 10 10 .
;
run;

I'm looking for a way to find the middle point of the three years: the point in time at which half of the total sales had been made.

The updated dataset would read:

data test;
input one two three average;
datalines;
10 20 30 2
20 30 40 1.5
10 30 50 2.1
10 10 10 1.5
;
run;

So essentially I'm looking for how far into the three years the halfway point of the sales occurred.

Appreciate.

EDIT: what I've been trying with PROC MEANS and a weight

I've been trying to use PROC MEANS with a weight, but it doesn't give me the average point of the three years:

proc means data=test noprint;
var one two three;
var one+two+three=total;
var (one+two+three)/3=Average; 
var Average/weight=Average_Year;

output out=testa2
    sum(Total) = 
    mean(Total) = ;
run;
12 REPLIES
PaigeMiller
Diamond | Level 26

@89974114 wrote:

I'm looking for a way to find the middle point of the three years, the average sale point

the updated dataset would read

data test;
input one two three average;
datalines;
10 20 30 2
20 30 40 1.5
10 30 50 2.1
10 10 10 1.5
;
run;

 


I have to admit I am not following this; I don't see how you have computed the average value.

--
Paige Miller
89974114
Quartz | Level 8

So for the first row the total is 10+20+30 = 60, and the midway point is 30, which is 10+20, so 2 years.

I'm looking for the average point throughout the three years based on the volume of sales.

For the second row the total is 20+30+40 = 90, and 90/2 = 45.

Then 45-20 = 25 and 25/30 = 5/6 of a year, so, my mistake, the second line should be 1 + 5/6 years.
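
To make the arithmetic for the second row concrete, here is a small walk-through sketch in a data step. The variable names (total, half, fraction, halfway) are only illustrative and are not part of the dataset above.

```sas
/* Walk-through of the second row (20 30 40); names are illustrative only */
data _null_;
  one = 20; two = 30; three = 40;
  total = sum(one, two, three);    /* 90 */
  half  = total / 2;               /* 45 */
  /* The running total is 20 after year one and 50 after year two,
     so the value 45 is reached during year two. */
  fraction = (half - one) / two;   /* 25/30 = 5/6 of the second year */
  halfway  = 1 + fraction;         /* one full year plus 5/6 of a year */
  put total= half= fraction= halfway=;
run;
```

The PUT statement writes the intermediate values to the log so each step can be checked by hand.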

89974114
Quartz | Level 8

So I'm thinking: if each year's volume is given a weight based on the total volume, how far through the three years is the middle point?

PaigeMiller
Diamond | Level 26

So for the first row the total is 10+20+30 = 60, and the midway point is 30, which is 10+20, so 2 years.

 

Still not following this at all.

 

For the second row the total is 20+30+40 = 90, and 90/2 = 45.

Then 45-20 = 25 and 25/30 = 5/6

 

So in the first example, there is no subtraction happening, but in this example there is a subtraction in the math?

--
Paige Miller
89974114
Quartz | Level 8

Maybe think of it like a cumulative frequency curve. As you add the three observations together, you move from 0 to 100% of the total. Think of the x-axis as years 1-3 and the y-axis as 0 to 100%; you are looking for the point on the x-axis at which the curve reaches 50% on the y-axis.
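
As a rough sketch of that curve, the step below writes one point per year for each row. It assumes the test dataset above with non-negative values in one, two and three and a row total greater than zero; the names curve, row, running and cum_pct are made up for this illustration.

```sas
/* Sketch only: assumes TEST holds non-negative sales in one, two, three
   and that each row's total is greater than zero. */
data curve;
  set test;
  array sales{3} one two three;
  row = _n_;                          /* identifies the source row */
  total = sum(of sales{*});
  running = 0;
  year = 0; cum_pct = 0; output;      /* the curve starts at (0, 0%) */
  do year = 1 to 3;
    running = running + sales{year};
    cum_pct = 100 * running / total;  /* share of total sales reached by the end of this year */
    output;
  end;
  keep row year cum_pct;
run;

proc print data=curve; run;
```

For the row 20 30 40 the curve passes through roughly 22.2% after year one and 55.6% after year two, so the 50% mark falls somewhere in the second year.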

PGStats
Opal | Level 21

So, I guess you want something like

 

data test;
input one two three;
/* Half of the total sales over the three years */
midpoint = sum(one, two, three) / 2;
/* Find the year in which cumulative sales reach the midpoint,
   then interpolate linearly within that year */
if midpoint < one then halfSalesPoint = midpoint / one;
else if midpoint < one + two then halfSalesPoint = 1 + (midpoint - one) / two;
else halfSalesPoint = 2 + (midpoint - one - two) / three;
drop midpoint;
datalines;
10 20 30
20 30 40
10 30 50
10 10 10
;
run;

proc print data=test; run;

 

PG
89974114
Quartz | Level 8

I can't fault that; it gives the right answers, much appreciated. I am wondering if there is a more efficient solution, though, as I'll be running this data step on millions of rows.

ballardw
Super User

@89974114 wrote:

I can't fault that; it gives the right answers, much appreciated. I am wondering if there is a more efficient solution, though, as I'll be running this data step on millions of rows.


What is inefficient about the accepted solution?

89974114
Quartz | Level 8

There must be an option in PROC MEANS, or another procedure, that lets you find the weighted average with respect to the observation length.

ballardw
Super User

@89974114 wrote:

There must be an option in PROC MEANS, or another procedure, that lets you find the weighted average with respect to the observation length.


If the real concern is that your actual problem involves more than 3 variables, whose names are really ordinal data values, then you might be looking at an array to hold the listed variables and a loop that compares the halfway value of the total to an iteratively built cumulative total.
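
A minimal sketch of that array idea might look like the following. It assumes the variables are listed in time order and hold non-negative values; the output dataset name want and the helper variables half, running and i are hypothetical. Rows whose total is zero are left with a missing halfSalesPoint.

```sas
/* Sketch only: generalizes the three-variable logic to any number of periods */
data want;
  set test;
  array sales{*} one two three;        /* list more variables here, in time order */
  half = sum(of sales{*}) / 2;         /* half of the row total */
  running = 0;                         /* cumulative total before the current period */
  halfSalesPoint = .;
  do i = 1 to dim(sales) until (halfSalesPoint ne .);
    /* The halfway value is reached in the first period whose cumulative total covers it */
    if sales{i} > 0 and running + sales{i} >= half then
      halfSalesPoint = (i - 1) + (half - running) / sales{i};
    else running = sum(running, sales{i});
  end;
  drop i half running;
run;
```

The UNTIL clause stops the loop as soon as the halfway point has been located, so each row is scanned at most once. Using sum() for the running total treats a missing period as zero rather than propagating a missing value.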

 

 

PGStats
Opal | Level 21

I just don't see how this problem can be expressed as a weighted average. To me, the solution to your problem amounts to finding the inverse of a piecewise linear function.

 

This little datastep should run very fast. I doubt that any proc can do much better.
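
One way to write that down, as a sketch: the notation below is assumed for illustration and does not come from the thread. Let x_1, ..., x_n be the yearly sales, S_0 = 0, and S_k = x_1 + ... + x_k.

```latex
% Cumulative sales as a piecewise linear function of time t (in years)
\[
F(t) = S_{k-1} + \bigl(t - (k-1)\bigr)\,x_k, \qquad k-1 \le t \le k,
\]
% Inverting F at half of the total gives the halfway point t^*
\[
F(t^{*}) = \tfrac{1}{2} S_n
\;\Longrightarrow\;
t^{*} = (k-1) + \frac{\tfrac{1}{2} S_n - S_{k-1}}{x_k},
\]
% where k is the first year with S_k >= S_n / 2 and x_k > 0.
```

With n = 3 and k = 2 this reduces to 1 + (midpoint - one) / two, which is the second branch of the data step.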

PG
89974114
Quartz | Level 8

I was being unreasonably pedantic yesterday, my apologies.
