I'm working with a very large dataset and it has a MONTH variable from 1-12,
and also a VALUE variable, with amounts possible for each month.
I would like to determine mean, median, etc. for a yearly (annual) amount,
summing each person's (observation) for 2 months of values.
What's the easiest way for a noob to do this?
Thanks in advance...
@mikeed wrote:
For each ID, I would like to add up all the VALUES, from MONTH 1 to MONTH 12, for an annual VALUE.
Then I would like to find summary statistics of annual VALUE for each ID.
Perhaps this explains it better. Maybe it can't be accomplished in a single procedure, and that's my problem?
Yes, this is a two-step problem, so you can simply apply your proc twice.
The first time you're summing VALUE to get the totals, and the second time you're generating summary statistics of those totals.
You can change which statistics you get by changing the statistic keywords you specify in the PROC MEANS/SUMMARY statements.
You've been provided with multiple samples on how to run it for one step, so you should be able to expand it to two.
But regardless, here's one way:
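The code that accompanied this reply is not reproduced here. What follows is only a minimal, untested sketch of the two-step approach described above, not the original poster's code. It assumes the input data set is named HAVE and contains the variables ID, MONTH and VALUE; the names ANNUAL and ANNUAL_VALUE are made up for illustration.

```sas
/* Step 1: one row per ID holding the sum of VALUE over all months. */
/* NWAY keeps only the per-ID rows; NOPRINT suppresses the listing. */
proc means data=have noprint nway;
  class id;
  var value;
  output out=annual (drop=_type_ _freq_) sum=annual_value;
run;

/* Step 2: summary statistics of the annual totals across all IDs. */
proc means data=annual n mean median min max;
  var annual_value;
run;
```

Because each ID ends up with a single annual total, the statistics in step 2 are computed across IDs rather than within each ID. Change the keyword list on the second PROC MEANS statement to request other statistics.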
I would like to determine mean, median, etc. for a yearly (annual) amount,
summing each person's (observation) for 2 months of values.
I'm afraid I'm not able to comprehend this part of the request. The top line makes perfect sense, but not in combination with the second line, which is rather cryptic.
Please show us a small amount of this data, and explain what results you'd like from this small amount of data.
sorry, typo. all 12 months.
each person has a unique ID and MONTHCODE variable from 1-12.
I want to sum the VALUEs of MONTHCODE1-12 for each person, then determine mean/median annual VALUE.
I tend to use PROC SUMMARY for this. UNIVARIATE would also work, but the code would be different.
/* UNTESTED CODE */
proc summary data=have;
    class id;
    var value;
    output out=want mean=meanvalue median=medianvalue;
run;
thanks,
would you be able to provide an example of the proc univariate code I would use?
As I haven't used UNIVARIATE in years, my answer is that I can't, off the top of my head, provide UNIVARIATE code. It probably isn't much different, however you can read the documentation for PROC UNIVARIATE and see if you can figure it out.
The other problem is that UNIVARIATE computes a large number of statistics that you haven't requested, and depending on how much data you have, this could slow things down dramatically and produce a very long output file.
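For reference, a minimal and untested UNIVARIATE sketch (not from any reply in this thread) might look like the following. It assumes a data set named HAVE with the variables ID and VALUE; ANNUAL, ANNUAL_VALUE, ANNUAL_STATS and the output variable names are hypothetical. The NOPRINT option suppresses the long default listing, so only the requested statistics are written to the output data set.

```sas
/* Sum VALUE within each ID to get one annual total per person. */
proc summary data=have nway;
  class id;
  var value;
  output out=annual (drop=_type_ _freq_) sum=annual_value;
run;

/* Mean and median of the annual totals, written to a data set. */
proc univariate data=annual noprint;
  var annual_value;
  output out=annual_stats n=n_ids mean=mean_annual median=median_annual;
run;
```

The first step is the same summing step you would use with PROC MEANS or PROC SUMMARY; only the second step differs.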
PROC MEANS, SUMMARY and UNIVARIATE work in similar ways, so any of them should work for your request.
Why is PROC UNIVARIATE 'required'?
Here's a fully worked example of getting summary statistics using PROC MEANS:
https://github.com/statgeek/SAS-Tutorials/blob/master/proc_means_basic.sas
I don't need to use Univariate, but I'm still having difficulty trying to figure out
how to determine the annual values of the summary statistics
Help still requested.
If the code I gave is not working properly for you, then please explain what is happening that is wrong, and show us the SASLOG and results. Otherwise, I assume the problem has been solved.
ID MONTH VALUE
1 1 65
1 2 17
. . .
1 11 47
1 12 99
2 1 55
2 2 98
. . .
2 11 45
2 12 18
3
...
NOW provide what the output is supposed to look like for that input.
For each ID, I would like to add up all the VALUES, from MONTH 1 to MONTH 12, for an annual VALUE.
Then I would like to find summary statistics of annual VALUE for each ID.
Perhaps this explains it better. Maybe it can't be accomplished in a single procedure, and that's my problem?
I already provided code in this thread to do this in a single procedure. Why do you discuss this as if there is no such code?
I'm sorry, your code was not detailed enough and did not work since it did not help me
tally the MONTHs that I needed to find the statistics for.
You can't just say "it didn't work". You have to give us details. You have to show us the SASLOG and the data set created, and explain why this is not the proper result.