DATA Step, Macro, Functions and more

How to perform calculation on every nth observation within an array?

Reply
Contributor
Posts: 20

How to perform calculation on every nth observation within an array?

I have a data set that breaks down the monthly average price of three products over the course of 10 years.  I need to represent the yearly average price for each of the three products, so I essentially to take the overall average of every 12 observations for 396+ observations. I'm trying to write an array that will do this. I know that there's something very off structurally, but I'm having a hard time seeing it myself. I've created three variables (gas_yearly etc.) but I've hit a dead end when it comes to writing the looping statement to average every 12 observations. 

data want;
set qfour.aveprices;
input gas_yearly milk_yearly eggs_yeary;
array avg(3)gas_yearly milk_yearly eggs_yeary;
	 do i=1 to dim(avg);
	 *a statement that represents the overall average of every 12 observations

Thank you!

-e

 

 

Respected Advisor
Posts: 2,825

Re: How to perform calculation on every nth observation within an array?

[ Edited ]

So here's the problem. Arrays work on variables, going across the rows, and not down a column, which seems to be what you are describing. Thus, as you have explained the problem, arrays will not work (or you have explained it in such a way that I'm not understanding the problem).

 

However, if you want to find the means of every sequence of 12 observations, here is how to do this. Create an index indicating which group of 12 observations each observation is in. This assumes that there is no variable indicating YEAR that is already in the data set. (And if there is not, then there should be). Then run PROC SUMMARY on this index variable.

 

Data want;
    set qfour.aveprices;
    group_of_twelve=floor((_n_-1)/12);
run;

Proc summary data=want;
    class group_of_twelve;
    var gas_yearly milk_yearly eggs_yeary; 
    /* That's your spelling error, not mine */
    output out=final mean=;
run;

 

--
Paige Miller
Super User
Posts: 3,857

Re: How to perform calculation on every nth observation within an array?

Why don't you add a date column to your data to record the date of the measurements? It will make your analyses so much easier.

Super User
Posts: 23,295

Re: How to perform calculation on every nth observation within an array?

Can you explain your data structure and provide some sample data, fake data is fine.

 

What happens if your data is sorted? Can you sort the data to not take the 12th row and instead group things together more logically?

Contributor
Posts: 20

Re: How to perform calculation on every nth observation within an array?

I've been issues sorting the data with two conditions. Here's a snippet of the original:gas_shot.PNG

 

 

It goes from the years 2004-2014 for each product. Everything is already sorted, I'm having issues grouping and subsetting. 

this code works:
proc print data = qfour.aveprices;
where year = 2004;run; 

this code does not for some reason.
proc print data = qfour.aveprices;
where year = 2004 and commodity = 'gas';run;

this also doesn't work
proc print data = qfour.aveprices;
where commodity = 'gas';
I get this response: NOTE: No observations were selected from data set QFOUR.AVEPRICES. NOTE: There were 0 observations read from the data set QFOUR.AVEPRICES. WHERE (commodity='gas');

I think I need to create a dataset for each product and then perhaps I can create a variable that calculates the average of every 12 obs. 

 

Also trying to do the normal data step, where I create an output hasn't worked either. 

  DATA sample_small;
  SET test.aveprices;
     IF year = 2014;
     If commodity = gas;
     RUN;
Proc print sample_small;run;
NOTE: Variable gas is uninitialized.
NOTE: There were 396 observations read from the data set TEST.AVEPRICES.
NOTE: The data set WORK.SAMPLE_SMALL has 0 observations and 5 variables.
NOTE: DATA statement used (Total process time):
      real time           0.23 seconds
      cpu time            0.21 seconds


2614  proc print data= sample_small;run;
Super User
Posts: 23,295

Re: How to perform calculation on every nth observation within an array?

It’s Gas, capital G.

Try the following, see the want data set that gets created.


Proc means data=have;
Class year commodity;
Var price;
Output out =want mean= avg_price;
Run;
Contributor
Posts: 20

Re: How to perform calculation on every nth observation within an array?

[ Edited ]

thank you, that worked! I winded up using that code to create two outputs, one that shows the yearly average for product and the other that shows the overall average for commodities. I need to merge these two datasets, to show the yearly and monthly average per product, but I don't have a unique identifier, the only thing I have is commodity, and sort and merging by commodity doesn't seem to work. Do you know if there's a way to get around only having 2 classes in a class statement?

Respected Advisor
Posts: 2,825

Re: How to perform calculation on every nth observation within an array?


@Errant wrote:

thank you, that worked! I winded up using that code to create two outputs, one that shows the yearly average for product and the other that shows the overall average for commodities. I need to merge these two datasets, to show the yearly and monthly average per product, but I don't have a unique identifier, the only thing I have is commodity, and sort and merging by commodity doesn't seem to work. Do you know if there's a way to get around only having 2 classes in a class statement?


There's really no way for me to advise you, unless you show us what you are talking about, and show us what you want.

 

You can have as many variables in a CLASS statement as you want.

--
Paige Miller
Contributor
Posts: 20

Re: How to perform calculation on every nth observation within an array?

Posted in reply to PaigeMiller

So at this point in  time I have two outputs. mnth_shot.PNG

 

yr_shot.PNG

 

 

I essentially want an output that expresses the product average for both month and year, by utilizing these outputs. However, I'm confused about how to sort/merge it. And the class statement runs with three classes, but it only produces one average, which makes me that maybe it only works on one dependent variable (avg_price in this case)?

 

Here's the code I tried.

Proc means data=qfour.aveprices;
Class year commodity;
Var price;
Output out =want mean= avg_price;
Run;

proc means data = qfour.aveprices;
Class month commodity;
var price;
output out = want2 mean =avg_price; run; 

*following only produces one average; proc means data = qfour.aveprices; Class month year commodity; var price; output out = want3 mean =avg_price; run;

 

I think I need to create two separate variables, one that represents yearly average and the other that represents monthly average. 

Respected Advisor
Posts: 2,825

Re: How to perform calculation on every nth observation within an array?

You have two separate data sets. I see that. Now explain them, what are they? How do you want to combine them? Show us, type in a portion of what you would like to see when you combine these two data sets.

 

I essentially want an output that expresses the product average for both month and year, by utilizing these outputs.

 

This is the first time you have mentioned that you want an average BY MONTH. Explain further where this month average would come from given your original data that you showed.

--
Paige Miller
Contributor
Posts: 20

Re: How to perform calculation on every nth observation within an array?

Posted in reply to PaigeMiller

Sorry about the confusion, the monthly calculation was just the next calculation that I needed to complete after getting the yearIy average. After getting both of those in two separate outputs, I wanted to merge them, to get a table that shows the the product name, the yearly average and the monthly average, essentially combining the averages from both outputs. 

 

At this point, I've tried sorting each output by commodity name (seemed like the closest thing to an identifier in both outputs).

 

I used this code to sort

data sorta;
set want;
by commodity;run;
proc print data = sorta;run;

*and then for the second output data sortb; set want2; by commodity;run; proc print data = sortb;run;

I get the same output for both commands for some reason. And it doesn't state any errors, but as you can see it's missing data. Which is messing up the merge process.

 eggs.PNG

 Ultimately, I would like a data set that looks like the one above with the commodity and the average price for year and month.

Respected Advisor
Posts: 2,825

Re: How to perform calculation on every nth observation within an array?

[ Edited ]

@Errant wrote:

Sorry about the confusion, the monthly calculation was just the next calculation that I needed to complete after getting the yearIy average. After getting both of those in two separate outputs, I wanted to merge them, to get a table that shows the the product name, the yearly average and the monthly average, essentially combining the averages from both outputs.


I don't think you really want to "merge" the two data sets ... not MERGE in the sense of the SAS data step command MERGE. Or maybe you do mean "merge", but I can't imagine what that table would look like. Please show us an example (type it in yourself) of what the merged table looks like.

 

I also don't know what final result of all this analysis should be, after you do the merge somehow, then what? Please show us. There may be very simple ways to get there, we might be able to figure that out, if only you told us what final analysis should look like.

 

I think we need to temporarily stop talking about SAS code, and talk about EXACTLY what output you need and where you are going, and the only way to do that is for you to show us what you want.

--
Paige Miller
Contributor
Posts: 20

Re: How to perform calculation on every nth observation within an array?

Posted in reply to PaigeMiller

So I've reviewed the original requirements and I misinterpreted the requirements. Ultimately I do need to combine the month output with the yearly average output, but the month output actually needs to look like this:

 

ideal.PNG

So technically I'm not calculating an average for the month output. I'm confused about how to get this from the original dataset though.

 

Super User
Posts: 23,295

Re: How to perform calculation on every nth observation within an array?


*Sort by month/year;
proc sort data=have;
by month year;
run;

*calculate summary statistics by month;
proc means data=have nway;
by month;
var gas milk eggs;
output out=monthly_averages mean(gas)= avg_gas mean(milk) = avg_milk mean(eggs)=avg_eggs;
run;

*merge results back into original dataset by month;
data want;
merge have monthly_averages;
by month;
run;

If you post data as text instead of image that would be helpful. We can't write a program off an image, unless we type it out and I'm too lazy for that. This will calculate monthly averages across all years, if you only need certain years, you can use a WHERE statement to filter the years. Then it merges it back to the original data set. 

Contributor
Posts: 20

Re: How to perform calculation on every nth observation within an array?

There a lot of observations, but I can upload the datafile.

Attachment
Ask a Question
Discussion stats
  • 20 replies
  • 224 views
  • 4 likes
  • 6 in conversation