Loop or datastep? Average 10 rows from one column, place values in new...

TronicLaine · Posted 07-31-2020 10:21 AM

I would like to take 10 rows at a time from column close_4 and place the average in observation 1 of new column named aveClose_4.

proc sql;
create table STOCKA.testing_ave AS

SELECT close_4, close_3, (high52_4 - low52_4) AS spread4_52wk, (high52_3 - low52_3) AS spread3_52wk,
loop?AVE(close_4 Startobs=2 obs11) AS aveClose_4

/**

repeat averages of 10observations until end of observations
I was hoping to try a loop, within the select statement but got lost

**/

FROM STOCKSA.MERGED_CONTENT ORDER BY 'Ticker/Symbol_4'n ASCENDING;
RUN;
QUIT;

PROC DATASETS NOLIST NODETAILS;
CONTENTS DATA=STOCKSA.testing OUT=STOCKSA.testingcontents;
RUN;

PROC PRINT DATA=STOCKSA.testingcontents; RUN;
PROC PRINT DATA=STOCKSA.testing_ave; RUN;

Thanks in advance.

PaigeMiller · Posted 07-31-2020 10:44 AM

Average closing price of 10 companies who have alphabetically similar Ticker symbols?

--
Paige Miller

TronicLaine · Posted 07-31-2020 11:38 AM

Yes,

I am wishing to do analysis of three letter stocks compared to four letter stocks. I wish to average every 10 observations and compare their values.

I wanted help with the coding to do one column so that I could complete the other columns of similar nature.

Thank you.

PaigeMiller · Posted 07-31-2020 11:47 AM

Every once in a while, I just feel compelled to say that you should stop and re-think the logic of what you are doing. This is one of those situations.

Averaging 10 consecutive companies closing prices (consecutive in alphabetical ticker symbol) makes no sense. It would be like averaging the telephone numbers of those companies. You can certainly do it, but it seems to be a totally pointless thing to do.

--
Paige Miller

TronicLaine · Posted 07-31-2020 02:10 PM

I just need some help in compiling the programming loop or datastep for this process.

yes, I actually only want 10 observations at a time, repeated and each average represented in a new row of the new column. I want to use the variable column names.
yes, it is my own weird experimentation- but the logic behind it is that companies are rather random by nature according to stock ticker symbols, so no need to chose 10 random observations when the values will be random accordingly to their business value, even though they are listed in alphabetical order.
if it makes more sense, help me create the code for random selection of observations grouped at 10 per sample--
- but my personal goal was to just compare the two categories which I have separated in the data as follows:
average10 at a time from close_4 column, then place value in aveClose_4 column and repeat.
average10 at a time from close_3- place in aveClose_3 column and repeat.
average10 at a time from (high52_4-low52_4), place in spread52_4 column & repeat.
average10 at a time from high52_3 minus low52_3, place in spread52_4 column & repeat.
Then Calculate the sum of those categories for the first 30 observations.

Thank you in advance.

TronicLaine · Posted 07-31-2020 02:25 PM

If you were given a gift from 10 random stocks according to ticker length groupings ---

given that you had two options;

1. random 10 stocks from three letter tickers

2. random 10 stocks from four letter tickers

Which would on average, give you a higher current value?
Then, using the spread of the 52 week high - 52 week low, you could assess whether the potential for the stocks to increase in value in the future.

That is what I am trying to simulate here.

ballardw · Posted 07-31-2020 07:32 PM

@TronicLaine wrote:

If you were given a gift from 10 random stocks according to ticker length groupings ---

given that you had two options;

1. random 10 stocks from three letter tickers

2. random 10 stocks from four letter tickers

Which would on average, give you a higher current value?

Then, using the spread of the 52 week high - 52 week low, you could assess whether the potential for the stocks to increase in value in the future.

That is what I am trying to simulate here.

Well, if you SORT your tickers, as your example starting data shows, you aren't doing random groupings and that is going to drastically reduce the use of the interpretation of what you claim you are doing. You would be better off, if that is what you actually want to do, is to create random groups (and why size 10??)

mkeintz · Posted 07-31-2020 12:55 PM

You want to compare close_4 for 3-letter vs 4-letter ticker symbols. Why?

And if there is a reason for this, then why exclude 1-letter symbols (F for Ford) and 2-letter (AT for Atlantic Power)?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

ballardw · Posted 07-31-2020 01:23 PM

And why does row 1 have the average of rows 2 to 12 (which is actually 11 values, count it on your fingers),

row 2 average rows 12 to 21? (over laps values from the first as well as being a different count of rows, actually 10 this time). And why isn't row 1 values included in any of the averages?

Or was your implication that you aren't actually attaching those values to the output but want to have a summary?

You could add a row counter using a data step to group rows by 10's but the whole process smells to high heaven for any possible interpretation.

What do you expect to do with those averages after you have them???

If this is only an exercise in some weird programming let us know, we can help with pieces but I really don't want to encourage stupid analysis approaches. I see enough of that on TV news.

TronicLaine · Posted 07-31-2020 02:14 PM

I just need some help in compiling the programming loop or datastep for this process.

yes, I actually only want 10 observations at a time, repeated and each average represented in a new row of the new column. I want to use the variable column names.
yes, it is my own weird experimentation- but the logic behind it is that companies are rather random by nature according to stock ticker symbols, so no need to chose 10 random observations when the values will be random accordingly to their business value, even though they are listed in alphabetical order.
if it makes more sense, help me create the code for random selection of observations grouped at 10 per sample--
- but my personal goal was to just compare the two categories which I have separated in the data as follows:
average10 at a time from close_4 column, then place value in aveClose_4 column and repeat.
average10 at a time from close_3- place in aveClose_3 column and repeat.
average10 at a time from (high52_4-low52_4), place in spread52_4 column & repeat.
average10 at a time from high52_3 minus low52_3, place in spread52_4 column & repeat.
Then Calculate the sum of those categories for the first 30 observations.

Thank you in advance.

AMSAS · Posted 07-31-2020 02:51 PM

Here's some example code that hopefully will help. Like others that replied I don't understand the "why?", but that's Ok
Note as you get near the end of the got dataset my code reduces the denominator for the average. For example, when you get to observation #19, you can't average 10 observations as there is only observation #19 & #20. So the code sums them and divides by 2.

data got ;
	do i=1 to 20 ;
		closingPrice=int(ranuni(0)*10) ;
		output ;
	end ;
run ;

data want ;
	keep i avg ;
	if 1=2 then 
		set got nobs=numberOfObs ;
	retain 
		start  1 
		end   10 
		counter 20 ;
	if end>numberOfObs then
		end=numberOfObs ;
	sum=0 ;
	do j=start to end ;
		set got point=j ;
		sum=sum+closingPrice ;
	end ;
	avg=sum/(end-start+1) ;
	output ;
	start=start+1 ;
	end=end+1 ;
	if start>numberOfObs then 
		stop ;
run ;

Loop or datastep? Average 10 rows from one column, place values in new column.

Re: Loop or datastep? Average 10 rows from one column, place values in new column.

Re: Loop or datastep? Average 10 rows from one column, place values in new column.

Re: Loop or datastep? Average 10 rows from one column, place values in new column.

Re: Loop or datastep? Average 10 rows from one column, place values in new column.

Re: Loop or datastep? Average 10 rows from one column, place values in new column.

Re: Loop or datastep? Average 10 rows from one column, place values in new column.

Re: Loop or datastep? Average 10 rows from one column, place values in new column.

Re: Loop or datastep? Average 10 rows from one column, place values in new column.

Re: Loop or datastep? Average 10 rows from one column, place values in new column.

Re: Loop or datastep? Average 10 rows from one column, place values in new column.

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!