Hi,
I have bunch of files each having daily stock price of a company.
say, for a company named "AXIS" (in file axis.csv),
Date | Price |
1999-11-18 | 31.47353 |
1999-12-27 | 37.77718 |
1999-12-28 | 43.99142 |
1999-12-29 | 51.50215 |
1999-12-30 | 56.68813 |
1999-12-31 | 55.30222 |
2000-01-03 | 51.50215 |
2000-01-04 | 47.56796 |
2000-01-05 | 44.61731 |
2000-01-06 | 42.91845 |
2000-01-07 | 46.49499 |
i'm trying to:
1. averaging the annual price, and then rearranging the columns and rows as:
Firmname | 1990 | 1991 | 1992 | … | 2019 |
Axis | 31.3 | 32.3 | 35.6 | … | 38.8 |
2. Merging all files in that folder to a sing file.
I have loaded them into SAS and was trying to get average of a single year by:
proc sql;
SELECT
AVG(price)
FROM
axis
WHERE
year(date) = 1990;
;
quit;
wish I could iterate through year 1990 to 2019 but seems that DO loop won't work in PROC SQL.
Use group by in sql:
proc sql;
select
year(date) as year,
avg(price)
from axis
group by calculated year
;
quit;
Hi @jimmychoi
Instead of making a per-company-analysis, you could also append all company datasets to one input dataset and avoid alle looping, like this:
/* Input data - 2 examples */
data axis;
firmname = 'axis'; date = '01jan1990'd; price = 1; output;
firmname = 'axis'; date = '01jan1991'd; price = 2; output;
firmname = 'axis'; date = '01jan1992'd; price = 3; output;
firmname = 'axis'; date = '01jan1993'd; price = 4; output;
run;
data beto;
firmname = 'beto'; date = '01jan1990'd; price = 6; output;
firmname = 'beto'; date = '01jan1991'd; price = 7; output;
firmname = 'beto'; date = '01jan1992'd; price = 8; output;
firmname = 'beto'; date = '01jan1993'd; price = 9; output;
run;
/* Append to a work data set */
data have; set axis beto;
run;
/* Calculate average price per company / year */
proc sql;
create table wdata as
select
firmname,
year(date) as year,
avg(price) as price
from have
group by firmname, calculated year
order by firmname;
quit;
/* Transpose to get years as variables */
proc transpose data=wdata out=want(drop=_NAME_);
by firmname;
id year;
run;
Hi ErikLund, your code works beautifully thanks.
one thing, i have used below code to aggregate separate files(example):
data someData;
length _xFilename $ 256;
infile "c:\temp\*.csv" filename=_xfilename;
xFilename = _xFilename;
input
@1 line $10.
;
run;
"DO LOOP" not permitted in PROC SQL?
Why would you use SQL instead of a DATA step?
data axis;
firmname = 'axis'; date = '01jan1990'd; price = 1; output;
firmname = 'axis'; date = '01jan1991'd; price = 2; output;
firmname = 'axis'; date = '01jan1992'd; price = 3; output;
firmname = 'axis'; date = '01jan1993'd; price = 4; output;
run;
data beto;
firmname = 'beto'; date = '01jan1990'd; price = 6; output;
firmname = 'beto'; date = '01jan1991'd; price = 7; output;
firmname = 'beto'; date = '01jan1992'd; price = 8; output;
firmname = 'beto'; date = '01jan1993'd; price = 9; output;
run;
/* Append to a work data set */
data have; set axis beto;
run;
proc tabulate data=have;
class date firmname;
format date year4.;
var price;
table firmname=' ',date=' '*price=' '*mean=' '/box='firmname';
run;
P.S. steal the dataset from Jenson .
If you want a table , then try other's code . my code is for a REPORT .
Is your goal a report that people read or a (poorly structured for many purposes) data set to feed into another process?
Also, are your dates actual SAS date values or some character value impersonating a date? SAS date values allow a lot of manipulation that character values are a pain to work with. For example changing a format allows creating groups for reports.
Below is an example of a report if the data files for the firms are combined and an appropriate format applied to a date-valued variable.
data example; informat date yymmdd10. price 12.5 firm $10.; format date yymmdd10.; input Date Price firm ; datalines; 1999-11-18 31.47353 Axis 1999-12-27 37.77718 Axis 1999-12-28 43.99142 Axis 1999-12-29 51.50215 Axis 1999-12-30 56.68813 Axis 1999-12-31 55.30222 Axis 2000-01-03 51.50215 Axis 2000-01-04 47.56796 Axis 2000-01-05 44.61731 Axis 2000-01-06 42.91845 Axis 2000-01-07 46.49499 Axis 1999-11-18 32.47353 Bravo 1999-12-27 33.77718 Bravo 1999-12-28 44.99142 Bravo 1999-12-29 51.50215 Bravo 1999-12-30 56.68813 Bravo 1999-12-31 55.30222 Bravo 2000-01-03 41.50215 Bravo 2000-01-04 43.56796 Bravo 2000-01-05 42.61731 Bravo 2000-01-06 52.91845 Bravo 2000-01-07 45.49499 Bravo ; run; proc tabulate data=example; class firm date; format date year4.; var price; table firm, date='Year'*price*(mean min max)*f=f12.5 ; run;
Changing the format from YEAR4 to YYQ is all that would be needed to create a calendar quarter summary for each firm; YYMON format would create monthly summaries.
I show other statistics to show those are available as well.
Once you have the data combined you could filter the report to specific intervals with a WHERE statement in Proc Tabulate.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.