Hi,
I have bunch of files each having daily stock price of a company.
say, for a company named "AXIS" (in file axis.csv),
Date | Price |
1999-11-18 | 31.47353 |
1999-12-27 | 37.77718 |
1999-12-28 | 43.99142 |
1999-12-29 | 51.50215 |
1999-12-30 | 56.68813 |
1999-12-31 | 55.30222 |
2000-01-03 | 51.50215 |
2000-01-04 | 47.56796 |
2000-01-05 | 44.61731 |
2000-01-06 | 42.91845 |
2000-01-07 | 46.49499 |
i'm trying to:
1. averaging the annual price, and then rearranging the columns and rows as:
Firmname | 1990 | 1991 | 1992 | … | 2019 |
Axis | 31.3 | 32.3 | 35.6 | … | 38.8 |
2. Merging all files in that folder to a sing file.
I have loaded them into SAS and was trying to get average of a single year by:
proc sql;
SELECT
AVG(price)
FROM
axis
WHERE
year(date) = 1990;
;
quit;
wish I could iterate through year 1990 to 2019 but seems that DO loop won't work in PROC SQL.
Use group by in sql:
proc sql;
select
year(date) as year,
avg(price)
from axis
group by calculated year
;
quit;
Hi @jimmychoi
Instead of making a per-company-analysis, you could also append all company datasets to one input dataset and avoid alle looping, like this:
/* Input data - 2 examples */
data axis;
firmname = 'axis'; date = '01jan1990'd; price = 1; output;
firmname = 'axis'; date = '01jan1991'd; price = 2; output;
firmname = 'axis'; date = '01jan1992'd; price = 3; output;
firmname = 'axis'; date = '01jan1993'd; price = 4; output;
run;
data beto;
firmname = 'beto'; date = '01jan1990'd; price = 6; output;
firmname = 'beto'; date = '01jan1991'd; price = 7; output;
firmname = 'beto'; date = '01jan1992'd; price = 8; output;
firmname = 'beto'; date = '01jan1993'd; price = 9; output;
run;
/* Append to a work data set */
data have; set axis beto;
run;
/* Calculate average price per company / year */
proc sql;
create table wdata as
select
firmname,
year(date) as year,
avg(price) as price
from have
group by firmname, calculated year
order by firmname;
quit;
/* Transpose to get years as variables */
proc transpose data=wdata out=want(drop=_NAME_);
by firmname;
id year;
run;
Hi ErikLund, your code works beautifully thanks.
one thing, i have used below code to aggregate separate files(example):
data someData;
length _xFilename $ 256;
infile "c:\temp\*.csv" filename=_xfilename;
xFilename = _xFilename;
input
@1 line $10.
;
run;
"DO LOOP" not permitted in PROC SQL?
Why would you use SQL instead of a DATA step?
data axis;
firmname = 'axis'; date = '01jan1990'd; price = 1; output;
firmname = 'axis'; date = '01jan1991'd; price = 2; output;
firmname = 'axis'; date = '01jan1992'd; price = 3; output;
firmname = 'axis'; date = '01jan1993'd; price = 4; output;
run;
data beto;
firmname = 'beto'; date = '01jan1990'd; price = 6; output;
firmname = 'beto'; date = '01jan1991'd; price = 7; output;
firmname = 'beto'; date = '01jan1992'd; price = 8; output;
firmname = 'beto'; date = '01jan1993'd; price = 9; output;
run;
/* Append to a work data set */
data have; set axis beto;
run;
proc tabulate data=have;
class date firmname;
format date year4.;
var price;
table firmname=' ',date=' '*price=' '*mean=' '/box='firmname';
run;
P.S. steal the dataset from Jenson .
If you want a table , then try other's code . my code is for a REPORT .
Is your goal a report that people read or a (poorly structured for many purposes) data set to feed into another process?
Also, are your dates actual SAS date values or some character value impersonating a date? SAS date values allow a lot of manipulation that character values are a pain to work with. For example changing a format allows creating groups for reports.
Below is an example of a report if the data files for the firms are combined and an appropriate format applied to a date-valued variable.
data example; informat date yymmdd10. price 12.5 firm $10.; format date yymmdd10.; input Date Price firm ; datalines; 1999-11-18 31.47353 Axis 1999-12-27 37.77718 Axis 1999-12-28 43.99142 Axis 1999-12-29 51.50215 Axis 1999-12-30 56.68813 Axis 1999-12-31 55.30222 Axis 2000-01-03 51.50215 Axis 2000-01-04 47.56796 Axis 2000-01-05 44.61731 Axis 2000-01-06 42.91845 Axis 2000-01-07 46.49499 Axis 1999-11-18 32.47353 Bravo 1999-12-27 33.77718 Bravo 1999-12-28 44.99142 Bravo 1999-12-29 51.50215 Bravo 1999-12-30 56.68813 Bravo 1999-12-31 55.30222 Bravo 2000-01-03 41.50215 Bravo 2000-01-04 43.56796 Bravo 2000-01-05 42.61731 Bravo 2000-01-06 52.91845 Bravo 2000-01-07 45.49499 Bravo ; run; proc tabulate data=example; class firm date; format date year4.; var price; table firm, date='Year'*price*(mean min max)*f=f12.5 ; run;
Changing the format from YEAR4 to YYQ is all that would be needed to create a calendar quarter summary for each firm; YYMON format would create monthly summaries.
I show other statistics to show those are available as well.
Once you have the data combined you could filter the report to specific intervals with a WHERE statement in Proc Tabulate.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.