never used SAS before, but have a project due next week where I have to use this software.
I downloaded around 200,000 data observations. As the observations are time-series, I need to deflate them using a price index. How can I do this best? I downloaded the data for the price index in a separate file. I am only used to Excel, where I just multiply a cell with the price index, but I assume I need a command here?
It would help to know a bit more about the data. For example, let's say you have one file...which is the big file of 200,000 observations. Is the time series data like this:
Month Year Amount Item
1 1990 1000 AAA
2 1990 1100 AAA
3 1990 1000 BBB
4 1990 1100 CCC
5 1990 1000 DDD
6 1990 1100 DDD
7 1990 1000 EEE
8 1990 1100 FFF
9 1990 1000 GGG
10 1990 1100 GGG
11 1990 1000 HHH
12 1990 1100 HHH
OR like this:
And how is your price data structured??? You have a month and year and the price index or you have only a year and the price index????
Or, is the price index a constant value -- the same for all years???? or the same for all months in a year??
Other considerations -- is your data already in SAS dataset form or do you have to read the data into SAS form?
Also helpful, would be to know how you are going to access SAS. Are you going to work in interactive mode (using SAS Display Manager) or in "batch" mode, using SAS on a mainframe computer or on a server computer?
Maybe you have a copy of the SAS Learning Edition? Or, you are using SAS at work?? Did anyone give you instructions for how to open SAS or how to submit programs? Will you be using SAS Enterprise Guide??
For example, if you were using SAS Enterprise Guide or the SAS Learning Edition (which uses EG as the front end), then you would want to use EG menus in order to read the data into SAS and perform queries to manipulate the data and create new variables for the adjusted amount. But, if you were using SAS Display Manager or using SAS in "batch" mode (as on the mainframe), you would probably need to write a DATA step program or programs to accomplish your task.
Is there anyone who can help you get started with SAS?? It would take an enormously long, long post to tell you how to open SAS, how to read data, how to manipulate data and how to create new variables. People might be able to give you some ideas about how to proceed if they had more information or knew what resources were available to you.
thanks for your help. My data set consists of 20yr+ of company data (balance sheet items (all annual data). I downloaded the annual consumer price index as well. This one is in a separate data set. Question: How can I merge the two files and make sure that the company data for any given year is deflated properly?
Another question: I also calculated ratios of the data which are shown in new, separate columns. For each ratio, I want to delete the 3% highest and lowest observations as these could be outliers. So I start with one ratio, delete the outliers, move on to the next ratio, delete the outliers, etc. How can this be done best?
CPI would lend itself to a look-up based on the "prevailing index value" at a date.
That way you don't need a date match between annual account data and cpi-data.
I assume an index value at date X applies for annual accounts dated between date-at(X) and date-at(X-1). Proc format allows us to create a look-up file of this nature.
Additionally, you have more than one ratio. Are these for different values or as at different dates?
Here is some code to build a demo look-up file[pre]data c_price_data ;
infile cards dsd;
input date index ;
informat date date9. ;
data cntlin ;
retain fmtname 'cpi' hlo 'L' start end ;
set c_price_data( keep= date index ) end= eof ;
end = date ;
label = index ;
hlo = ' ' ;
start = end ;
if eof ;
hlo = 'H' ;
proc format cntlin= cntlin fmtlib ;
%put demo %sysfunc( putn( "1jan1990"d, cpi ));
%put demo %sysfunc( putn( "1Mar2000"d, cpi ));
%put demo %sysfunc( putn( "1Aug2008"d, cpi ));
%put demo %sysfunc( putn( "1Aug2010"d, cpi ));[/pre]
One wrinkle - the put() function creates character values, not numerics. So the formula to collect the index value and apply it, involves a conversion of the character version of the index to a number. Something like[pre] data deflated ;
set raw_returns( keep= companyID return_date balance1-balance8 ) ;[/pre]* assuming the company information is stored in variables like balance1 ... balance8 .. .. .. adapt to suit ;[pre] cpi = input( put( return_date, cpi11. ), best11. ) ;[/pre]* with my demo index values, we need also to divide by 100 ;[pre] cpi= cpi/100 ;
array balances(*) balance1-balance8 ;
do idx= 1 to dim(balances) ;
if balances(idx) then balances(idx)= balances(idx) / cpi ;
For 3% percentile boundaries, I think you'll find proc univariate most convenient, as it allows you to specifiy whatever percentiles you want.