Hi everyone!
I have a dataset in which column 1 holds stock return numbers, with each row representing a company. I'm supposed to calculate a new column 2 that is the empirical cumulative distribution function (CDF) of column 1. I don't necessarily need the function itself; I just need the density number for each company. So basically, all the numbers in column 1 are supposed to follow this CDF, and each company has a different density number based on its column 1 return.
I hope I've made myself clear...
I googled it and found a "severity" procedure. Maybe my SAS system is too old, but it says "Procedure SEVERITY not found." I wonder if there's a simple solution to my problem.
Many thanks!
I think that means just sorting the data and then dividing each observation's position in the sorted order by n.
E.g.
Proc sort data=have; by column1; run;
data want;
set have;
ecdf=_n_/number_of_companies;
run;
If I'm way off, then please post some more details.
PROC SEVERITY is part of the SAS/ETS package, and you may not have that licensed.
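A minimal sketch of this idea, assuming the data set is named have, the returns are in column1 with no missing values and no ties, and using the NOBS= option of the SET statement to supply the number of companies (the names sorted, n, and want are placeholders, not from the thread):

```sas
/* Sort a copy so the original data set is left untouched */
proc sort data=have out=sorted;
  by column1;
run;

data want;
  /* NOBS= puts the number of observations in SORTED into n */
  set sorted nobs=n;
  /* _N_ is the row's position in sorted order, so this is rank / n */
  ecdf = _n_ / n;
run;
```

Missing values sort first in SAS and are counted by NOBS=, so they would need to be dropped beforehand. Tied values would also receive different ecdf values under this sketch.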
Hi Reeza,
Thanks for the reply. I think your way would imply that the returns follow a uniform distribution, since the difference between consecutive ecdf values is always the same, which isn't really the case for my column 1.
My column 1 looks something like this:
1 5940285.422
2 15182646.036
3 34539.400618
4 4184974.0126
5 3824416.2707
........
That is the definition of the ECDF as far as I know, and according to Wikipedia:
http://en.wikipedia.org/wiki/Empirical_distribution_function
If you're looking for the distribution of the returns, you can sum the returns, still sort, and then divide each return by the total return.
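For reference, the two quantities being discussed can be written out as follows. The symbols are notation chosen here, not from the thread: n is the number of companies, x_1, ..., x_n are the column 1 values, and x_(1) <= ... <= x_(n) are the same values in sorted order.

```latex
% Empirical CDF: proportion of observations less than or equal to x
\[
\hat{F}_n(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{x_i \le x\},
\qquad
\hat{F}_n\bigl(x_{(k)}\bigr) = \frac{k}{n}
\quad \text{if the sorted values are distinct.}
\]

% Cumulative share of the total: a different quantity
\[
S_k = \frac{\sum_{i=1}^{k} x_{(i)}}{\sum_{i=1}^{n} x_i}
\]
```

Taking only the five rows shown above as an illustration, the smallest value (34539.400618) gets an ECDF value of 1/5 = 0.2 and the largest (15182646.036) gets 5/5 = 1. The ECDF always rises in steps of 1/n at each distinct data point, whatever the shape of the data, so equal steps do not mean the returns are assumed to be uniform.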
This seems to make a lot of sense. Thank you!
BTW, Reeza, the ecdf is supposed to range from 0 to 1. But with the method above, my biggest ecdf is smaller than 0.3, and the smallest almost zero. It seems to me it needs some kind of standardization. Any ideas?
I think I'll divide the ecdf by the total return and then multiply by my biggest return number; that way it's standardized.
I forgot about the cumulative part: you need to divide the running total by the total.
data want;
set have;
retain running_total 0; /* start at 0: an uninitialized retained variable is missing, and missing + x is missing */
running_total=running_total+column1;
ecdf=running_total/overaltotal;
run;
This is what I did, and it finally worked:
data want;
set have;
retain running_total 0;
running_total+column1;
ecdf=running_total/overaltotal;
run;
---------------------------------------------------------------
Rick, you are right. I did first sort the data by column1, and I had checked that there was no missing data. So the full code is:
proc sort data=have;
by column1;
run;
data want;
set have;
retain running_total 0;
running_total+column1;
ecdf=running_total/overaltotal;
run;
Your program computes the cumulative proportion of total values. If that's what you want, then fine. But the cumulative proportion is not equivalent to the ECDF unless the original data are sorted and nonmissing. So make sure you remember to sort and remove missings.
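One way to set this up end to end, as a sketch only: it keeps the variable names used in the thread (column1, running_total, overaltotal), computes overaltotal with PROC MEANS because the thread does not show where that value comes from, and drops missing values before sorting. The data set names totals and sorted, and the variable name cum_prop, are placeholders.

```sas
/* Overall total of column1; PROC MEANS ignores missing values in the sum */
proc means data=have noprint;
  var column1;
  output out=totals(keep=overaltotal) sum=overaltotal;
run;

/* Drop missing values, then sort ascending */
proc sort data=have(where=(not missing(column1))) out=sorted;
  by column1;
run;

data want;
  /* Read the one-row total once; variables read by SET are retained */
  if _n_ = 1 then set totals;
  set sorted;
  /* Sum statement: running_total is retained and starts at 0 */
  running_total + column1;
  /* Cumulative proportion of the total (called ecdf in the thread) */
  cum_prop = running_total / overaltotal;
run;
```

With this setup the last row always has cum_prop equal to 1. If some values are negative, as stock returns can be, the cumulative proportion is not guaranteed to stay between 0 and 1.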
PROC SEVERITY is in the SAS/ETS product, but you can use PROC UNIVARIATE to get this automatically.
proc univariate data=sashelp.class noprint;
var weight;
cdfplot weight;
ods output CDFPlot=ECDF;
run;
The data set ECDF contains two columns, ECDFX and ECDFY, that contain the empirical CDF.
By the way, the DATA step code works provided that all the data are nonmissing, but it should be adjusted to handle missing values.
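Another option that avoids the licensing question is PROC RANK in Base SAS. This is a sketch and was not suggested in the thread: with the FRACTION option, each rank is divided by the number of nonmissing values, and TIES=HIGH gives tied values the largest rank in their group, which matches the "proportion of values less than or equal to x" definition of the ECDF. The names have, column1, want, and ecdf are assumed.

```sas
/* One ECDF value per company, in the original row order.        */
/* Rows with a missing column1 get a missing ecdf and are not    */
/* counted in the denominator.                                   */
proc rank data=have out=want fraction ties=high;
  var column1;    /* variable to rank */
  ranks ecdf;     /* new variable holding rank / (number of nonmissing values) */
run;
```

No prior sort is needed, and the largest nonmissing value receives an ecdf of 1.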
Rick, is this a 9.3 option? I'm trying it in 9.2 with a class variable, but I can't seem to figure out what the table name is.
Output Added:
-------------
Name: UNIVAR12
Label: CDF Plot 1
Data Name: GRSEG
Path: Univariate.time_to_pay.CDFPlot.UNIVAR12
-------------
NOTE: PROCEDURE UNIVARIATE used (Total process time):
real time 2.70 seconds
cpu time 0.71 seconds
Output Added:
-------------
Name: UNIVAR13
Label: CDF Plot 1
Data Name: GRSEG
Path: Univariate.time_to_pay.CDFPlot.UNIVAR13
-------------
WARNING: Output 'CDFPlot' was not created. Make sure that the output
object name, label, or path is spelled correctly. Also, verify
that the appropriate procedure options are used to produce the
requested output object. For example, verify that the NOPRINT
option is not used.
It looks like you don't have ODS graphics turned on?
ODS graphics on;
Yup, that was it, thanks
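Putting the last few replies together, the PROC UNIVARIATE approach might look like the sketch below. It reuses the SASHELP.CLASS example from earlier in the thread. NOPRINT is left off here purely as a precaution, because the warning in the log mentions it; whether it actually matters for this output object is not settled in the thread.

```sas
/* Per the replies above, the CDFPlot output object was only */
/* created once ODS Graphics was enabled                      */
ods graphics on;

proc univariate data=sashelp.class;
  var weight;
  cdfplot weight;
  /* ECDFX and ECDFY hold the empirical CDF, per Rick's reply */
  ods output CDFPlot=ECDF;
run;

/* Inspect the first few rows of the captured data set */
proc print data=ECDF(obs=10);
run;
```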