Solved: Re: Get the empirical CDF

JOLSAS · Posted 04-09-2013 10:12 PM

Hi everyone!

I have a dataset in which column 1 holds stock returns, with each row representing a company. I'm supposed to calculate a new column 2, the empirical cumulative distribution function (CDF) of column 1. I don't necessarily need the function itself; I just need the density number for each company. So basically, all the numbers in column 1 are supposed to follow this CDF, and each company has a different density number based on its column 1 return.

I hope I've made myself clear...

I googled it and found a "severity" procedure. Maybe my SAS system is too old, but it says "Procedure SEVERITY not found." I wonder if there's a simple solution to my problem.

Many thanks!

Reeza · Posted 04-09-2013 10:42 PM

I think that just means sorting the data and then dividing each observation's rank by n.

E.G.

Proc sort data=have; by column1; run;

data want;

set have;

ecdf=_n_/number_of_companies;

run;

If I'm way off, then please post some more details.

PROC SEVERITY is part of the SAS/ETS package, and you may not have that licensed.
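
A minimal sketch of the sort-and-divide idea above that avoids hard-coding number_of_companies, assuming PROC RANK is acceptable here. The FRACTION option divides each rank by the number of nonmissing values, and TIES=HIGH gives tied returns the largest rank so they share one ECDF value. The names have, want, column1, and ecdf simply follow the thread.

```sas
/* ECDF value per company: rank / (number of nonmissing returns) */
proc rank data=have out=want fraction ties=high;
   var column1;     /* the return column */
   ranks ecdf;      /* new column holding the fractional rank */
run;
```

Rows with a missing column1 should come out with a missing ecdf rather than being counted.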

JOLSAS · Posted 04-09-2013 10:55 PM

Hi Reeza,

Thanks for the reply. I think your way would imply that the returns follow a uniform distribution, hence the difference between different ecdf's equals 1, which isn't really the case for my column 1.

My column 1 looks something like this:

1 5940285.422

2 15182646.036

3 34539.400618

4 4184974.0126

5 3824416.2707

........

Reeza · Posted 04-09-2013 11:00 PM

That is the definition of the ECDF as far as I know, and according to Wikipedia:

http://en.wikipedia.org/wiki/Empirical_distribution_function

If you're looking for the distribution of the returns, you can sum the returns, still sort, and then divide each return by the total return.

JOLSAS · Posted 04-09-2013 11:13 PM

This seems to make a lot of sense. Thank you!

JOLSAS · Posted 04-09-2013 11:22 PM

BTW, Reeza, the ecdf is supposed to range from 0 to 1. But with the method above, my biggest ecdf is smaller than 0.3, and the smallest almost zero. It seems to me it needs some kind of standardization. Any ideas?

JOLSAS · Posted 04-09-2013 11:24 PM

I think I'll divide the ecdf by the total return and then multiply by my biggest return number; that way it's standardized.

Reeza · Posted 04-10-2013 12:02 AM

I forgot about the cumulative part; you need to divide the running total by the total.

data want;

set have;

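retain running_total 0; /* start at 0: a retained variable with no initial value is missing, and missing + column1 stays missing */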

running_total=running_total+column1;

ecdf=running_total/overaltotal;

run;

JOLSAS · Posted 04-15-2013 02:10 AM

This is what I did, and it finally worked:

data want;

set have;

retain running_total 0;

running_total+column1;

ecdf=running_total/overaltotal;

run;

---------------------------------------------------------------

Rick, you are right. I first sorted the data by column1 and checked that there was no missing data. So the full program is:

proc sort data=have;

by column1;

run;

data want;

set have;

retain running_total 0;

running_total+column1;

ecdf=running_total/overaltotal;

run;

Rick_SAS · Posted 04-18-2013 10:15 AM

Your program computes the cumulative proportion of total values. If that's what you want, then fine. But the cumulative proportion is not equivalent to the ECDF unless the original data are sorted and nonmissing. So make sure you remember to sort and remove missings.
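
A sketch of the sort-and-remove-missings advice, which also shows one way the overaltotal value used in the thread could be computed instead of typed in. The intermediate data set names totals and sorted are assumptions made for illustration; have, want, column1, running_total, ecdf, and overaltotal follow the earlier posts. The result is the cumulative proportion of the total described above, not the rank-based ECDF.

```sas
/* Total of the nonmissing returns (SUM ignores missing values) */
proc means data=have noprint;
   var column1;
   output out=totals(keep=overaltotal) sum=overaltotal;
run;

/* Sort ascending into a new data set so HAVE is left untouched */
proc sort data=have out=sorted;
   by column1;
run;

data want;
   /* read the one-row totals data set once; variables read by SET are retained */
   if _n_ = 1 then set totals;
   /* skip rows with a missing return */
   set sorted(where=(column1 is not missing));
   /* sum statement: running_total starts at 0 and is retained automatically */
   running_total + column1;
   ecdf = running_total / overaltotal;
run;
```

The last row of want should have ecdf equal to 1, which is a quick check that the total and the running sum cover the same rows.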

Rick_SAS · Posted 04-10-2013 06:05 AM

PROC SEVERITY is in the SAS/ETS product, but you can use PROC UNIVARIATE to get this automatically.

proc univariate data=sashelp.class noprint;

var weight;

cdfplot weight;

ods output CDFPlot=ECDF;

run;

The data set ECDF contains two columns, ECDFX and ECDFY, that contain the empirical CDF.

By the way, the DATA step code works provided that all the data are nonmissing, but it should be adjusted to handle missing values.

Reeza · Posted 04-24-2013 11:43 AM

Rick, is this a 9.3 option? I'm trying it in 9.2 with a class variable, but I can't seem to figure out what the table name is.

Output Added:

-------------

Name: UNIVAR12

Label: CDF Plot 1

Data Name: GRSEG

Path: Univariate.time_to_pay.CDFPlot.UNIVAR12

-------------

NOTE: PROCEDURE UNIVARIATE used (Total process time):

real time 2.70 seconds

cpu time 0.71 seconds

Output Added:

-------------

Name: UNIVAR13

Label: CDF Plot 1

Data Name: GRSEG

Path: Univariate.time_to_pay.CDFPlot.UNIVAR13

-------------

WARNING: Output 'CDFPlot' was not created. Make sure that the output

object name, label, or path is spelled correctly. Also, verify

that the appropriate procedure options are used to produce the

requested output object. For example, verify that the NOPRINT

option is not used.

Rick_SAS · Posted 04-24-2013 12:34 PM

It looks like you don't have ODS graphics turned on?

ODS graphics on;

Reeza · Posted 04-24-2013 02:14 PM

Yup, that was it, thanks
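
For reference, a sketch that combines the PROC UNIVARIATE suggestion with the ODS graphics fix from this exchange. NOPRINT is left out here only because the warning in the log above says to verify that it is not used; whether it can stay has not been checked.

```sas
ods graphics on;                  /* the CDF plot is produced as an ODS graphic */

proc univariate data=sashelp.class;
   var weight;
   cdfplot weight;
   ods output CDFPlot=ECDF;       /* plot data: columns ECDFX and ECDFY */
run;
```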
