Contributor
Posts: 45

# Get the empirical CDF

Hi everyone!

I have a dataset in which column 1 holds stock returns, with each row representing a company. I'm supposed to calculate a new column 2 containing the empirical cumulative distribution function (CDF) of column 1. I don't necessarily need the function itself; I just need the CDF value for each company. So basically, all the numbers in column 1 are supposed to follow this CDF, and each company has a different CDF value based on its column 1 return.

I hope I've made myself clear...

I googled it and found the SEVERITY procedure. Maybe my SAS system is too old, but it says "Procedure SEVERITY not found." I wonder if there's a simple solution to my problem.

Many thanks!

Accepted Solutions
Solution
‎04-15-2013 02:10 AM
Contributor
Posts: 45

## Re: Get the empirical CDF

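This is what I did, and it finally worked:

    data want;
        set have;
        /* overaltotal (the sum of column1 over all rows) is assumed to already exist in have */
        retain running_total 0;
        running_total + column1;   /* running sum of column1 */
        ecdf = running_total / overaltotal;
    run;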

---------------------------------------------------------------

Rick, you are right. I did sort the data by column1 first, and I had checked that there was no missing data. So the full version is:

    proc sort data=have;
        by column1;
    run;

    data want;
        set have;
        /* overaltotal (the sum of column1 over all rows) is assumed to already exist in have */
        retain running_total 0;
        running_total + column1;   /* running sum of column1 */
        ecdf = running_total / overaltotal;
    run;
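The DATA step above reads `overaltotal` but never creates it. As a minimal sketch of one way it might be supplied, PROC MEANS can write the sum to a one-row data set that the DATA step reads once. The data set names `totals` and `sorted` are illustrative and do not come from the thread.

```sas
/* Sum of column1 over all rows, stored in a one-row data set.
   TOTALS and SORTED are illustrative names, not from the thread. */
proc means data=have noprint;
   var column1;
   output out=totals(keep=overaltotal) sum=overaltotal;
run;

proc sort data=have out=sorted;
   by column1;
run;

data want;
   if _n_ = 1 then set totals;   /* read overaltotal once; SET variables are retained */
   set sorted;
   running_total + column1;      /* sum statement: starts at 0 and is retained */
   ecdf = running_total / overaltotal;
run;
```

Reading the total from a data set rather than a macro variable avoids any rounding when the number is converted to text.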

All Replies
Super User
Posts: 20,735

## Re: Get the empirical CDF

I think that means just sorting the data and then dividing each observation's position in the sorted order by n.

E.g.

    proc sort data=have; by column1; run;

    data want;
        set have;
        /* number_of_companies is a placeholder for the row count; substitute the actual value */
        ecdf = _n_ / number_of_companies;
    run;

If I'm way off, then please post some more details.

PROC SEVERITY is part of the SAS/ETS package, and you may not have that licensed.
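As a hedged alternative to the sort-and-divide approach above, PROC RANK (part of Base SAS) can probably do the same thing in one step. With the FRACTION option each rank is divided by the number of nonmissing values, and TIES=HIGH gives tied returns the same, highest, value. The output names `want` and `ecdf` are illustrative.

```sas
/* HAVE and column1 follow the thread; WANT and ecdf are illustrative names */
proc rank data=have out=want fraction ties=high;
   var column1;     /* variable to rank */
   ranks ecdf;      /* new variable: rank / (number of nonmissing values) */
run;
```

The output keeps the original row order, and rows with a missing `column1` get a missing `ecdf`.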

Contributor
Posts: 45

## Re: Get the empirical CDF

Hi Reeza,

Thanks for the reply. I think your way would imply that the returns follow a uniform distribution, so the difference between consecutive ecdf values is always the same, which isn't really the case for my column 1.

My column 1 looks something like this:

    1     5940285.422
    2     15182646.036
    3     34539.400618
    4     4184974.0126
    5     3824416.2707
    ...

Super User
Posts: 20,735

## Re: Get the empirical CDF

That is the definition of the ECDF as far as I know, and according to Wikipedia:

http://en.wikipedia.org/wiki/Empirical_distribution_function
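For reference, the standard definition of the ECDF for observations $x_1, \dots, x_n$ can be written as follows (the symbols are generic notation, not taken from the thread):

$$
\hat{F}_n(t) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{x_i \le t\}
$$

When the data are sorted and have no ties, the $i$-th smallest value therefore gets $\hat{F}_n = i/n$, which is what dividing `_n_` by the number of rows computes.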

If you're looking for the distribution of the returns, you can sum the returns, still sort, and then divide each return by the total return.

Contributor
Posts: 45

## Re: Get the empirical CDF

This seems to make a lot of sense. Thank you!

Contributor
Posts: 45

## Re: Get the empirical CDF

BTW, Reeza, the ecdf is supposed to range from 0 to 1. But with the method above, my biggest ecdf is smaller than 0.3, and the smallest almost zero. It seems to me it needs some kind of standardization. Any ideas?

Contributor
Posts: 45

## Re: Get the empirical CDF

I think I'll divide the ecdf by the total return and then multiply by my biggest return; that way it's standardized.

Super User
Posts: 20,735

## Re: Get the empirical CDF

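I forgot about the cumulative part: you need to divide the running total by the total.

    data want;
        set have;
        /* start at 0; without an initial value the running total would stay missing */
        retain running_total 0;
        running_total = running_total + column1;
        /* overaltotal: the sum of column1 over all rows, assumed to already exist in have */
        ecdf = running_total / overaltotal;
    run;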
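A small worked example, using made-up numbers rather than the thread's data, may help show how the three quantities discussed so far differ. Take three sorted values:

$$
\begin{aligned}
x &= (1,\ 2,\ 7), \qquad \textstyle\sum_i x_i = 10 \\
\text{shares } x_i / \textstyle\sum_j x_j &: \quad 0.1,\ 0.2,\ 0.7 \\
\text{cumulative proportions} &: \quad 0.1,\ 0.3,\ 1.0 \\
\text{rank-based ECDF } i/n &: \quad 1/3,\ 2/3,\ 1
\end{aligned}
$$

The plain shares never reach 1 unless a single value makes up the whole total, which is consistent with the maximum below 0.3 reported earlier. The cumulative proportions do end at 1, but they are still not the same numbers as the rank-based ECDF.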


SAS Super FREQ
Posts: 3,839

## Re: Get the empirical CDF

Your program computes the cumulative proportion of total values.  If that's what you want, then fine. But the cumulative proportion is not equivalent to the ECDF unless the original data are sorted and nonmissing. So make sure you remember to sort and remove missings.

SAS Super FREQ
Posts: 3,839

## Re: Get the empirical CDF

PROC SEVERITY is in the SAS/ETS product, but you can use PROC UNIVARIATE to get this automatically.

    ods graphics on;   /* ODS Graphics must be on for the CDFPlot output object to be created */

    proc univariate data=sashelp.class noprint;
        var weight;
        cdfplot weight;
        ods output CDFPlot=ECDF;
    run;

The data set ECDF contains two columns, ECDFX and ECDFY, that contain the empirical CDF.

By the way, the DATA step code works provided that all the data are nonmissing, but it should be adjusted to handle missing values.
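One possible adjustment for missing values, sketched here for the rank-based calculation (`_n_` divided by the row count) from earlier in the thread: count only the nonmissing returns and advance the rank counter only on nonmissing rows. The names `stats`, `sorted`, `n_nonmiss`, and `n_seen` are illustrative assumptions.

```sas
/* Count the nonmissing values of column1.
   STATS, SORTED, n_nonmiss, and n_seen are illustrative names. */
proc means data=have noprint;
   var column1;
   output out=stats(keep=n_nonmiss) n=n_nonmiss;
run;

proc sort data=have out=sorted;
   by column1;                   /* missing values sort first */
run;

data want;
   if _n_ = 1 then set stats;    /* read n_nonmiss once; it is retained */
   set sorted;
   if not missing(column1) then do;
      n_seen + 1;                /* counts only nonmissing rows */
      ecdf = n_seen / n_nonmiss;
   end;                          /* rows with a missing return keep a missing ecdf */
run;
```

This sketch gives tied returns different `ecdf` values; if ties matter, the value from the last row of each tied group is the one that matches the ECDF definition.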

Super User
Posts: 20,735

## Re: Get the empirical CDF

Rick, is this a 9.3 option? I'm trying it in 9.2 with a class variable, but I can't seem to figure out what the table name is.

    -------------
    Name:       UNIVAR12
    Label:      CDF Plot 1
    Data Name:  GRSEG
    Path:       Univariate.time_to_pay.CDFPlot.UNIVAR12
    -------------
    NOTE: PROCEDURE UNIVARIATE used (Total process time):
          real time           2.70 seconds
          cpu time            0.71 seconds
    -------------
    Name:       UNIVAR13
    Label:      CDF Plot 1
    Data Name:  GRSEG
    Path:       Univariate.time_to_pay.CDFPlot.UNIVAR13
    -------------
    WARNING: Output 'CDFPlot' was not created.  Make sure that the output
             object name, label, or path is spelled correctly.  Also, verify
             that the appropriate procedure options are used to produce the
             requested output object.  For example, verify that the NOPRINT
             option is not used.

SAS Super FREQ
Posts: 3,839

## Re: Get the empirical CDF

It looks like you don't have ODS graphics turned on?

ODS graphics on;

Super User
Posts: 20,735

## Re: Get the empirical CDF

Yup, that was it, thanks
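Putting the pieces of this exchange together, a minimal sketch of the whole sequence might look like the following. It assumes a release in which PROC UNIVARIATE supports ODS Graphics. NOPRINT is left out here as a cautious choice, and ODS TRACE is added only to show how the output object names listed above can be written to the log.

```sas
ods graphics on;     /* enable ODS Graphics so that CDFPlot is an ODS output object */
ods trace on;        /* write each output object's name, label, and path to the log */

proc univariate data=sashelp.class;
   var weight;
   cdfplot weight;
   ods output CDFPlot=ECDF;   /* save the data behind the plot */
run;

ods trace off;

/* Look at the first few rows of the saved ECDF data set */
proc print data=ECDF(obs=5);
run;
```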
