JOLSAS
Quartz | Level 8

Hi everyone!

I have a dataset in which column 1 holds stock returns, with each row representing a company. I need to calculate a new column 2 that is the empirical cumulative distribution function (CDF) of column 1. I don't necessarily need the function itself; I just need the CDF value for each company. So basically, all the numbers in column 1 are supposed to follow this CDF, and each company gets a different value based on its column 1 return.

I hope I've made myself clear...

I googled it and found the SEVERITY procedure. Maybe my SAS system is too old, but it says "Procedure SEVERITY not found." I wonder if there's a simple solution to my problem.

Many thanks!

Accepted Solutions
JOLSAS
Quartz | Level 8

This is what I did, and it finally worked:

data want;
     set have;
     /* overaltotal is assumed to already be on HAVE: the sum of column1 over all rows */
     retain running_total 0;
     running_total+column1;          /* sum statement: accumulates and skips missing values */
     ecdf=running_total/overaltotal;
run;

---------------------------------------------------------------

Rick, you are right. I did first sort the data by column1, and I had checked that there was no missing data. So the full program is:

proc sort data=have;
     by column1;
run;

data want;
     set have;
     retain running_total 0;
     running_total+column1;
     ecdf=running_total/overaltotal;
run;
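
The program above needs overaltotal to exist on every row of have, and the thread does not show how it was created. One possible way, sketched here under assumptions: PROC MEANS writes the sum to a one-row data set, and the DATA step reads that row once so the value is carried across all rows. The names totals and sorted are made up for illustration, and the sketch assumes have does not already carry an overaltotal column.

```sas
/* Sketch only: TOTALS and SORTED are illustrative names, not from the thread */
proc means data=have noprint;
   var column1;
   output out=totals(keep=overaltotal) sum=overaltotal;  /* one row holding the total */
run;

proc sort data=have out=sorted;
   by column1;
run;

data want;
   if _n_ = 1 then set totals;   /* variables read with SET are retained automatically */
   set sorted;
   running_total + column1;      /* sum statement: starts at 0 and ignores missing values */
   ecdf = running_total / overaltotal;
run;
```

Sorting into a separate data set leaves have untouched, which may or may not matter for your workflow.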


13 REPLIES
Reeza
Super User

I think that means just sorting the data and then dividing each observation's rank by n.

E.G.

proc sort data=have; by column1; run;

data want;
     set have nobs=number_of_companies;   /* NOBS= supplies the row count of HAVE */
     ecdf=_n_/number_of_companies;
run;

If I'm way off, please post some more details.

PROC SEVERITY is part of the SAS/ETS package, and you may not have that licensed.
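
For what it's worth, the rank-over-n idea can also be sketched with PROC RANK, which avoids having to supply the number of companies by hand. This is a minimal sketch, assuming the input is named have and the variable column1 as in the thread; the output names want_ecdf and ecdf are made up for illustration. The FRACTION option divides each rank by the number of nonmissing values, and TIES=HIGH gives tied returns the same (largest) rank, so each value is the share of companies with a return less than or equal to that one.

```sas
/* Sketch only: WANT_ECDF and ECDF are illustrative names, not from the thread */
proc rank data=have out=want_ecdf fraction ties=high;
   var column1;      /* variable to rank */
   ranks ecdf;       /* fractional rank = (# nonmissing values <= this one) / n */
run;
```

Rows with a missing column1 get a missing ecdf, and the data do not need to be sorted first.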

JOLSAS
Quartz | Level 8

Hi Reeza,

Thanks for the reply. I think your way would imply that the returns follow a uniform distribution, hence the difference between different ecdf's equals 1, which isn't really the case for my column 1.

My column 1 looks something like this:

1     5940285.422
2     15182646.036
3     34539.400618
4     4184974.0126
5     3824416.2707
........

Reeza
Super User

That is the definition of the ECDF as far as I know, and per Wikipedia:

http://en.wikipedia.org/wiki/Empirical_distribution_function

If you're looking for the distribution of the returns, you can sum the returns, still sort, and then divide each return by the total return.
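
To make the difference between the two quantities concrete, here they are written out. The notation is not from the thread: x_1, ..., x_n are the n nonmissing returns, and x_(1) <= ... <= x_(n) are the same values sorted in ascending order.

```latex
% Empirical CDF: share of observations less than or equal to x
\hat{F}_n(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{x_i \le x\}

% Cumulative proportion of the total, evaluated at the k-th smallest value
S_k = \frac{\sum_{j=1}^{k} x_{(j)}}{\sum_{j=1}^{n} x_{(j)}}
```

With no ties, the ECDF at the k-th smallest value is k/n regardless of how large the returns are. S_k instead depends on the magnitudes, and it is only guaranteed to be nondecreasing and to stay between 0 and 1 when all the values are nonnegative.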

JOLSAS
Quartz | Level 8

This seems to make a lot of sense. Thank you!

JOLSAS
Quartz | Level 8

BTW, Reeza, the ecdf is supposed to range from 0 to 1. But with the method above, my biggest ecdf is smaller than 0.3, and the smallest almost zero. It seems to me it needs some kind of standardization. Any ideas?

JOLSAS
Quartz | Level 8

I think I'll divide the ecdf by the total return and then multiply by my biggest return number; that way it's standardized.

Reeza
Super User

I forgot about the cumulative part; you need to divide the running total by the total.

data want;
     set have;
     retain running_total 0;   /* start at 0; a missing start would make every sum missing */
     running_total=running_total+column1;
     ecdf=running_total/overaltotal;
run;

Rick_SAS
SAS Super FREQ

Your program computes the cumulative proportion of total values.  If that's what you want, then fine. But the cumulative proportion is not equivalent to the ECDF unless the original data are sorted and nonmissing. So make sure you remember to sort and remove missings.

Rick_SAS
SAS Super FREQ

PROC SEVERITY is in the SAS/ETS product, but you can use PROC UNIVARIATE to get this automatically.

proc univariate data=sashelp.class noprint;
     var weight;
     cdfplot weight;
     ods output CDFPlot=ECDF;
run;

The data set ECDF contains two columns, ECDFX and ECDFY, that contain the empirical CDF.

By the way, the DATA step code works provided that all the data are nonmissing, but it should be adjusted to handle missing values.
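
One way that adjustment might look, as a sketch rather than Rick's own code: drop the missing values before sorting so they are neither ranked nor counted. The example uses the rank-over-n form; the same WHERE= filter could be applied to the running-total program. The names nonmiss and n are made up for illustration.

```sas
/* Sketch only: NONMISS and N are illustrative names, not from the thread */
proc sort data=have(where=(not missing(column1))) out=nonmiss;
   by column1;
run;

data want;
   set nonmiss nobs=n;   /* NOBS= stores the row count of NONMISS in n */
   ecdf = _n_ / n;       /* row number after sorting, divided by the nonmissing count */
run;
```

Tied returns get different values with this approach, because each row simply receives its position in the sorted order.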

Reeza
Super User

Rick, is this a 9.3 option? I'm trying it in 9.2 with a class variable, but I can't seem to figure out what the table name is.

Output Added:
-------------
Name:       UNIVAR12
Label:      CDF Plot 1
Data Name:  GRSEG
Path:       Univariate.time_to_pay.CDFPlot.UNIVAR12
-------------
NOTE: PROCEDURE UNIVARIATE used (Total process time):
      real time           2.70 seconds
      cpu time            0.71 seconds

Output Added:
-------------
Name:       UNIVAR13
Label:      CDF Plot 1
Data Name:  GRSEG
Path:       Univariate.time_to_pay.CDFPlot.UNIVAR13
-------------
WARNING: Output 'CDFPlot' was not created.  Make sure that the output
         object name, label, or path is spelled correctly.  Also, verify
         that the appropriate procedure options are used to produce the
         requested output object.  For example, verify that the NOPRINT
         option is not used.

Rick_SAS
SAS Super FREQ

It looks like you don't have ODS graphics turned on?

ODS graphics on;
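
Putting the pieces of this exchange together, a sketch of the full call might look like the following. It leaves out NOPRINT, since the warning in the log above suggests checking that option, and it adds ODS SELECT to limit the printed output to the plot. Both choices are assumptions rather than something confirmed in the thread.

```sas
ods graphics on;                  /* turned on so ODS OUTPUT can capture the plot data */
ods select CDFPlot;               /* show just the CDF plot, not the default tables */
proc univariate data=sashelp.class;
   var weight;
   cdfplot weight;
   ods output CDFPlot=ECDF;       /* data behind the plot: columns ECDFX and ECDFY */
run;
```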

Reeza
Super User

Yup, that was it, thanks :)
