bazrkar
Fluorite | Level 6

I have a time series of monthly precipitation data covering more than 50 years. I want to calculate the CDF of each value in each month. After reading the data file, I used SAS to calculate the CDF, but it seems that I used the wrong code. Could you please show me an example of how to get the series of CDF values for this period?

Thank you in advance for your time.

BTW, here is my code:

 

/*reading inputs*/
Data DATA;
infile 'C:\Documents\precipitation.dat' delimiter='09'x;
input date $ P;

run;

 

data DATA;
cdf('GAMMA', Q);
run;

1 ACCEPTED SOLUTION

bazrkar
Fluorite | Level 6

Thank you for your response.

I got your point.

In the end, I used PROC SEVERITY to find out which distribution fits my data series. PROC SEVERITY provides the parameter estimates for each distribution in addition to the goodness-of-fit tests. I then used the estimated parameters of the fitted distribution to calculate the CDF for each precipitation value in each month. Here is the code I used.

In this example, I calculate the lognormal CDF with the mu and sigma values estimated by PROC SEVERITY.

 

Data DATA;
infile 'C:\Precipitation.dat' delimiter='09'x;
input date $ P;
y=cdf('LOGNORMAL',P,3.33684,1.23572);
run;


Proc print;

run;

Thanks Reeza and BallardW
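
As a quick sanity check on the call above: the lognormal CDF equals the standard normal CDF evaluated at (log(P) - mu) / sigma, so the two computed columns in this sketch should agree, and the CDF should be 0.5 at P = exp(mu). The data set name check and the test values of p are made up for illustration; mu and sigma are the estimates quoted in the post.

```sas
data check;
   mu    = 3.33684;   /* estimate quoted in the post above */
   sigma = 1.23572;   /* estimate quoted in the post above */

   /* hypothetical precipitation values; exp(mu) is the fitted median */
   do p = 5, exp(mu), 100;
      y_lognormal = cdf('LOGNORMAL', p, mu, sigma);
      y_normal    = cdf('NORMAL', (log(p) - mu) / sigma);
      output;
   end;
run;

proc print data=check;
run;
```

Note that the lognormal distribution is only defined for positive values, so months with zero precipitation may need separate handling.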


13 REPLIES 13
Reeza
Super User

Sounds like you just want a running total each month. That's pretty straightforward in a data step but you can also use PROC UNIVARIATE. 

 

First you'll need to convert your date to a SAS date. I don't know exactly what format your date is in, so you'll need to fix that portion of the code.

 

*Create sample data to test;
data have;
set sashelp.stocks (rename=(date=date_num));

*sashelp.stocks already stores a numeric date, so make a character copy to mimic your file;
date_char = put(date_num, date9.);

*you'll need to do this to convert to a SAS date;
*you'll need to change date9 to be appropriate for your data;
date = input(date_char, date9.); 
keep open date;

run;

proc sort data=have; 
by date;
run;

data want;
set have;
by date groupformat;
format date yymon5.; *format date by year month so analysis is by year month;

*set to 0 at start of each month;
if first.date then cum_total=0;

*add up for each month;
cum_total + open;

run;


bazrkar
Fluorite | Level 6

Thank you for your answer Reeza. 

 

ballardw
Super User

You would need to 1) reference your imported data set and 2) assign the result of the cdf function to a variable:

data DATA2;
   set data;
   result=cdf('GAMMA', Q);
run;

Some basics: SET is one way to reference an existing data set for use by another data step. If you use the same name on the DATA statement as on the SET statement, you can potentially destroy the data set. I strongly recommend using a different data set name for the output.

 

You can do the manipulation in the same data step that reads your data:

Data DATA;
   infile 'C:\Documents\precipitation.dat' delimiter='09'x;
   input date $ P;
   result= cdf('gamma',P);
run;

I used P because that appears to be the only variable other than date.

I suggest that you show us what layout the date value has so that a better format than character can be used to read it. If you read the date as a SAS date value, there are many things that can be done with the values that generally are not possible, or at best cumbersome, with character values.

 

If the data isn't sensitive you might want to post a few lines of the input text file for recommendations. I suspect that you may be looking to summarize your data before discussing the CDF of your data.

bazrkar
Fluorite | Level 6

Thank You Ballardw.

I still have one more problem. 

This is the error I get when I use

result= cdf('gamma',P);

The Function CDF with GAMMA as its first argument
needs at least 1 additional argument.

 

Does this mean that I have to supply the scale and shape parameters to get an answer? I thought these parameters were optional and that SAS would be able to calculate them from the input data. Am I right?

Reeza
Super User
BallardW and I are interpreting 'CDF' differently. I suspect you just want a running total - a graph showing the cumulative total or observed cumulative distribution. If you'd like to fit a distribution to it then you should be using PROC UNIVARIATE and it will estimate the parameters. But I think you need to clarify your details. It would also help if you could show an example of what you have and what you want.
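
For reference, a minimal sketch of both points raised here. The CDF function does not estimate anything from the data: for GAMMA the shape has to be passed in, and the scale is optional and defaults to 1, which is why the call with only P fails. PROC UNIVARIATE is one way to get estimates. The data set names have and want, and the values 2.0 and 15.0, are placeholders for illustration, not estimates from this thread.

```sas
/* Fit a gamma distribution to p; the parameter estimates are printed
   in the output (shape is labeled Alpha, scale is labeled Sigma) */
proc univariate data=have;
   var p;
   histogram p / gamma;
run;

/* Plug the printed estimates in by hand */
data want;
   set have;
   shape  = 2.0;    /* placeholder: replace with the estimated shape */
   scale  = 15.0;   /* placeholder: replace with the estimated scale */
   result = cdf('GAMMA', p, shape, scale);
run;
```

Writing to a new data set (want) rather than overwriting have keeps the imported data intact.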
bazrkar
Fluorite | Level 6

Sorry for not being clear.

To clarify my question: first, I want to run goodness-of-fit tests to find out which distribution fits my time series. I have already done this with PROC SEVERITY without any problem. PROC SEVERITY gives me the CDF plot, but I want the CDF value for each precipitation value in each month, since I want to do further processing on the CDF values.

Thanks. 

 

ballardw
Super User



To reiterate: please show some actual example data.

 

Okay, what do you mean by CDF?

Mathematically, CDF stands for "Cumulative Distribution Function": a function defined over the domain of the variable whose value is 0 at the low end of the domain and 1 at the high end. The CDF curve of a normal distribution looks like the graph below. The CDF function in SAS returns the Y value associated with a given X.

SGPlot8.png
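
A curve like the one in the image can be reproduced with a short sketch such as the following, assuming a standard normal distribution (mean 0, standard deviation 1). The data set name cdf_curve and the plotting range of -4 to 4 are arbitrary choices for illustration.

```sas
/* Evaluate the standard normal CDF on a grid of x values */
data cdf_curve;
   do x = -4 to 4 by 0.1;
      y = cdf('NORMAL', x);   /* defaults: mean 0, std dev 1 */
      output;
   end;
run;

/* Plot the curve; y rises from near 0 to near 1 */
proc sgplot data=cdf_curve;
   series x=x y=y;
   yaxis label='CDF' min=0 max=1;
run;
```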

bazrkar
Fluorite | Level 6

Yes, I mean Cumulative Distribution Function.

Based on the graph that you provided, I want the y values (CDF) for different values of x (precipitation) under different distributions.

Attached please find my input data.

Thanks.

Reeza
Super User
1. Are you looking for an empirical CDF applied to all of your data and then determining the cumulative probability for a specific month's value? Do years factor in at any point? Does it matter that your data starts in October?
2. Are you looking to fit a distribution to your data and then use those parameters to determine the CDF?

The first is nonparametric and the second is parametric.

If you want the first, all you need is PROC RANK.


proc rank data=have out=want;
var p;
ranks p_rank;
run;
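
Plain ranks are the values 1 to n rather than probabilities. As a hedged variation on the step above, the FRACTION option divides each rank by the number of nonmissing values, and TIES=HIGH gives tied values the largest rank in the tie, so the result is the proportion of observations less than or equal to each value, i.e. the empirical CDF. The output names want_ecdf and p_ecdf are made up for illustration.

```sas
/* Empirical CDF: fraction of observations <= each value of p */
proc rank data=have out=want_ecdf fraction ties=high;
   var p;
   ranks p_ecdf;
run;

proc print data=want_ecdf;
run;
```

The largest value of p gets p_ecdf = 1, and missing values of p are left missing.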
bazrkar
Fluorite | Level 6

Thanks Reeza!

Using goodness-of-fit tests, I want to find which distribution fits my data. Then I want to use that distribution to calculate the CDF values. Yes, my data should start in October, since the water year starts in October and ends in September.

bazrkar
Fluorite | Level 6
Years do not matter. Months and years are just labels.
Reeza
Super User

Start here to fit a distribution then:

https://blogs.sas.com/content/iml/2011/10/28/modeling-the-distribution-of-data-create-a-qq-plot.html

 

And then once you have the parameters you can use the CDF function with the appropriate values and distribution.




