Solved: Re: SAS_StateSpaceModel

Khaladdin · Posted 11-01-2017 10:39 AM

Hi all,

I have a question about the state space procedure. I have a huge data set that contains a million groups. I need to find the permanent and transitory components of each group by using a state space model. I run the following code:

proc ucm data=work;
	model price;
	by group;
	irregular plot=smooth;
	level checkbreak plot=smooth;
	estimate plot=residual;
	forecast plot=forecasts lead=10 alpha=0.5;
run;

This code works well. I have just one issue: because I have a huge number of groups, it takes a very long time (approximately 3 months). Do you know of any way to increase the efficiency and reduce the run time?

Thanks in advance for your help.

Rick_SAS · Posted 11-01-2017 01:25 PM

If you have millions of BY groups, I question whether you need all those plots. How are you going to view 4 million plots?

If you don't need printed output for certain sections, suppress it. For example, use PRINT=NONE on the ESTIMATE statement or use the ODS EXCLUDE statement to suppress output.

My advice is to (1) use NOPRINT to turn off printing, (2) get rid of the plots, and (3) use the OUTEST= and OUTFOR= options to send the results to data sets.

Is this real data or a simulation? If a simulation, see Tips #4 through #8 in the article "Eight tips to make your simulation run faster."

rselukar · Posted 11-01-2017 11:27 AM

The only thing I can think of is to distribute the problem on several machines (each machine gets a different set of by-groups).
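
As a rough sketch of that idea (not something posted in the thread), the data could be split by BY group with a DATA step before the pieces are sent to separate jobs or machines. This assumes the data set is already sorted by group; the number of pieces and the chunk_0 to chunk_3 names are hypothetical.

```sas
/* Sketch only: split the BY groups into four pieces so that each piece
   can be run as a separate SAS job or on a separate machine.
   Assumes WORK is sorted by GROUP. The chunk_ names and the number
   of pieces (4) are hypothetical choices. */
data chunk_0 chunk_1 chunk_2 chunk_3;
    set work;
    by group;
    if first.group then groupNum + 1;   /* running count of BY groups */
    select (mod(groupNum, 4));          /* deal groups out round-robin */
        when (0) output chunk_0;
        when (1) output chunk_1;
        when (2) output chunk_2;
        otherwise output chunk_3;
    end;
    drop groupNum;
run;
```

Each job would then run the same PROC UCM step with DATA= pointing at its own piece, and the resulting estimates could be concatenated afterward.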

Ksharp · Posted 11-01-2017 09:23 AM

It is a time series analysis question.
Please post it at Forecast forum.

Khaladdin · Posted 11-01-2017 10:40 AM

Thanks

Khaladdin · Posted 11-01-2017 02:43 PM

Hi Rick,

Thanks for your suggestions. Actually, I do not need the plots. I did not write my full code when I asked the question. My full code is:

ods trace on;

ods select ParameterEstimates;

ods output ParameterEstimates=myEstimates;

proc ucm data=work;
	model price;
	by group;
	irregular plot=smooth;
	level checkbreak plot=smooth;
	estimate plot=residual;
	forecast plot=forecasts lead=10 alpha=0.5;
run;

proc print data=myEstimates;
run;

proc transpose data=myEstimates(keep=group component estimate)
               out=transposedEstimates;
  by group;
  id component;
run;

So, I have already sent my results to data sets, but nothing changes. It will still take a very long time.

Rick_SAS · Posted 11-01-2017 02:55 PM

You might not realize that the procedure creates those millions of graphs but that ODS does not show them. Get rid of the graph requests. Also, it is much more efficient to use NOPRINT and OUTEST=myEstimates than to use the code you show.

Also, you don't need the PROC PRINT, which is probably trying to print a data set that has 10-15 million observations in it.

Khaladdin · Posted 11-01-2017 02:59 PM

Thanks again. So, would the following code be more efficient?

proc ucm data=work 
         outest=myEstimates
         noprint
         ;
    by group;
    model price;
    irregular;
    level checkbreak;
    estimate;
    forecast lead=10 alpha=0.5;
run;

Rick_SAS · Posted 11-01-2017 03:15 PM

You've got the right idea, but

1) The syntax is wrong, so check the doc. The OUTEST= option goes on the ESTIMATE statement, not on the PROC UCM statement.

2) If all you want are the parameter estimates, why are you doing all the other computations? For example, the forecast and confidence limits are expensive, so get rid of the FORECAST statement if you aren't saving the results. Only keep the statements that are relevant to the results that you intend to use.

As mentioned in the "8 Tips" article, run and debug your new code on a small subset of the data (maybe 5-10 BY groups) before you run it against the full data.
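
One way to build such a test subset is sketched below. It is only an illustration under stated assumptions: the data set is sorted by group, and the names small and testEstimates are hypothetical.

```sas
/* Sketch only: keep the first 10 BY groups for a test run.
   Assumes WORK is sorted by GROUP. The names "small" and
   "testEstimates" are hypothetical. */
data small;
    set work;
    by group;
    if first.group then groupNum + 1;   /* running count of BY groups */
    if groupNum > 10 then stop;         /* stop at the 11th group */
    drop groupNum;
run;

/* Same model as in the thread, without plots or the FORECAST statement. */
proc ucm data=small noprint;
    by group;
    model price;
    irregular;
    level checkbreak;
    estimate outest=testEstimates;      /* OUTEST= belongs on ESTIMATE */
run;
```

Inspecting testEstimates for these few groups is a quick check that the output has the expected layout before committing to the full run.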

Khaladdin · Posted 11-01-2017 03:22 PM

proc ucm data=work 
         noprint
         ;
    by group;
    model price;
    irregular;
    level checkbreak;
    estimate  outest=myEstimates;
run;

What about this one?

rselukar · Posted 11-01-2017 03:33 PM

Rick is correct. One more thing, since you are using the checkbreak option in the LEVEL statement, I am assuming that you want to save the detected break points. Since the break points are produced in an ODS table only, NOPRINT may not be the way to go. Check all the tables produced by your UCM call and "ods exclude" them and "ods output" the outlier summary table. Something like this will work:

proc ucm data=work plots=none;

    ods exclude DataSet EstimationSpan ForecastSpan
      InitialParameters FitSummary ConvergenceStatus
      ParameterEstimates FitStatistics ComponentSignificance
      TrendInformation OutlierSummary;

    ods output OutlierSummary = osummary;
    by group;
    model price;
    irregular;
    level checkbreak;
    estimate outest=myEstimates;
run;

Rick_SAS · Posted 11-01-2017 03:41 PM

Or delete that statement if you do not need it. NOPRINT is faster than ODS EXCLUDE.

This new code should go faster. How much faster depends on your data. As I said, try it for 10 BY groups to make sure it works as you expect. Then time how long it takes to compute for 100 or 1000 BY groups. From that you can estimate how long it will take for a million BY groups.

You never answered my question about whether this is a simulation. If it is, you almost surely can get by with fewer than 1 million simulated samples. I'd try 10,000 and see how large the Monte Carlo standard errors are.
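
The timing estimate could be sketched with a few macro statements, as below. This assumes a hypothetical data set test100 that holds 100 BY groups, and it assumes the run time scales linearly with the number of groups, which holds only roughly if the groups have similar lengths.

```sas
/* Sketch only: time a test run and extrapolate linearly.
   "test100" is a hypothetical data set that holds 100 BY groups. */
%let nTest = 100;
%let t0 = %sysfunc(datetime());         /* start time, in seconds */

proc ucm data=test100 noprint;
    by group;
    model price;
    irregular;
    level checkbreak;
    estimate outest=testEstimates;
run;

/* Elapsed seconds, then a linear projection to 1 million BY groups. */
%let elapsed = %sysevalf(%sysfunc(datetime()) - &t0);
%let estDays = %sysevalf(&elapsed / &nTest * 1000000 / 86400);
%put NOTE: &nTest BY groups took &elapsed seconds.;
%put NOTE: Projected time for 1 million BY groups is &estDays days.;
```

Repeating the run with 1000 groups and comparing the two projections would show whether the linear assumption is reasonable for this data.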

Khaladdin · Posted 11-01-2017 03:45 PM

Sorry for not answering your question about simulation. I missed it. It is real data, not a simulation.

Rick_SAS · Posted 11-01-2017 03:47 PM

Interesting. May I ask what the 1 million BY groups represent?
