03-28-2017 03:40 AM
I have seasonal data and I want to mimic weekly data with a moving bootstrap. I couldn't find suitable sample code, so I pieced the code below together from several sources. Is it right that the mean and variance should come from the AR model? And how do I use them in the bootstrap?
I also want to get the distribution of the resulting values.
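/* Read the CSV file */
proc import datafile='C:\Users\user\Desktop\\morgan.csv' out=morgan dbms=dlm;
  delimiter=',';
  getnames=yes;
run;

/* Add the first lag of PX */
data morgan;
  set morgan;
  format Date yymm.;
  PXlag = lag1(PX);
run;

/* Regress PX on its own lag and save predicted values and residuals */
proc autoreg data=morgan;
  model PX = PXlag / lagdep=PXlag;
  output out=resid p=pred r=resid;
run;

/* Draw 1000 bootstrap samples with replacement */
proc surveyselect data=resid out=outboot seed=30459584
                  method=urs samprate=1 outhits rep=1000;
run;

/* Kurtosis of the residuals within each bootstrap sample */
proc univariate data=outboot noprint;
  var resid;
  by Replicate;
  output out=outall kurtosis=curt;
run;

/* 95% percentile interval of the bootstrap kurtosis values */
proc univariate data=outall noprint;
  var curt;
  output out=final pctlpts=2.5, 97.5 pctlpre=ci;
run;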
03-28-2017 05:50 AM
1. Do you have a reference for what you are trying to achieve?
2. Do you intend to use PROC IML to implement the bootstrap?
Most examples use a bootstrap to resample IID data. You need to be careful in time series data to preserve the time-element of the data. I suggest you do an internet search and read about
> bootstrap "time series"
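For illustration, one technique that such a search turns up is the moving block bootstrap, which resamples blocks of consecutive observations instead of single observations, so the short-run dependence inside each block is kept. The following is only a minimal sketch, not a recommendation for your data: it assumes the data set `morgan` with the variable `PX` from your code, and the macro variables `blocklen` and `nrep` and their values are hypothetical choices that you would need to justify.

```sas
%let blocklen = 4;     /* assumed block length; must not exceed the number of rows */
%let nrep     = 1000;  /* assumed number of bootstrap replicates */

data mbb;
  call streaminit(30459584);
  do Replicate = 1 to &nrep;
    t = 0;                                  /* position within the resampled series */
    do while (t < n);
      /* pick a random starting row so that a full block fits */
      start = ceil(rand('uniform') * (n - &blocklen + 1));
      /* copy the block of consecutive rows, truncating the last block at n rows */
      do k = start to start + &blocklen - 1;
        if t < n then do;
          set morgan(keep=PX) point=k nobs=n;
          t + 1;
          output;
        end;
      end;
    end;
  end;
  stop;                                     /* required when reading with POINT= */
  drop start;
run;
```

Each replicate in `mbb` has the same number of rows as `morgan`, so it can be analyzed with `by Replicate;` in the same way as the output of PROC SURVEYSELECT.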
03-28-2017 10:18 PM
03-28-2017 10:47 PM
%macro def_spread_bootstrap(input_data =, boot_iter = 250);

  /* Compute the number of observations in the input dataset: */
  %local nobs_input;
  proc sql noprint;
    select count(*) into :nobs_input
    from &input_data
    ;
  quit;

  /* First, fit the models for the original data to obtain residuals */
  /* Excess return is regressed on the 1st lag of default spread. */
  proc autoreg data = &input_data(drop = div_yield vix) noprint;
    model ret_excess = def_spread_lag1 / method = ml;
    output out = areg_out1 r = full_resid1 p = full_pred1;
  run;
  quit;

  /* Fitting AR(2) model for the default spread. */
  proc autoreg data = &input_data(drop = div_yield vix) noprint;
    model def_spread = / nlag = 2 method = ml;
    output out = areg_out2 r = full_resid2 p = full_pred2;
  run;
  quit;

  /* Combine output from the two regressions */
  data combined_output;
    set areg_out1;
    set areg_out2 (keep = full_pred2 full_resid2);
  run;

  /* Predict the excess return out-of-sample. The first observation is
     going to be the predicted value from the original model, and the
     rest will be filled via bootstrap. */
  data bs_result;
    set combined_output (keep = full_pred1 firstobs = &nobs_input
                         rename = (full_pred1 = pred_bs));
  run;

  /* Extract bivariate residuals: */
  data bivar_resid;
    set combined_output (keep = full_resid1 full_resid2
                         rename = (full_resid1 = full_resid1_bs
                                   full_resid2 = full_resid2_bs));
  run;

  /* The following loop produces &boot_iter bootstrapped datasets.
     For each dataset, the predicted return is calculated and added
     to bs_result. */
  %do i = 1 %to &boot_iter;

    /* To perform sampling w/o replacement, shuffle the bivariate residuals: */
    data bivar_resid;
      set bivar_resid;
      rnd = ranuni(0);
      if _n_ eq 1 then rnd = 0;
      if _n_ eq &nobs_input then rnd = 1;
    run;

    proc sort data = bivar_resid;
      by rnd;
    run;

    data input_bs;
      set combined_output;
      /* Add the shuffled residuals to the combined_output: */
      set bivar_resid (drop = rnd);
      /* Create bootstrapped time series by adding the shuffled
         residuals to the fitted values: */
      ret_excess_bs = full_pred1 + full_resid1_bs;
      def_spread_bs = full_pred2 + full_resid2_bs;
      if _n_ eq 1 then ret_excess_bs = ret_excess;
      if _n_ eq 1 then def_spread_bs = def_spread;
      if _n_ eq &nobs_input then def_spread_bs = .;
      def_spread_bs_lag1 = lag(def_spread_bs);
      keep date ret_excess_bs def_spread_bs def_spread_bs_lag1;
    run;

    /* Run the model for the excess return based on the bootstrapped series: */
    proc autoreg data = input_bs noprint;
      model ret_excess_bs = def_spread_bs_lag1 / method = ml;
      output out = out_bs p = pred_bs;
    run;
    quit;

    /* Extract the predicted return and append it to bs_result: */
    proc append base = bs_result
                data = out_bs(keep = pred_bs firstobs = &nobs_input);
    run;

  %end; /* do cycle */

  /* Plot the predicted returns: */
  proc univariate data = bs_result noprint;
    title 'Histogram of predicted excess returns';
    histogram pred_bs / cfill = pink;
  run;

  /* Compute the bootstrap t-statistic: */
  proc ttest data = bs_result;
    title 'Bootstrap t-value';
  run;

  /* Delete all datasets that were created within the macro: */
  proc sql;
    drop table areg_out1, areg_out2, bivar_resid, bs_result,
               combined_output, input_bs, out_bs
    ;
  quit;

%mend def_spread_bootstrap;
This multivariate time series bootstrap code is the closest example I could find, but I don't know how to modify it. That is why I posted my original code and want to modify that instead.
03-29-2017 06:04 AM
See "Example 9.12: Simulations of a Univariate ARMA Process" under "General Statistics Examples" in the IML documentation.
There are also a number of functions you can use to simulate time series data in IML. Look through "Time Series Analysis and Examples"; maybe you can find one that fits.
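As a rough sketch of what such a simulation can look like, the following uses the IML function ARMASIM to generate one AR(1) series and write it to a data set. The coefficient 0.5, the mean, the standard deviation, the length 52, the seed, and the data set name `sim_ar1` are all made-up values for illustration; in practice you would plug in the estimates from your own fitted model.

```sas
proc iml;
  /* AR(1): y[t] = 0.5*y[t-1] + e[t]  (illustrative coefficient only) */
  phi   = {1 -0.5};  /* AR polynomial: leading 1, then the negated coefficient */
  theta = {1};       /* no moving-average terms */
  mu    = 0;         /* mean of the process */
  sigma = 1;         /* standard deviation of the innovations */
  n     = 52;        /* length of the simulated series */

  y = armasim(phi, theta, mu, sigma, n, 12345);

  /* write the simulated series to a data set for later use */
  create sim_ar1 var {"y"};
  append;
  close sim_ar1;
quit;
```

Calling ARMASIM repeatedly with different seeds gives many simulated series, from which you can build the distribution of any statistic you are interested in.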
04-03-2017 11:16 PM
Sorry, after discussing this with others, I believe I may have chosen the wrong solution. In the beginning, I wanted to use bootstrapping to deal with having insufficient data: get the distribution of the data and then simulate from it to impute the missing values. Now I believe I can use imputation directly to handle the problem. I'm sorry for the confusion, and thank you for your help.