- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- I want to use Proc Expand to populate missing observations in a dataset indexed by time, that is, Monthly snapshot dates.
- The dataset has over 100 fields and I want to use proc expand to fill these gaps since I only have to transform fields that need to be changed and all other fields are treated as explained in step 3 below.
- The challenge is to get Proc Expand to use the field values for the previous record to fill the missing record. Proc Expand is using the subsequent record to populate the missing record.
Also attached are the following:
* Output showing my data before and after running proc expand on the data is the piece of code that
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Typically PROC TIMESERIES is used to fill gaps.
https://gist.github.com/statgeek/07a3708dee1225ceb9d4aa75daab2c52
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks Reeza. Your reply gave me a clue.
The answer is to use proc expand with the option method=step to get the previously missing observation to use field values for the last non-missing observation.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Bravo @Bobbe_Stiff! Please post your solution code (using the {i} button) to help future users facing the same problem. You can mark your own solution as the correct answer too.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Here is the code to use PROC TIMEDATA to fill the time ID gaps using the previous values.
data test;
length claim $3 claims_lodgement_date date_of_injury snapshot_date dev_month_lodge dev_month_injury netcostc netpayc 8;
format claims_lodgement_date date_of_injury snapshot_date date9. netcostc netpayc dollar10. ;
format dev_month_lodge dev_month_injury 3.;
informat claims_lodgement_date date_of_injury snapshot_date date9.;
infile datalines delimiter=',' dsd;
input claim claims_lodgement_date date_of_injury snapshot_date dev_month_lodge dev_month_injury netcostc netpayc ;
put _all_;
datalines;
XYZ,01FEB1994,21JUL1993,28FEB1994,0,7,80,80
XYZ,01FEB1994,21JUL1993,30APR1994,2,9,3961,3961
;
run;
proc timedata data=test out=step11;
by claim;
var claims_lodgement_date date_of_injury Dev_Month_Lodge Dev_Month_Injury netcostc netpayc;
id snapshot_date interval = month setmissing = prev;
run;