Hi All,
I'm working with some price data that is summed up by the year and the fiscal quarter. However, not all observations have purchases in all quarters/years. I want to make those missing quarters/year appear as a 0. I have a 3 year bubble around each of my observations where I have data on what they spent.
Currently this is how my data looks
ID Year Qtr Spent
1 2010 1 50
1 2010 2 100
1 2010 4 25
1 2011 2 100
1 2011 4 200
What I want is:
ID Year Qtr Spent
1 2010 1 50
1 2010 2 100
1 2010 3 0
1 2010 4 25
1 2011 1 0
1 2011 2 100
1 2011 3 0
1 2011 4 200
1 2012 1 0
1 2012 2 0
1 2012 3 0
1 2012 4 0
Thanks in advance!
If you have to do it yourself without ETS, it's not that lengthy. First, create a data set with all combinations of ID, YEAR, and QTR:
proc sql noprint;
create table year_list as select distinct year from have;
create table id_list as select distinct id from have;
create table qtr_list as select distinct qtr from have;
create table all_combinations as select * from year_list, id_list, qtr_list;
quit;
Then sort, merge, fill in the zeros:
proc sort data=all_combinations;
by id year qtr;
run;
proc sort data=have;
by id year qtr;
run;
data want;
merge all_combinations have (in=have_data);
by id year qtr;
if have_data=0 then spent=0;
run;
If you have SAS/ETS look at PROC TIMESERIES
I believe that I have proc timeseries, but have never used it. How would you use it to create what I need?
You specify the time interval, the start and end period and it fills it in.
Make sure to include your BY statement as well.
Here's a fully worked example with fake data:
/*this is an example of time series data. 
1. Create a time series data set with missing intervals (IBM)
2. Add back missing entries using PROC TIMESERIES (IBM_NO_MISSING)
3. Calculate moving average - 12 month average
*/
/*1*/
data ibm;
	set sashelp.stocks;
	where stock='IBM';
	if month(date)=7 then
		delete;
run;
proc sort data=ibm;
	by date;
run;
/*2*/
proc timeseries data=ibm out=ibm_no_missing;
	id date interval=month start='01Aug1986'd end='01Dec2005'd;
	var open;
run;
For the example you posted where it has
start='01Aug1986'd end='01Dec2005'd;
Is this the overall global start and end date?
The way I get the data is that I have a 3 year bubble around each ID as they enter into my data. However, not all of them are entering in at the same time, some might enter in 2010 and others in 2013.
Ah...then this method won't work, you'll need to use the data step approach. If you have a data set that defines the bubbles it's relatively straightforward to loop the dates to get an empty table with all the records and then merge it in with the original table to get the results.
OK, how might you go about that then? I was thinking about something with loops but could not think of a solution.
What data do you have? If all you have is what you've shown above, how do you know what the start and end of the loop will be?
I have the variables I showed, ID, Yr, Qtr and Spent plus the start date of when they entered so:
ID Year Qtr Spent Date of Entry
1 2010 1 50 Jan 2011
1 2010 2 100 Jan 2011
1 2010 4 25 Jan 2011
1 2011 2 100 Jan 2011
1 2011 4 200 Jan 2011
So I have a year of lookback, the year during and the year after the entry date.
If you have to do it yourself without ETS, it's not that lengthy. First, create a data set with all combinations of ID, YEAR, and QTR:
proc sql noprint;
create table year_list as select distinct year from have;
create table id_list as select distinct id from have;
create table qtr_list as select distinct qtr from have;
create table all_combinations as select * from year_list, id_list, qtr_list;
quit;
Then sort, merge, fill in the zeros:
proc sort data=all_combinations;
by id year qtr;
run;
proc sort data=have;
by id year qtr;
run;
data want;
merge all_combinations have (in=have_data);
by id year qtr;
if have_data=0 then spent=0;
run;
So this will work but it does still have some issues. My data goes from 2009 to 2015. but not all IDs are coming in during the same time frame. Some are coming in 2010 and others in 2013 and I just want to be able to capture the 3 year bubble around them. I tried this solution but it gives me the full 2009 to 2015 time frame around all of the IDs. Any thoughts on how to get it to just be around the 3 year bubble?
@endofline wrote:
Any thoughts on how to get it to just be around the 3 year bubble?
What's a 3 year bubble? As long as you can define it, you're fine, so far you haven't offered a definition.
ID Year Qtr Spent Date of Entry
1 2010 1 50 Jan 2011
1 2010 2 100 Jan 2011
1 2010 4 25 Jan 2011
1 2011 2 100 Jan 2011
1 2011 4 200 Jan 2011
So for the 3 year bubble I have a year of lookback, the year during and the year after the entry date.And I have this for each observation, with each obersvation having a different date of entry.
You'll have to describe the date ranges a little more. (And of course the program gets more complex.)
Could a company begin in Q3 of one year, and end in Q2 of another year? How do you know when the window begins and ends for each company? Does a window have to be exactly 12 quarters, or does it depend on the data?
I'll be able to compare my questions to your (just-posted) description later tonight.
I know when they enter in the data set for sure by the date entered variable and that we're for sure suppose to be getting 1 year of lookback data from that entry date, the year during and the year after.
It does happen that someone does enter in the dataset in say April or October. However, it should be 12 qtrs.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
