Hi;
I am working on the sample dataset below and using proc expand to get the missing dates.
data names;
input
name $10. day date7. number ;
format
name $10. day date7. ;
datalines;
John 12OCT15 3
John 14OCT15 5
John 16OCT15 8
Mike 11OCT15 1
Mike 13OCT15 7
Mike 18OCT15 9
;
proc expand data = names out=names_out from=day to=day method=none;
by name;
id day;
convert number=number / transformout=(setmiss 0.0);
run;
The output dataset that I get is just interpolating the missing dates and values for the past dates only. I also want to interpolate the dates to include the maximum date in the dataset. For example in this case since Mike has the most recent date as 18OCT15 I also want John to have 18OCT15 as the max date. The sample data would look like this below (So John would have two extra observations):
John 12OCT15 3
John 13OCT15 0
John 14OCT15 5
John 15OCT15 0
John 16OCT15 8
John 17OCT15 0
John 18OCT15 0
Mike 11OCT15 1
Mike 12OCT15 0
Mike 13OCT15 7
Mike 14OCT15 0
Mike 15OCT15 0
Mike 16OCT15 0
Mike 17OCT15 0
Mike 18OCT15 9
Is it possible to achieve this output using proc expand ? or I should use a different approac?
Thank you!
It's probably possible with PROC EXPAND if you first create dummy observations with (overall) minimum and maximum date for each name.
But it is no problem to get the result without PROC EXPAND (which I don't have licensed anyway). Here is a solution using PROC SQL:
proc sql noprint;
select min(day), max(day)
into :minday, :maxday
from names;
quit;
data alldates;
do day=&minday to &maxday;
output;
end;
run;
proc sql;
create table want as
select a.name format=$10., a.day format=date7., coalesce(b.number, 0) as number from
(select * from (select distinct name from names), alldates) a
left join names b
on a.name=b.name & a.day=b.day
order by name, day;
quit;
A pure data step solution would also be possible.
Please note that the above code combines all names with all dates in the range from minimum to maximum. This means that, in addition to the new records you mentioned, John gets a new observation for (Mike's) 11OCT15. If this is not acceptable, we can modify the solution.
It's probably possible with PROC EXPAND if you first create dummy observations with (overall) minimum and maximum date for each name.
But it is no problem to get the result without PROC EXPAND (which I don't have licensed anyway). Here is a solution using PROC SQL:
proc sql noprint;
select min(day), max(day)
into :minday, :maxday
from names;
quit;
data alldates;
do day=&minday to &maxday;
output;
end;
run;
proc sql;
create table want as
select a.name format=$10., a.day format=date7., coalesce(b.number, 0) as number from
(select * from (select distinct name from names), alldates) a
left join names b
on a.name=b.name & a.day=b.day
order by name, day;
quit;
A pure data step solution would also be possible.
Please note that the above code combines all names with all dates in the range from minimum to maximum. This means that, in addition to the new records you mentioned, John gets a new observation for (Mike's) 11OCT15. If this is not acceptable, we can modify the solution.
proc expand complains when you try to do that. A different approach is DIY :
/* Get the last date */
proc sql noprint;
select max(day) into :lastDay from names;
quit;
/* Insert missing dates with number=0 */
data names_out;
set names; by name;
previousDay = lag(day);
if not first.name then
do newDay = intnx("DAY", previousDay, 1) to intnx("DAY", day, -1);
newNumber = 0;
output;
end;
newDay = day;
newNumber = number;
output;
if last.name then
do newDay = intnx("DAY", day, 1) to intnx("DAY", &lastDay., 0);
newNumber = 0;
output;
end;
drop number day previousDay;
rename newNumber=number newDay=day;
format newDay date7.;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.