BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
krm
Obsidian | Level 7 krm
Obsidian | Level 7

Hi;

 

I am working on the sample dataset below and using proc expand to get the missing dates.

 

data names;
input
name $10. day date7. number ;
format
name $10. day date7. ;
datalines;
John 12OCT15 3
John 14OCT15 5
John 16OCT15 8
Mike 11OCT15 1
Mike 13OCT15 7
Mike 18OCT15 9
;

 

proc expand data = names out=names_out from=day to=day method=none;
by name;
id day;
convert number=number / transformout=(setmiss 0.0);
run;

 

The output dataset that I get is just interpolating the missing dates and values for the past dates only. I also want to interpolate the dates to include the maximum date in the dataset. For example in this case since Mike has the most recent date as 18OCT15 I also want John to have 18OCT15 as the max date. The sample data would look like this below (So John would have two extra observations):

 

John 12OCT15 3
John 13OCT15 0
John 14OCT15 5
John 15OCT15 0
John 16OCT15 8

John 17OCT15 0

John 18OCT15 0
Mike 11OCT15 1
Mike 12OCT15 0
Mike 13OCT15 7
Mike 14OCT15 0
Mike 15OCT15 0
Mike 16OCT15 0
Mike 17OCT15 0
Mike 18OCT15 9

 

 Is it possible to achieve this output using proc expand ? or I should use a different approac?

 

Thank you!

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

It's probably possible with PROC EXPAND if you first create dummy observations with (overall) minimum and maximum date for each name.

 

But it is no problem to get the result without PROC EXPAND (which I don't have licensed anyway). Here is a solution using PROC SQL:

 

proc sql noprint;
select min(day), max(day)
into :minday, :maxday
from names;
quit;

data alldates;
do day=&minday to &maxday;
  output;
end;
run;

proc sql;
create table want as
select a.name format=$10., a.day format=date7., coalesce(b.number, 0) as number from 
(select * from (select distinct name from names), alldates) a
left join names b
on a.name=b.name & a.day=b.day
order by name, day;
quit;

A pure data step solution would also be possible.

 

Please note that the above code combines all names with all dates in the range from minimum to maximum. This means that, in addition to the new records you mentioned, John gets a new observation for (Mike's) 11OCT15. If this is not acceptable, we can modify the solution.

View solution in original post

3 REPLIES 3
FreelanceReinh
Jade | Level 19

It's probably possible with PROC EXPAND if you first create dummy observations with (overall) minimum and maximum date for each name.

 

But it is no problem to get the result without PROC EXPAND (which I don't have licensed anyway). Here is a solution using PROC SQL:

 

proc sql noprint;
select min(day), max(day)
into :minday, :maxday
from names;
quit;

data alldates;
do day=&minday to &maxday;
  output;
end;
run;

proc sql;
create table want as
select a.name format=$10., a.day format=date7., coalesce(b.number, 0) as number from 
(select * from (select distinct name from names), alldates) a
left join names b
on a.name=b.name & a.day=b.day
order by name, day;
quit;

A pure data step solution would also be possible.

 

Please note that the above code combines all names with all dates in the range from minimum to maximum. This means that, in addition to the new records you mentioned, John gets a new observation for (Mike's) 11OCT15. If this is not acceptable, we can modify the solution.

PGStats
Opal | Level 21

proc expand complains when you try to do that. A different approach is DIY :

 


/* Get the last date */
proc sql noprint;
select max(day) into :lastDay from names;
quit;

/* Insert missing dates with number=0 */
data names_out;
set names; by name;
previousDay = lag(day);
if not first.name then 
    do newDay = intnx("DAY", previousDay, 1) to intnx("DAY", day, -1);
        newNumber = 0;
        output;
        end;
newDay = day;
newNumber = number;
output;
if last.name then
    do newDay = intnx("DAY", day, 1) to intnx("DAY", &lastDay., 0);
        newNumber = 0;
        output;
        end; 
drop number day previousDay;
rename newNumber=number newDay=day;
format newDay date7.;
run;
PG
krm
Obsidian | Level 7 krm
Obsidian | Level 7
Thank you Freelancereinhard and PGStats. Both solutions were very helpful and I used them both. It was good to see different approaches.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 1704 views
  • 2 likes
  • 3 in conversation