Hello Everyone,
I am trying to down sample a dataset from yearly frequency to monthly. I would really appreciate if some one could guide me to do so in a data step. I will explain what I am trying to do with an example:
Have
ID | Date | Rank |
0001 | 12/31/2012 | 2 |
0001 | 12/31/2013 | 2 |
0002 | 6/30/2006 | 4 |
0002 | 12/31/2007 | 5 |
Need
ID | Date | Rank |
0001 | 12/31/2012 | 2 |
0001 | 1/31/2013 | 2 |
0001 | 2/28/2013 | 2 |
0001 | 3/31/2013 | 2 |
0001 | 4/30/2013 | 2 |
0001 | 5/31/2013 | 2 |
0001 | 6/30/2013 | 2 |
0001 | 7/31/2013 | 2 |
0001 | 8/31/2013 | 2 |
0001 | 9/30/2013 | 2 |
0001 | 10/31/2013 | 2 |
0001 | 11/30/2013 | 2 |
0001 | 12/31/2013 | 3 |
0001 | 1/31/2014 | 3 |
0001 | 11/30/2014 | 3 |
Similarly for the other ID |
Thanks
data have;
infile cards expandtabs truncover;
input ID Date : mmddyy10. Rank;
format Date mmddyy10.;
cards;
0001 12/31/2012 2
0001 12/31/2013 2
0002 6/30/2006 4
0002 12/31/2007 5
;
run;
data want;
merge have have(keep=id date rename=(id=_id date=_date) firstobs=2);
output;
if id=_id then do;
temp=date;
do i=1 to intck('month',temp,_date)-1;
date=intnx('month',temp,i,'e');output;
end;
end;
drop i temp _:;
run;
Given this data:
0001 | 12/31/2012 | 2 |
0001 | 12/31/2013 | 2 |
what is the rule to create records to 11/30/2014?
How do we know that the rank becomes 3 in the output for 12/31/2013? And continues with teh same rank to 11/30/2014?
Hi Ballardw
That is a typo, the rank remains 2 all the way to 11/30/2014. The rank only changes if there is a change in the original data. I am reposting the Need table.
Need
ID | Date | Rank |
0001 | 12/31/2012 | 2 |
0001 | 1/31/2013 | 2 |
0001 | 2/28/2013 | 2 |
0001 | 3/31/2013 | 2 |
0001 | 4/30/2013 | 2 |
0001 | 5/31/2013 | 2 |
0001 | 6/30/2013 | 2 |
0001 | 7/31/2013 | 2 |
0001 | 8/31/2013 | 2 |
0001 | 9/30/2013 | 2 |
0001 | 10/31/2013 | 2 |
0001 | 11/30/2013 | 2 |
0001 | 12/31/2013 | 2 |
0001 | 1/31/2014 | 2 |
0001 | 11/30/2014 | 2 |
Similarly for the other ID |
Thanks
Still need the stopping rule.
Does Id 0002 continue on to 11/30/2008 or 11/30/2014 or something else?
Hi Ballardw,
This is a snippet of the dataset. For ID 0002, if that is the last observation in the dataset then it will stop on 11/30/2008 other wise it will stop one month prior to the date value for the next observation.
Thanks
I think this will get you started:
Note: if you have more than 2 values for ID there's going to be overlapping output.
/* assumes data is sorted by ID and date*/
data want (keep=id outdate outrank rename=(outdate=date outrank=rank)) ;
set example;
by id date;
retain FirstDate FirstRank;
if first.id then do;
FirstDate=Date;
FirstRank=rank;
end;
else do;
/* First to output values from the first date up to the second date*/
outdate=firstdate;
outrank=firstrank;
do until (outdate > intnx('month',date,-1,'e'));
output;
outdate = intnx('month',outdate,1,'e');
end;
outdate = date;
outrank = rank;
do until (outdate > intnx('month',date,11,'e'));
output;
outdate = intnx('month',outdate,1,'e');
end;
end;
format firstdate outdate mmddyy10.;
run;
data have;
infile cards expandtabs truncover;
input ID Date : mmddyy10. Rank;
format Date mmddyy10.;
cards;
0001 12/31/2012 2
0001 12/31/2013 2
0002 6/30/2006 4
0002 12/31/2007 5
;
run;
data want;
merge have have(keep=id date rename=(id=_id date=_date) firstobs=2);
output;
if id=_id then do;
temp=date;
do i=1 to intck('month',temp,_date)-1;
date=intnx('month',temp,i,'e');output;
end;
end;
drop i temp _:;
run;
Thank you Ksharp and ballardw for the solutions.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.