Hello Everyone,
I am trying to down sample a dataset from yearly frequency to monthly. I would really appreciate if some one could guide me to do so in a data step. I will explain what I am trying to do with an example:
Have
ID | Date | Rank |
0001 | 12/31/2012 | 2 |
0001 | 12/31/2013 | 2 |
0002 | 6/30/2006 | 4 |
0002 | 12/31/2007 | 5 |
Need
ID | Date | Rank |
0001 | 12/31/2012 | 2 |
0001 | 1/31/2013 | 2 |
0001 | 2/28/2013 | 2 |
0001 | 3/31/2013 | 2 |
0001 | 4/30/2013 | 2 |
0001 | 5/31/2013 | 2 |
0001 | 6/30/2013 | 2 |
0001 | 7/31/2013 | 2 |
0001 | 8/31/2013 | 2 |
0001 | 9/30/2013 | 2 |
0001 | 10/31/2013 | 2 |
0001 | 11/30/2013 | 2 |
0001 | 12/31/2013 | 3 |
0001 | 1/31/2014 | 3 |
0001 | 11/30/2014 | 3 |
Similarly for the other ID |
Thanks
data have;
infile cards expandtabs truncover;
input ID Date : mmddyy10. Rank;
format Date mmddyy10.;
cards;
0001 12/31/2012 2
0001 12/31/2013 2
0002 6/30/2006 4
0002 12/31/2007 5
;
run;
data want;
merge have have(keep=id date rename=(id=_id date=_date) firstobs=2);
output;
if id=_id then do;
temp=date;
do i=1 to intck('month',temp,_date)-1;
date=intnx('month',temp,i,'e');output;
end;
end;
drop i temp _:;
run;
Given this data:
0001 | 12/31/2012 | 2 |
0001 | 12/31/2013 | 2 |
what is the rule to create records to 11/30/2014?
How do we know that the rank becomes 3 in the output for 12/31/2013? And continues with teh same rank to 11/30/2014?
Hi Ballardw
That is a typo, the rank remains 2 all the way to 11/30/2014. The rank only changes if there is a change in the original data. I am reposting the Need table.
Need
ID | Date | Rank |
0001 | 12/31/2012 | 2 |
0001 | 1/31/2013 | 2 |
0001 | 2/28/2013 | 2 |
0001 | 3/31/2013 | 2 |
0001 | 4/30/2013 | 2 |
0001 | 5/31/2013 | 2 |
0001 | 6/30/2013 | 2 |
0001 | 7/31/2013 | 2 |
0001 | 8/31/2013 | 2 |
0001 | 9/30/2013 | 2 |
0001 | 10/31/2013 | 2 |
0001 | 11/30/2013 | 2 |
0001 | 12/31/2013 | 2 |
0001 | 1/31/2014 | 2 |
0001 | 11/30/2014 | 2 |
Similarly for the other ID |
Thanks
Still need the stopping rule.
Does Id 0002 continue on to 11/30/2008 or 11/30/2014 or something else?
Hi Ballardw,
This is a snippet of the dataset. For ID 0002, if that is the last observation in the dataset then it will stop on 11/30/2008 other wise it will stop one month prior to the date value for the next observation.
Thanks
I think this will get you started:
Note: if you have more than 2 values for ID there's going to be overlapping output.
/* assumes data is sorted by ID and date*/
data want (keep=id outdate outrank rename=(outdate=date outrank=rank)) ;
set example;
by id date;
retain FirstDate FirstRank;
if first.id then do;
FirstDate=Date;
FirstRank=rank;
end;
else do;
/* First to output values from the first date up to the second date*/
outdate=firstdate;
outrank=firstrank;
do until (outdate > intnx('month',date,-1,'e'));
output;
outdate = intnx('month',outdate,1,'e');
end;
outdate = date;
outrank = rank;
do until (outdate > intnx('month',date,11,'e'));
output;
outdate = intnx('month',outdate,1,'e');
end;
end;
format firstdate outdate mmddyy10.;
run;
data have;
infile cards expandtabs truncover;
input ID Date : mmddyy10. Rank;
format Date mmddyy10.;
cards;
0001 12/31/2012 2
0001 12/31/2013 2
0002 6/30/2006 4
0002 12/31/2007 5
;
run;
data want;
merge have have(keep=id date rename=(id=_id date=_date) firstobs=2);
output;
if id=_id then do;
temp=date;
do i=1 to intck('month',temp,_date)-1;
date=intnx('month',temp,i,'e');output;
end;
end;
drop i temp _:;
run;
Thank you Ksharp and ballardw for the solutions.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.