- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Good afternoon.
I have a dataset of records based on KEY_ID. The column KEY_DATE is a unique date per KEY_ID. The DATE_1 and DATE_2 fields bracket the KEY_DATE in each instance and can extend past the KEY_DATE. I need to determine the number of days that fall within each month.
Can anyone figure this out for me or point me in some direction? I don't have any example code because I have not been at all successful in even getting close.
I've tried and I can't seem to figure it out. I'd also be open to a different approach if anyone has a suggestion.
KEY_ID | KEY_DATE | DATE_1 | DATE_2 | WANT_JANUARY | WANT_FEBRURARY | WANT_MARCH | WANT_APRIL |
3 | 1/1/2024 | 12/16/2023 | 4/20/2024 | 31 | 28 | 31 | 20 |
4 | 1/1/2024 | 12/31/2023 | 2/16/2024 | 22 | 16 | 0 | 0 |
5 | 2/1/2024 | 1/30/2024 | 4/15/2024 | 1 | 28 | 31 | 15 |
6 | 1/1/2024 | 12/2/2023 | 2/12/2024 | 31 | 12 | 0 | 0 |
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Easier if you make a TALL dataset instead.
First let's convert your listing into an actual dataset. (and display the date with an unambiguous style)
data have;
input KEY_ID (KEY_DATE DATE_1 DATE_2) (:mmddyy.);
format KEY_DATE DATE_1 DATE_2 yymmdd10.;
cards;
3 1/1/2024 12/16/2023 4/20/2024
4 1/1/2024 12/31/2023 2/16/2024
5 2/1/2024 1/30/2024 4/15/2024
6 1/1/2024 12/02/2023 2/12/2024
;
Now just loop over the months between DATE_1 and DATE_2.
data tall ;
set have;
do offset=0 to intck('month',date_1,date_2);
month = intnx('month',date_1,offset);
days = min(date_2,intnx('month',month,0,'e')) - max(month,date_1) + 1;
output;
end;
format month yymm7.;
run;
You might a REPORT that looks a lot like your request using the new MONTH variable as an ACROSS variable.
proc report ;
columns key_id key_date date_1 date_2 days,month ;
define key_id / group;
define key_date / group;
define date_1 / group;
define date_2 / group;
define month / across ' ';
run;
Result
Or you could use PROC TRANPOSE to make a dataset in that (hard to work with) wide format.
proc transpose data=tall out=wide(drop=_name_) prefix=Y;
by key_id key_date date_1 date_2;
id month;
var days;
run;
Result
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi:
I'm just trying to understand the logic of how Key_ID 3 has a value of 31 for Want_January when Date_1 is in Dec 2023 and how Key_ID 4 has a value of 22, again, when the Date_1 value is in Dec 2023. Your logic is a bit murky. Can you clarify?
Cynthia
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi, Jeff, I did not get what you said. I have several questions on the rules you calculate the days for those months: (1) could you please tell how did you come up with the number of 22 days for that January? You can use arithmetic formula to show it, such as 31-9=22, or tell the start day and end day in your calculation, such as start day is 01Jan2024 and the end day is 22Jan2024. (2)why there is only 28 days in that February? Thanks.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You are welcome, Jeff! In fact I personally think Tom's solution is brilliant, particularly because of the statement: days=min(date2, intnx('month', month,0,'e'))-max(month, date_1)+1, and the proc sort and proc report steps which helps me review what I have learnt. I like quickbluefish's solution as well, which is a bit too advanced for me. I appreciate this forum, I can learn a lot here and see how others think and write SAS code.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Agree with @Cynthia_sas - several confusing things here, esp. the 22 days for the 3rd row in the Jan column. Also, 2024 is a leap year, so there are 29 days in Feb. You could do this:
data test;
infile cards dsd truncover firstobs=1 dlm='09'x;
length key key_date date_1 date_2 4;
informat key_date date_: mmddyy10.;
input key key_date date_1 date_2;
format key_date date_: date9.;
cards;
3 1/1/2024 12/16/2023 4/20/2024
4 1/1/2024 12/31/2023 2/16/2024
5 2/1/2024 1/30/2024 4/15/2024
6 1/1/2024 12/2/2023 2/12/2024
;
run;
data want;
set test;
array w {1:4} want_01 want_02 want_03 want_04;
mindt=mdy(scan(vname(w[lbound(w)]),2,'_')*1,1,year(key_date));
do m=lbound(w) to hbound(w);
sdt=mdy(m,1,year(mindt));
w[m]=max(0, min(date_2,intnx('month',sdt,0,'end'))-max(mindt,sdt,date_1)+1);
end;
drop m sdt mindt;
run;
proc print data=want; run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Totally understand the confusion. I'm not the best at explaining it. Sorry. I really appreciate your time and expertise.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Easier if you make a TALL dataset instead.
First let's convert your listing into an actual dataset. (and display the date with an unambiguous style)
data have;
input KEY_ID (KEY_DATE DATE_1 DATE_2) (:mmddyy.);
format KEY_DATE DATE_1 DATE_2 yymmdd10.;
cards;
3 1/1/2024 12/16/2023 4/20/2024
4 1/1/2024 12/31/2023 2/16/2024
5 2/1/2024 1/30/2024 4/15/2024
6 1/1/2024 12/02/2023 2/12/2024
;
Now just loop over the months between DATE_1 and DATE_2.
data tall ;
set have;
do offset=0 to intck('month',date_1,date_2);
month = intnx('month',date_1,offset);
days = min(date_2,intnx('month',month,0,'e')) - max(month,date_1) + 1;
output;
end;
format month yymm7.;
run;
You might a REPORT that looks a lot like your request using the new MONTH variable as an ACROSS variable.
proc report ;
columns key_id key_date date_1 date_2 days,month ;
define key_id / group;
define key_date / group;
define date_1 / group;
define date_2 / group;
define month / across ' ';
run;
Result
Or you could use PROC TRANPOSE to make a dataset in that (hard to work with) wide format.
proc transpose data=tall out=wide(drop=_name_) prefix=Y;
by key_id key_date date_1 date_2;
id month;
var days;
run;
Result
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Base on your answer, my version (for creating dataset with days) is like this (the proc sort and proc report steps and the results are the same). Although I used more steps and statements, I think the logic is easier to understand for new users, especially who is not good at math, do not remember many functions, and not get used to more advanced techniques such as array and nested loops.
data test;
input id iddate date1 date2;
informat iddate date: mmddyy.;
format iddate date: date9.;
datalines;
3 1/1/2024 12/16/2023 4/20/2024
4 1/1/2024 12/31/2023 2/16/2024
5 2/1/2024 1/30/2024 4/15/2024
6 1/1/2024 12/2/2023 2/12/2024
;
run;
data want;
set test;
do i=0 to intck('month',date1,date2);
month=intnx('month',date1,i);
days=min(date2,
intnx('month',month,0,'e')
)
-max(month,date1)
+1;
output;
end;
format month yymm7.;
run;
proc print data=want;run;
/***my version***/
/*calculate months*/
data mthcalc;
set test;
do i=0 to intck('month',date1,date2);
month=intnx('month',date1,i);
output;
end;
format month yymm7.;
run;
/*calculate days*/
data daycalc;
set mthcalc;
/*determine start and end day of the 1st month*/
if i=0 then do;
startday=date1;
endday=intnx('month',date1,0,'end');
end;
/*determine start and end day of the last month*/
else if i=intck('month',date1,date2)
then do;
startday=intnx('month',date2,0);
endday=date2;
end;
/*determine start and end day of other months*/
else do;
startday=mdy(month(month),01,year(month));
endday=intnx('month',startday,0,'end');
end;
/*calcualte days of each month*/
days=endday-startday+1;
format startday endday date9.;
run;
proc print data=daycalc;run;
/***transpose and report steps***/
proc transpose data=daycalc
out=finalrpt (drop=_name_);
by id iddate date1 date2;
id month;
var days;
run;
proc print data=finalrpt noobs;run;
proc report data=daycalc;
columns id iddate date1 date2 days,month;
define id/group;
define iddate/group;
define date1/group;
define date2/group;
define month/across ' ';
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@dxiao2017 wrote:
Base on your answer, my version (for creating dataset with days) is like this (the proc sort and proc report steps and the results are the same). Although I used more steps and statements, I think the logic is easier to understand for new users, especially who is not good at math, do not remember many functions, and not get used to more advanced techniques such as array and nested loops.
I disagree that the logic you used is easier to understand than what @Tom showed. In my work, I see people all the time trying to do something the hard way, they put a huge amount of effort into it (like I assume you did) and they write a lot of code, when this isn't necessary, and there are much easier ways to get the desired output.
And so in this case, @Tom has shown how to get the output with a lot less code. He even gave good advice to people who have this or similar problems, he said "Easier if you make a TALL dataset instead." Also, Tom has not used arrays or nested loops, and he has shown very useful methods (like use TALL data sets and PROC REPORT) that SAS programmers ought to learn early in their career. SAS has done the hard work for you, so you don't have to work hard.
By the way, @dxiao2017 , maybe you were replying to the code from @quickbluefish, but your actual post says you are replying to @Tom .
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi Paige, thanks a lot for your feedback and valuable comments!
The reasons I said my logic is easier to understand is as follows: 1) @Tom use one single statement to calculate the days for each month, like this:
days=min(date2, intnx('month',month,0,'e') )-max(month,date1)+1;
I use two data steps (one for months and the other one for days, in my last thread) with many more statements to calculate the days for each month, like this:
/*determine start and end day of the 1st month*/
if i=0 then do;
startday=date1;
endday=intnx('month',date1,0,'end');
end;
/*determine start and end day of the last month*/
else if i=intck('month',date1,date2)
then do;
startday=intnx('month',date2,0);
endday=date2;
end;
/*determine start and end day of other months*/
else do;
startday=mdy(month(month),01,year(month));
endday=intnx('month',startday,0,'end');
end;
/*calcualte days of each month*/
days=endday-startday+1;
The logic of my version is: explicitly show every single step of thinking with one SAS statement or step. And my steps of thinking are shown in the notes in the code above. 2) For me, who is not good at math, writing code this way saves a lot of more complex thinking (and therefore consumes more coding steps), i.e., when you think a basic and single step you explicitly write it out. @Tom combined several thinking steps using one statement which integrates and presents different calculations for the days of the 1st month, the last month, and other months. Maybe what @Tom did is the common way of thinking and common practice for people who have science and math background, but for people who is not that good at math, perhaps my steps is easier to understand and follow. Also, if this code needs to be developed to macros, I think my steps is necessary. Nevertheless, I will learn to combine detailed and basic steps and statements using more advanced and comprehensive thinking and coding techniques later on.
I agree with you that @Tom created a tall dataset at first, and then base on which calculated the days, i.e., creating a tall dataset makes things easier, comparing to @quickbluefish 's solution which at first uses an array and then a loop with advanced functions. So I followed @Tom 's steps to write my version. I have just started to learn array and nested loops and advanced functions and did not follow @quickbluefish 's solution. I will learn those techniques later on. The other reason I followed @Tom 's solution is that the proc sort and proc report steps help me review what I learnt, it is a good example and good practice for me.
By talking about array and (nested) loops I was indeed refer to @quickbluefish 's solution. You are right about this. Thanks again @PaigeMiller for your valuable comments, which offers me ideas on what aspects I need to improve and need to think differently.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I forgot to say that in a lot of scenarios, when you need to create dataset as your final product, my way of creating that tall dataset with the startday and endday columns, perhaps has more advantages, in that this dataset has better traceability, is easier to compare with the raw data, and to validate.