BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Jeff_DOC
Pyrite | Level 9

Good afternoon.

 

I have a dataset of records based on KEY_ID. The column KEY_DATE is a unique date per KEY_ID. The DATE_1 and DATE_2 fields bracket the KEY_DATE in each instance and can extend past the KEY_DATE. I need to determine the number of days that fall within each month.

 

Can anyone figure this out for me or point me in some direction? I don't have any example code because I have not been at all successful in even getting close.

 

I've tried and I can't seem to figure it out. I'd also be open to a different approach if anyone has a suggestion. 

 

KEY_ID KEY_DATE DATE_1 DATE_2 WANT_JANUARY WANT_FEBRURARY WANT_MARCH WANT_APRIL
3 1/1/2024 12/16/2023 4/20/2024 31 28 31 20
4 1/1/2024 12/31/2023 2/16/2024 22 16 0 0
5 2/1/2024 1/30/2024 4/15/2024 1 28 31 15
6 1/1/2024 12/2/2023 2/12/2024 31 12 0 0
1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Easier if you make a TALL dataset instead.

First let's convert your listing into an actual dataset. (and display the date with an unambiguous style)

data have;
  input KEY_ID (KEY_DATE DATE_1 DATE_2) (:mmddyy.);
  format KEY_DATE DATE_1 DATE_2 yymmdd10.;
cards;
3 1/1/2024 12/16/2023  4/20/2024
4 1/1/2024 12/31/2023  2/16/2024
5 2/1/2024  1/30/2024  4/15/2024
6 1/1/2024 12/02/2023  2/12/2024
;

Now just loop over the months between DATE_1 and DATE_2.

data tall ;
  set have;
  do offset=0 to intck('month',date_1,date_2);
    month = intnx('month',date_1,offset);
    days = min(date_2,intnx('month',month,0,'e')) - max(month,date_1) + 1;
    output;
  end;
  format month yymm7.;
run;

You might a REPORT that looks a lot like your request using the new MONTH variable as an ACROSS variable.

proc report ;
  columns key_id key_date date_1 date_2 days,month ;
  define key_id / group;
  define key_date / group;
  define date_1 / group;
  define date_2 / group;
  define month / across ' ';
run;

Result

Tom_0-1744420844480.png

Or you could use PROC TRANPOSE to make a dataset in that (hard to work with) wide format.

proc transpose data=tall out=wide(drop=_name_) prefix=Y;
  by key_id key_date date_1 date_2;
  id month;
  var days;
run;

Result

Tom_1-1744420979648.png

 

 

View solution in original post

17 REPLIES 17
Cynthia_sas
Diamond | Level 26

Hi:

  I'm just trying to understand the logic of how Key_ID 3 has a value of 31 for Want_January when Date_1 is in Dec 2023 and how Key_ID 4 has a value of 22, again, when the Date_1 value is in Dec 2023. Your logic is a bit murky. Can you clarify?

Cynthia

Jeff_DOC
Pyrite | Level 9
Sorry, it was confusing. This is because the number of days I need were only for January. Since the number of days between DATE_1 and DATE_2 encompass all of January there were 31 days in that bucket. All days in February were encompassed as well (28) days, all days in March (31 days) but only 20 days in April since DATE_2 ends on April 20th.
dxiao2017
Lapis Lazuli | Level 10

Hi, Jeff, I did not get what you said. I have several questions on the rules you calculate the days for those months: (1) could you please tell how did you come up with the number of 22 days for that January? You can use arithmetic formula to show it, such as 31-9=22, or tell the start day and end day in your calculation, such as start day is 01Jan2024 and the end day is 22Jan2024. (2)why there is only 28 days in that February? Thanks.

dxiao2017
Lapis Lazuli | Level 10
Sorry for my English, by "arithmetic formula" I mean math expressions.
Jeff_DOC
Pyrite | Level 9
Thank you so much for taking the time to reply to me.
dxiao2017
Lapis Lazuli | Level 10

You are welcome, Jeff! In fact I personally think Tom's solution is brilliant, particularly because of the statement: days=min(date2, intnx('month', month,0,'e'))-max(month, date_1)+1, and the proc sort and proc report steps which helps me review what I have learnt. I like quickbluefish's solution as well, which is a bit too advanced for me. I appreciate this forum, I can learn a lot here and see how others think and write SAS code.

 

 

 

quickbluefish
Barite | Level 11

Agree with @Cynthia_sas  - several confusing things here, esp. the 22 days for the 3rd row in the Jan column.  Also, 2024 is a leap year, so there are 29 days in Feb.  You could do this:

data test;
infile cards dsd truncover firstobs=1 dlm='09'x;
length key key_date date_1 date_2 4;
informat key_date date_: mmddyy10.;
input key key_date date_1 date_2;
format key_date date_: date9.;
cards;
3	1/1/2024	12/16/2023	4/20/2024
4	1/1/2024	12/31/2023	2/16/2024	
5	2/1/2024	1/30/2024	4/15/2024
6	1/1/2024	12/2/2023	2/12/2024
;
run;

data want;
set test;
array w {1:4} want_01 want_02 want_03 want_04;
mindt=mdy(scan(vname(w[lbound(w)]),2,'_')*1,1,year(key_date));
do m=lbound(w) to hbound(w);
	sdt=mdy(m,1,year(mindt));
	w[m]=max(0, min(date_2,intnx('month',sdt,0,'end'))-max(mindt,sdt,date_1)+1);
end;
drop m sdt mindt;
run;

proc print data=want; run;

quickbluefish_0-1744420957655.png

 

 

 

Jeff_DOC
Pyrite | Level 9

Totally understand the confusion. I'm not the best at explaining it. Sorry. I really appreciate your time and expertise.

Tom
Super User Tom
Super User

Easier if you make a TALL dataset instead.

First let's convert your listing into an actual dataset. (and display the date with an unambiguous style)

data have;
  input KEY_ID (KEY_DATE DATE_1 DATE_2) (:mmddyy.);
  format KEY_DATE DATE_1 DATE_2 yymmdd10.;
cards;
3 1/1/2024 12/16/2023  4/20/2024
4 1/1/2024 12/31/2023  2/16/2024
5 2/1/2024  1/30/2024  4/15/2024
6 1/1/2024 12/02/2023  2/12/2024
;

Now just loop over the months between DATE_1 and DATE_2.

data tall ;
  set have;
  do offset=0 to intck('month',date_1,date_2);
    month = intnx('month',date_1,offset);
    days = min(date_2,intnx('month',month,0,'e')) - max(month,date_1) + 1;
    output;
  end;
  format month yymm7.;
run;

You might a REPORT that looks a lot like your request using the new MONTH variable as an ACROSS variable.

proc report ;
  columns key_id key_date date_1 date_2 days,month ;
  define key_id / group;
  define key_date / group;
  define date_1 / group;
  define date_2 / group;
  define month / across ' ';
run;

Result

Tom_0-1744420844480.png

Or you could use PROC TRANPOSE to make a dataset in that (hard to work with) wide format.

proc transpose data=tall out=wide(drop=_name_) prefix=Y;
  by key_id key_date date_1 date_2;
  id month;
  var days;
run;

Result

Tom_1-1744420979648.png

 

 

PaigeMiller
Diamond | Level 26

@Tom wrote:

Easier if you make a TALL dataset instead.

 


Great advice from @Tom. I hope @Jeff_DOC will heed this advice, not just in this example, but in future problems as well.

--
Paige Miller
dxiao2017
Lapis Lazuli | Level 10

Base on your answer, my version (for creating dataset with days) is like this (the proc sort and proc report steps and the results are the same). Although I used more steps and statements, I think the logic is easier to understand for new users, especially who is not good at math, do not remember many functions, and not get used to more advanced techniques such as array and nested loops.

data test;
   input id iddate date1 date2;
   informat iddate date: mmddyy.;
   format iddate date: date9.;
   datalines;
3 1/1/2024 12/16/2023 4/20/2024
4 1/1/2024 12/31/2023 2/16/2024
5 2/1/2024 1/30/2024 4/15/2024
6 1/1/2024 12/2/2023 2/12/2024
;
run;
data want;
   set test;
   do i=0 to intck('month',date1,date2);
      month=intnx('month',date1,i);
      days=min(date2,
               intnx('month',month,0,'e')
               )
           -max(month,date1)
           +1;
      output;
   end;
   format month yymm7.;
run;
proc print data=want;run;
/***my version***/ /*calculate months*/ data mthcalc; set test; do i=0 to intck('month',date1,date2); month=intnx('month',date1,i); output; end; format month yymm7.; run; /*calculate days*/ data daycalc; set mthcalc; /*determine start and end day of the 1st month*/ if i=0 then do; startday=date1; endday=intnx('month',date1,0,'end'); end; /*determine start and end day of the last month*/ else if i=intck('month',date1,date2) then do; startday=intnx('month',date2,0); endday=date2; end; /*determine start and end day of other months*/ else do; startday=mdy(month(month),01,year(month)); endday=intnx('month',startday,0,'end'); end; /*calcualte days of each month*/ days=endday-startday+1; format startday endday date9.; run; proc print data=daycalc;run;
/***transpose and report steps***/ proc transpose data=daycalc out=finalrpt (drop=_name_); by id iddate date1 date2; id month; var days; run; proc print data=finalrpt noobs;run; proc report data=daycalc; columns id iddate date1 date2 days,month; define id/group; define iddate/group; define date1/group; define date2/group; define month/across ' '; run;

dxiao2017_0-1744650162551.png

dxiao2017_1-1744650292338.png

dxiao2017_3-1744650482645.png

PaigeMiller
Diamond | Level 26

@dxiao2017 wrote:

Base on your answer, my version (for creating dataset with days) is like this (the proc sort and proc report steps and the results are the same). Although I used more steps and statements, I think the logic is easier to understand for new users, especially who is not good at math, do not remember many functions, and not get used to more advanced techniques such as array and nested loops.

 


I disagree that the logic you used is easier to understand than what @Tom showed. In my work, I see people all the time trying to do something the hard way, they put a huge amount of effort into it (like I assume you did) and they write a lot of code, when this isn't necessary, and there are much easier ways to get the desired output.

 

And so in this case, @Tom has shown how to get the output with a lot less code. He even gave good advice to people who have this or similar problems, he said "Easier if you make a TALL dataset instead." Also, Tom has not used arrays or nested loops, and he has shown very useful methods (like use TALL data sets and PROC REPORT) that SAS programmers ought to learn early in their career. SAS has done the hard work for you, so you don't have to work hard.

 

By the way, @dxiao2017 , maybe you were replying to the code from @quickbluefish, but your actual post says you are replying to @Tom .

--
Paige Miller
dxiao2017
Lapis Lazuli | Level 10

Hi Paige, thanks a lot for your feedback and valuable comments!

 

The reasons I said my logic is easier to understand is as follows: 1) @Tom use one single statement to calculate the days for each month, like this:

days=min(date2, intnx('month',month,0,'e') )-max(month,date1)+1;

I use two data steps (one for months and the other one for days, in my last thread) with many more statements to calculate the days for each month, like this:

/*determine start and end day of the 1st month*/
if i=0 then do;
startday=date1;
endday=intnx('month',date1,0,'end');
end;
/*determine start and end day of the last month*/
else if i=intck('month',date1,date2)
then do;
startday=intnx('month',date2,0);
endday=date2;
end;
/*determine start and end day of other months*/
else do;
startday=mdy(month(month),01,year(month));
endday=intnx('month',startday,0,'end');
end;
/*calcualte days of each month*/
days=endday-startday+1;

The logic of my version is: explicitly show every single step of thinking with one SAS statement or step. And my steps of thinking are shown in the notes in the code above. 2) For me, who is not good at math, writing code this way saves a lot of more complex thinking (and therefore consumes more coding steps), i.e., when you think a basic and single step you explicitly write it out. @Tom combined several thinking steps using one statement which integrates and presents different calculations for the days of the 1st month, the last month, and other months. Maybe what @Tom did is the common way of thinking and common practice for people who have science and math background, but for people who is not that good at math, perhaps my steps is easier to understand and follow. Also, if this code needs to be developed to macros, I think my steps is necessary. Nevertheless, I will learn to combine detailed and basic steps and statements using more advanced and comprehensive thinking and coding techniques later on.

 

I agree with you that @Tom created a tall dataset at first, and then base on which calculated the days, i.e., creating a tall dataset makes things easier, comparing to @quickbluefish 's solution which at first uses an array and then a loop with advanced functions. So I followed @Tom 's steps to write my version. I have just started to learn array and nested loops and advanced functions and did not follow @quickbluefish 's solution. I will learn those techniques later on. The other reason I followed @Tom 's solution is that the proc sort and proc report steps help me review what I learnt, it is a good example and good practice for me.

 

By talking about array and (nested) loops I was indeed refer to @quickbluefish 's solution. You are right about this. Thanks again @PaigeMiller for your valuable comments, which offers me ideas on what aspects I need to improve and need to think differently.

dxiao2017
Lapis Lazuli | Level 10

I forgot to say that in a lot of scenarios, when you need to create dataset as your final product, my way of creating that tall dataset with the startday and endday columns, perhaps has more advantages, in that this dataset has better traceability, is easier to compare with the raw data, and to validate.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 17 replies
  • 2964 views
  • 7 likes
  • 6 in conversation