BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
viollete
Calcite | Level 5

Hello,

 

I have this type of data:

ID start_study end_study

1  01APR2010  20OCT2010

2  01APR2010  01APR2014

3  01Nov2010  20OCT2015

4  01APR2010  20OCT2016

5  01APR2010  20APR2010

.......

The period I am looking in is from 01APR2010 till 01APR2017. I want to create a variable for each month: 84 varibles. 

The start_study is the date when a person joined a group and end_study is the date when he left it. I want to calculate for each person how many days per month a person belonged for a group, for example:

ID=1 in first month (april 2010) was in group for 30 days, in second month (may 2010) was for 31 days , ...., in seventh month (octorber 2010) was for 20 days and after would be 0 cause he left the group.

 

what would be the best way to do this?

 

 

thank you

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

Instead of creating variables for months, it's better to write a record per month. Never keep data (months) in structure (variables/columns).

data have;
input ID start_study :date9. end_study :date9.;
format start_study end_study date9.;
cards;
1 01APR2010 20OCT2010
2 01APR2010 01APR2014
3 01Nov2010 20OCT2015
4 01APR2010 20OCT2016
5 01APR2010 20APR2010
;
run;

data want (keep=id year month yearmonth days);
set have;
length yearmonth $6;
do until (start_study > end_study);
  year = year(start_study);
  month = month(start_study);
  yearmonth = put(year,z4.) !! put(month,z2.);
  days = min(intnx('month',start_study,0,'end'),end_study) - start_study + 1;
  output;
  start_study = intnx('month',start_study,1,'begin');
end;
run;

Also note how I presented example data in a data step. This allows your assumed helpers to recreate your dataset with a simple copy/paste and run. See this as a simple courtesy toward those you want to help you.

 

In case you absolutely need columns per month, yearmonth can be used in a proc transpose to name the columns.

View solution in original post

4 REPLIES 4
Kurt_Bremser
Super User

Instead of creating variables for months, it's better to write a record per month. Never keep data (months) in structure (variables/columns).

data have;
input ID start_study :date9. end_study :date9.;
format start_study end_study date9.;
cards;
1 01APR2010 20OCT2010
2 01APR2010 01APR2014
3 01Nov2010 20OCT2015
4 01APR2010 20OCT2016
5 01APR2010 20APR2010
;
run;

data want (keep=id year month yearmonth days);
set have;
length yearmonth $6;
do until (start_study > end_study);
  year = year(start_study);
  month = month(start_study);
  yearmonth = put(year,z4.) !! put(month,z2.);
  days = min(intnx('month',start_study,0,'end'),end_study) - start_study + 1;
  output;
  start_study = intnx('month',start_study,1,'begin');
end;
run;

Also note how I presented example data in a data step. This allows your assumed helpers to recreate your dataset with a simple copy/paste and run. See this as a simple courtesy toward those you want to help you.

 

In case you absolutely need columns per month, yearmonth can be used in a proc transpose to name the columns.

viollete
Calcite | Level 5

Thanks! It works fine for small data size, but mine is more than 7 million of observations (sorry that i didnt mentioned that before!)

Kurt_Bremser
Super User

@viollete wrote:

Thanks! It works fine for small data size, but mine is more than 7 million of observations (sorry that i didnt mentioned that before!)


And?

 

7 Million * 38 bytes (4*8 for the numerics and 6 for yearmonth) equates to roughly 270 MB, which is puny.

And you can even reduce the number of bytes needed:

length year month days 3;

as the small numbers don't need more precision.

 

And keep in mind that forcing SAS to have columns for all months when most of the times the values will be missing is a colossal waste of resources.

viollete
Calcite | Level 5

At the end it work (it took some time to calculate). Thanks

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1411 views
  • 3 likes
  • 2 in conversation