Hello,
I have this type of data:
ID start_study end_study
1 01APR2010 20OCT2010
2 01APR2010 01APR2014
3 01Nov2010 20OCT2015
4 01APR2010 20OCT2016
5 01APR2010 20APR2010
.......
The period I am looking in is from 01APR2010 till 01APR2017. I want to create a variable for each month: 84 varibles.
The start_study is the date when a person joined a group and end_study is the date when he left it. I want to calculate for each person how many days per month a person belonged for a group, for example:
ID=1 in first month (april 2010) was in group for 30 days, in second month (may 2010) was for 31 days , ...., in seventh month (octorber 2010) was for 20 days and after would be 0 cause he left the group.
what would be the best way to do this?
thank you
Instead of creating variables for months, it's better to write a record per month. Never keep data (months) in structure (variables/columns).
data have;
input ID start_study :date9. end_study :date9.;
format start_study end_study date9.;
cards;
1 01APR2010 20OCT2010
2 01APR2010 01APR2014
3 01Nov2010 20OCT2015
4 01APR2010 20OCT2016
5 01APR2010 20APR2010
;
run;
data want (keep=id year month yearmonth days);
set have;
length yearmonth $6;
do until (start_study > end_study);
year = year(start_study);
month = month(start_study);
yearmonth = put(year,z4.) !! put(month,z2.);
days = min(intnx('month',start_study,0,'end'),end_study) - start_study + 1;
output;
start_study = intnx('month',start_study,1,'begin');
end;
run;
Also note how I presented example data in a data step. This allows your assumed helpers to recreate your dataset with a simple copy/paste and run. See this as a simple courtesy toward those you want to help you.
In case you absolutely need columns per month, yearmonth can be used in a proc transpose to name the columns.
Instead of creating variables for months, it's better to write a record per month. Never keep data (months) in structure (variables/columns).
data have;
input ID start_study :date9. end_study :date9.;
format start_study end_study date9.;
cards;
1 01APR2010 20OCT2010
2 01APR2010 01APR2014
3 01Nov2010 20OCT2015
4 01APR2010 20OCT2016
5 01APR2010 20APR2010
;
run;
data want (keep=id year month yearmonth days);
set have;
length yearmonth $6;
do until (start_study > end_study);
year = year(start_study);
month = month(start_study);
yearmonth = put(year,z4.) !! put(month,z2.);
days = min(intnx('month',start_study,0,'end'),end_study) - start_study + 1;
output;
start_study = intnx('month',start_study,1,'begin');
end;
run;
Also note how I presented example data in a data step. This allows your assumed helpers to recreate your dataset with a simple copy/paste and run. See this as a simple courtesy toward those you want to help you.
In case you absolutely need columns per month, yearmonth can be used in a proc transpose to name the columns.
Thanks! It works fine for small data size, but mine is more than 7 million of observations (sorry that i didnt mentioned that before!)
@viollete wrote:
Thanks! It works fine for small data size, but mine is more than 7 million of observations (sorry that i didnt mentioned that before!)
And?
7 Million * 38 bytes (4*8 for the numerics and 6 for yearmonth) equates to roughly 270 MB, which is puny.
And you can even reduce the number of bytes needed:
length year month days 3;
as the small numbers don't need more precision.
And keep in mind that forcing SAS to have columns for all months when most of the times the values will be missing is a colossal waste of resources.
At the end it work (it took some time to calculate). Thanks
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.