BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
CynthiaWei
Obsidian | Level 7

Hi SAS Pros,

 

I have a dataset like this:

ID starting_date Duration ending_date
1 3/20/17 2 3/21/17
1 3/25/17 10 4/3/17
1 4/1/17 365 3/31/18
2 1/2/17 60 3/2/17
2 1/10/18 5 1/14/17

 

What I want is to:

1. create a series of dummy variables that tell me which month(s) the duration covers, for example, the duration of 3/20/2017-3/22/2017 happened only in March 17, so march_17=1. The duration 4/1/2017-3/31/2018 lasted 365 days and it covered from April 2017 though March 2018, the april_17, may_17,..., until february_18, and march_18 all equals to 1.

What I want is like this:

ID starting_date Duration ending_date January_17 February_17 March_17 April_17 May_17 June_17 July_17 August_17 September_17 October_17 November_17 December_17 January_18 February_18 March_18
1 3/20/17 2 3/21/17     1                        
1 3/25/17 10 4/3/17     1 1                      
1 4/1/17 365 3/31/18       1 1 1 1 1 1 1 1 1 1 1 1
2 1/2/17 60 3/2/17 1 1 1                        
2 1/10/18 5 1/14/17 1                            

 

2. I also want to use proc sql to calculate sum for each dummy by ID (these sum variables are not listed in the table). Could you please advice how can I combine array and proc sql together in calculating sum for each dummy variable?

I really appreciate for any help!

Best regards,

C

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

A wide dataset will be detrimental to all further calculations, so you better sray with a long layout:

data want;
set have;
format month yymmd7.;
month = intnx('month',starting_date,0,'b');
do while (month le ending_date);
  output;
  month = intnx('month',month,1,'b');
end;
keep id month;
run;

You can now easily create counts of months per id, or counts of ids per month.

View solution in original post

9 REPLIES 9
Kurt_Bremser
Super User

A wide dataset will be detrimental to all further calculations, so you better sray with a long layout:

data want;
set have;
format month yymmd7.;
month = intnx('month',starting_date,0,'b');
do while (month le ending_date);
  output;
  month = intnx('month',month,1,'b');
end;
keep id month;
run;

You can now easily create counts of months per id, or counts of ids per month.

CynthiaWei
Obsidian | Level 7

Hi,

Thank you so much! It works very well for my project!

Best regards,

C

CynthiaWei
Obsidian | Level 7

Hi,

I just have a question that why there is d in 'yymmd7.' in the format month statement? month and b are some default functions?

Thank you!

Best,

PaigeMiller
Diamond | Level 26

@CynthiaWei wrote:

Hi SAS Pros,

 

I have a dataset like this:

ID starting_date Duration ending_date
1 3/20/17 2 3/21/17
1 3/25/17 10 4/3/17
1 4/1/17 365 3/31/18
2 1/2/17 60 3/2/17
2 1/10/18 5 1/14/17

 

What I want is to:

1. create a series of dummy variables that tell me which month(s) the duration covers, for example, the duration of 3/20/2017-3/22/2017 happened only in March 17, so march_17=1. The duration 4/1/2017-3/31/2018 lasted 365 days and it covered from April 2017 though March 2018, the april_17, may_17,..., until february_18, and march_18 all equals to 1.

What I want is like this:

ID starting_date Duration ending_date January_17 February_17 March_17 April_17 May_17 June_17 July_17 August_17 September_17 October_17 November_17 December_17 January_18 February_18 March_18
1 3/20/17 2 3/21/17     1                        
1 3/25/17 10 4/3/17     1 1                      
1 4/1/17 365 3/31/18       1 1 1 1 1 1 1 1 1 1 1 1
2 1/2/17 60 3/2/17 1 1 1                        
2 1/10/18 5 1/14/17 1                            

 

2. I also want to use proc sql to calculate sum for each dummy by ID (these sum variables are not listed in the table). Could you please advice how can I combine array and proc sql together in calculating sum for each dummy variable?

I really appreciate for any help!

Best regards,

C


You don't need dummy variables to compute sums by month. You don't need arrays to calculate sums by month. There are very few valid reasons to compute dummy variables in SAS, and very many better ways to handle these matters, such as this, which uses the data set WANT as created in the program above by @Kurt_Bremser to compute the sums:

 

proc summary data=want;
    class month;
    var some_other_variable;
    format month yymm7.;
    output out=sums sum=;
run;

I agree with @Kurt_Bremser, trying to compute these dummy variables will be a much more difficult way to program this analysis.

--
Paige Miller
CynthiaWei
Obsidian | Level 7

Hi,

 

Thank you so much for the code too! One question is that what variable should I put in the var statement? What is this step supposed to do?

 

Best,

RichardDeVen
Barite | Level 11
Are you creating dummy variables for a logistic regression ?
Reeza
Super User
Wouldn't time series data violate the assumptions for logistic regression?
CynthiaWei
Obsidian | Level 7

Hi Guys,

 

Thank you for your code. I have created a dataset like this:

ID Yes_or_No count_of_month month
1 1 9 2017-03
1 1 9 2017-04
1 1 9 2017-05
1 1 9 2017-06
1 1 9 2017-07
1 1 9 2017-08
1 1 9 2017-09
1 1 9 2017-10
1 1 9 2017-11
1 1 9 2017-12
2 1 6 2017-06
2 1 6 2017-07
2 1 6 2017-08
2 1 6 2017-09
2 1 6 2017-10
2 1 6 2017-11
3 0 0 .
3 0 0 .
3 0 0 .
3 0 0 .
4 0 0 .
       
       
       

My next step is to (what I want):

1. Create a new variable called calendar_month with values from 2017-01 to 2018-04 (16 rows for each person)

2. Create another variable called person_permonth to represent the 'month' that the duration covered, for example, person 1 has 2017-03 in the month, then person_permonth=1 for calendar_month=2017-03; person 1 doesn't have 2017-01 and 2017-02, so person_permonth=0 for calendar_month=2017-01 and 2017-02 as follows. By doing this, every person has 16 rows with each row represent one month from 2017-01 to 2018-04. If the person is Yes_or_No=1 and has certain months in the month variable, the corresponding months under calendar_month should be 1, if the person doesn't have certain months in the variable month, the corresponding months under variable calendar_month is 0. If the person is Yes_or_No=0, the every month under calendar_month should be 0 for all 16 rows for that person. Hope I explain it clearly.

 

ID Yes_or_No count_of_month month calendar_month person_permonth
1 1 9   2017-01 0
1 1 9   2017-02 0
1 1 9 2017-03 2017-03 1
1 1 9 2017-04 2017-04 1
1 1 9 2017-05 2017-05 1
1 1 9 2017-06 2017-06 1
1 1 9 2017-07 2017-07 1
1 1 9 2017-08 2017-08 1
1 1 9 2017-09 2017-09 1
1 1 9 2017-10 2017-10 1
1 1 9 2017-11 2017-11 1
1 1 9 2017-12 2017-12 1
1 1 9   2018-01 0
1 1 9   2018-02 0
1 1 9   2018-03 0
1 1 9   2018-04 0
2 1 6   2017-01 0
2 1 6   2017-02 0
2 1 6   2017-03 0
2 1 6   2017-04 0
2 1 6   2017-05 0
2 1 6 2017-06 2017-06 1
2 1 6 2017-07 2017-07 1
2 1 6 2017-08 2017-08 1
2 1 6 2017-09 2017-09 1
2 1 6 2017-10 2017-10 1
2 1 6 2017-11 2017-11 1
2 1 6   2017-12 0
2 1 6   2018-01 0
2 1 6   2018-02 0
2 1 6   2018-03 0
2 1 6   2018-04 0
3 0 0 . 2017-01 0
3 0 0 . 2017-02 0
3 0 0 . 2017-03 0
3 0 0 . 2017-04 0
3 0 0 . 2017-05 0
3 0 0 . 2017-06 0
3 0 0 . 2017-07 0
3 0 0 . 2017-08 0
3 0 0 . 2017-09 0
3 0 0 . 2017-10 0
3 0 0 . 2017-11 0
3 0 0 . 2017-12 0
3 0 0 . 2018-01 0
3 0 0 . 2018-02 0
3 0 0 . 2018-03 0
3 0 0 . 2018-04 0
4 0 0 . 2017-01 0
4 0 0 . 2017-02 0
4 0 0 . 2017-03 0
4 0 0 . 2017-04 0
4 0 0 . 2017-05 0
4 0 0 . 2017-06 0
4 0 0 . 2017-07 0
4 0 0 . 2017-08 0
4 0 0 . 2017-09 0
4 0 0 . 2017-10 0
4 0 0 . 2017-11 0
4 0 0 . 2017-12 0
4 0 0 . 2018-01 0
4 0 0 . 2018-02 0
4 0 0 . 2018-03 0
4 0 0 . 2018-04 0

 

I know it is a little complicated. I appreciate any help with this project!

Best regards,

C

 

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 1947 views
  • 3 likes
  • 5 in conversation