I have data on a year long program and need to count the number of sessions that occurred between certain number of months in the program. Cohorts in the program can start at any time in the year, and the timing of sessions is not prescribed; for example, session 18 might occur in month 7 for one cohort, but month 8 for another cohort. Participants might also miss sessions. In my data, each observation is a participant; I have the start date for each participant, and the dates of each session attended. How can I program the data step to count how many sessions a participant attended between months 1-6, 7-9, and 10-12?
Existing variables:
FirstSessionDate= date of first session attended
Date_S1= date of session 1
Date_S2= date of session 2
... (etc.)
Date_S30= date of session 30
Variables I want:
TS_1= number of sessions attended during months 1-6
TS_2=number of sessions attended during months 7-9
TS_3=number of sessions attended during months 10-12
Some example code I use for counting sessions in general (note, I edited this down, there are a lot of other things between this and surrounding it but they don't pertain to this question):
data want;
set have;
ts=0; *TS: total number of sessions attended;
do number=1 to 30;
if &date_s1 ne "" then ts=ts+1;
end;
run;
Correction to that SAS code example:
if date_s&number. ne "" then ts=ts+1;
While I might be able to program this without seeing the actual data set, it would help if you could provide a small sample of the data, and perhaps just 5 dates. To provide data in a useful form which is working SAS data step code, please follow these examples and instructions. Do not provide data in other ways.
I just added a csv of some sample data to work with in a reply to the original post. I hope that helps.
Some of us refuse to download files from this forum. Which is why I asked for data and gave specific instructions for you to follow.
Arrays are your friend here.
Use INTCK() to find the number of intervals between the first session and the date attended.
Untested as no data was provided, but this should give you the idea.
data want;
set have;
array _dt(30) date_s1 -date_s30;
ts_1=0;ts_2=0;ts_3=0;
do i =1 to dim(_dt);
if _dt(i) ne . then nmonths=intck('month', firstSessionDate, _dt(i), 'C');
if not missing(nmonths) then do;
if nmonths < 6 then ts_1+1;
else if nmonths<9 then ts_2+1;
else if nmonths < 12 then ts_3 +1;
end;
end;
run;
I tried running it but only got 0s when I ran a proc freq on ts_1/ ts_2/ ts_3.
I just added a csv of some sample data to work with in a reply to the original post. I hope that helps.
Works fine for me on your sample data.
This is why it's important to provide data as a data step to ensure the data is an accurate representation of your actual data.
What are the types on your variables?
Post your log otherwise - making sure to strip any raw data.
data have;
infile cards truncover;
informat id $8. firstsessionDate date_s1-date_s30 mmddyy10.;
format firstsessionDate date_s1-date_s30 mmddyy10.;
input ID $ FirstSessionDate Date_S1 Date_S2 Date_S3 Date_S4 Date_S5 Date_S6 Date_S7 Date_S8 Date_S9 Date_S10 Date_S11 Date_S12 Date_S13 Date_S14 Date_S15 Date_S16 Date_S17 Date_S18 Date_S19 Date_S20 Date_S21 Date_S22 Date_S23 Date_S24 Date_S25 Date_S26 Date_S27 Date_S28 Date_S29 Date_S30;
cards;
example1 4/15/2021 4/15/2021 4/22/2021 5/6/2021 5/13/2021 5/20/2021 5/27/2021 6/3/2021 6/10/2021 6/17/2021 6/24/2021 7/1/2021 7/8/2021 7/15/2021 7/22/2021 7/29/2021 8/19/2021 9/9/2021 10/23/2021 10/21/2021 12/2/2021 12/23/2021 1/13/2022 2/3/2022 2/24/2022 3/17/2022 4/7/2022
example2 1/13/2021 1/21/2021 1/20/2021 1/13/2021 2/3/2021 2/10/2021 2/17/2021 3/3/2021 3/10/2021 3/24/2021 3/31/2021 4/7/2021 4/14/2021 4/28/2021 5/12/2021 6/9/2021 6/23/2021 7/7/2021 7/21/2021 8/4/2021 9/1/2021 9/15/2021 10/13/2021 10/27/2021 11/10/2021
example3 10/14/2020 10/14/2020 10/28/2020 11/11/2020 11/18/2020 11/25/2020 12/2/2020 12/9/2020 12/16/2020 12/23/2020 12/30/2020 1/13/2021 2/12/2021 3/8/2021 2/24/2021 1/9/2022 4/21/2021 7/15/2021 6/16/2021 7/14/2021 1/13/2022 9/8/2021 10/6/2021 12/1/2021 12/29/2021 1/26/2022 2/23/2022
example4 10/21/2020 10/21/2020 10/28/2020 11/4/2020 11/11/2020 11/18/2020 12/2/2020 12/9/2020 12/16/2020 12/23/2020 12/30/2020 1/6/2021 1/13/2021 1/20/2021 1/27/2021 2/18/2021 4/4/2021 4/19/2021 5/4/2021 5/19/2021 6/3/2021 6/18/2021 7/3/2021 7/18/2021 8/2/2021
example5 10/5/2020 10/12/2021 10/5/2020 10/26/2020 11/2/2020 11/9/2020 11/16/2020 11/30/2020 12/7/2020 12/14/2020 12/21/2020 12/28/2020 1/11/2021 1/30/2021 2/11/2021 2/23/2021 3/7/2021 3/19/2021 4/12/2021 4/24/2021 5/6/2021 5/18/2021 5/30/2021 6/11/2021 6/23/2021
example6 2/1/2021 2/1/2021 2/15/2021 2/22/2021 3/1/2021 3/8/2021 3/15/2021 3/22/2021 3/29/2021 4/5/2021 4/12/2021 4/19/2021 4/26/2021 5/3/2021 5/10/2021 5/17/2021 5/27/2021 6/6/2021 6/16/2021 6/26/2021 7/6/2021 7/16/2021 1/13/2022 8/5/2021 8/15/2021 8/25/2021 9/4/2021 1/15/2022 9/24/2021
example7 3/17/2021 3/17/2021 3/24/2021 3/31/2021 4/7/2021 5/6/2021 4/21/2021 4/28/2021 5/5/2021 5/19/2021 6/2/2021 6/9/2021 6/16/2021 6/23/2021 6/30/2021 8/5/2021 9/10/2021 9/28/2021 10/16/2021 11/3/2021 12/23/2021 12/27/2021 1/14/2022 2/1/2022 2/19/2022 3/9/2022
;
run;
data want;
set have;
array _dt(30) date_s1 -date_s30;
ts_1=0;ts_2=0;ts_3=0;
do i =1 to dim(_dt);
if _dt(i) ne . then nmonths=intck('month', firstSessionDate, _dt(i), 'C');
if not missing(nmonths) then do;
if nmonths < 6 then ts_1+1;
else if nmonths<9 then ts_2+1;
else if nmonths < 12 then ts_3 +1;
end;
end;
run;
Since I work with health data I cannot provide actual data, but I prepared the following csv with some sample data to work with, including some of the conundrums that come up in this data set. You will notice that some sessions are missed throughout the data, the participants all have different start dates and appear to be in cohorts that ran at different times. Since make-up sessions are allowed, the dates are not universally in ascending order across the sessions, including for session 1, which is sometimes missed entirely and sometimes made-up on a date later than the first session they actually attended.
First step: make sure your "dates" are SAS date values. Not something like 8, as in "month 8" the way your discussion reads. Note a random number that you read as a date like 010223 but a numeric value with a SAS date format that is appropriate.
Then it is possible to discuss date ranges and things like crossing year boundaries when needed.
The fact that you include this line of code comparing a "date" to a character values means that you likely don't have such dates yet.
Not mention that would repeatedly compare one fixed value 30 times. The macro variable &date resolves when the step is compiled and doesn't change.
if &date_s1 ne "" then ts=ts+1;
If you have multiple related variables to do something with then an ARRAY would be the tool likely but without some actual explicit values of the Date_s1 and related variables I am not going to attempt to guess code.
There are some replies I made to the original post correcting that date_s1 and providing some sample data. The dates are in date format already.
@halkyos406 wrote:
There are some replies I made to the original post correcting that date_s1 and providing some sample data. The dates are in date format already.
No, they are not in 'date format'. CSV files are text. Until rules are provided we do not know what you have and I repeat my concern about your previous comparison with an =' ' and whether your values are actually dates.
Here is my guess at your data. Note the data step is the way to share data so we don't have to guess as to your actual variable types. I used the DATE9. format just because.
data have; infile datalines dlm=',' dsd; informat id $10. FirstSessionDate Date_S1-Date_S30 mmddyy10.; input ID FirstSessionDate Date_S1 Date_S2 Date_S3 Date_S4 Date_S5 Date_S6 Date_S7 Date_S8 Date_S9 Date_S10 Date_S11 Date_S12 Date_S13 Date_S14 Date_S15 Date_S16 Date_S17 Date_S18 Date_S19 Date_S20 Date_S21 Date_S22 Date_S23 Date_S24 Date_S25 Date_S26 Date_S27 Date_S28 Date_S29 Date_S30; format FirstSessionDate Date_S1-Date_S30 date9.; datalines; example1,4/15/2021,4/15/2021,4/22/2021,,5/6/2021,5/13/2021,5/20/2021,5/27/2021,6/3/2021,6/10/2021,6/17/2021,6/24/2021,7/1/2021,7/8/2021,7/15/2021,7/22/2021,7/29/2021,8/19/2021,9/9/2021,10/23/2021,10/21/2021,,12/2/2021,12/23/2021,1/13/2022,2/3/2022,2/24/2022,3/17/2022,4/7/2022,, example2,1/13/2021,1/21/2021,1/20/2021,1/13/2021,2/3/2021,2/10/2021,2/17/2021,,3/3/2021,3/10/2021,,3/24/2021,3/31/2021,4/7/2021,4/14/2021,,4/28/2021,5/12/2021,,6/9/2021,6/23/2021,7/7/2021,7/21/2021,8/4/2021,,9/1/2021,9/15/2021,,10/13/2021,10/27/2021,11/10/2021 example3,10/14/2020,10/14/2020,,10/28/2020,,11/11/2020,11/18/2020,11/25/2020,12/2/2020,12/9/2020,12/16/2020,12/23/2020,12/30/2020,,1/13/2021,2/12/2021,3/8/2021,2/24/2021,1/9/2022,4/21/2021,7/15/2021,6/16/2021,7/14/2021,1/13/2022,9/8/2021,10/6/2021,,12/1/2021,12/29/2021,1/26/2022,2/23/2022 example4,10/21/2020,10/21/2020,10/28/2020,11/4/2020,11/11/2020,11/18/2020,,12/2/2020,12/9/2020,12/16/2020,12/23/2020,12/30/2020,1/6/2021,1/13/2021,1/20/2021,1/27/2021,,2/18/2021,,,4/4/2021,4/19/2021,5/4/2021,5/19/2021,6/3/2021,6/18/2021,7/3/2021,7/18/2021,8/2/2021,, example5,10/5/2020,10/12/2021,10/5/2020,,10/26/2020,11/2/2020,11/9/2020,11/16/2020,,11/30/2020,12/7/2020,12/14/2020,12/21/2020,12/28/2020,,1/11/2021,,1/30/2021,2/11/2021,2/23/2021,3/7/2021,3/19/2021,,4/12/2021,4/24/2021,5/6/2021,5/18/2021,5/30/2021,6/11/2021,6/23/2021, example6,2/1/2021,,2/1/2021,2/15/2021,2/22/2021,3/1/2021,3/8/2021,3/15/2021,3/22/2021,3/29/2021,4/5/2021,4/12/2021,4/19/2021,4/26/2021,5/3/2021,5/10/2021,5/17/2021,5/27/2021,6/6/2021,6/16/2021,6/26/2021,7/6/2021,7/16/2021,1/13/2022,8/5/2021,8/15/2021,8/25/2021,9/4/2021,1/15/2022,9/24/2021, example7,3/17/2021,3/17/2021,3/24/2021,3/31/2021,4/7/2021,5/6/2021,4/21/2021,4/28/2021,5/5/2021,,5/19/2021,,6/2/2021,6/9/2021,6/16/2021,6/23/2021,6/30/2021,,8/5/2021,,9/10/2021,9/28/2021,10/16/2021,11/3/2021,12/23/2021,,12/27/2021,1/14/2022,2/1/2022,2/19/2022,3/9/2022 ;
Your example data has FirstSessionDate values that match the values of other Date_sXX variables. Are those other Date_sx values to be counted?
I also see in example2 that you have session 1 and 2 occurring after session 3. That seems kind of unlikely given the limited information provided.
Since your dates do not occur on the first of the month then you may also have to describe what "month" means.
If the first session is on 31MAR2021 and another session occurs on 01APR2021 is that supposed to be 1 month, the month boundary, or 0 as that is less than the same day of the month in the later date than the first?
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.