Thank you @SASJedi !
I changed the time period including the months. So, I am having two time periods: (i) March to May 2018 and (ii) June to August 2018 and trying to find if the clients were newly enrolled from (i) to (ii), exited from (i) to (ii), and retained in (ii). I would be grateful if you could review the code.
I am a beginner with SAS, so apologies on silly questions.
proc sql;
create table prepost_q1 as
select id
, SUM(year=2018 and (month=3 or month=5))=0 and SUM(year=2018 and (month=6 or month=8))>0 as preq2_newenrol
, SUM(year=2018 and (month=3 or month=5))>0 and SUM(year=2018 and (month=6 or month=8))=0 as preq2_exited
, SUM(year=2018 and (month=3 or month=5))>0 and SUM(year=2019 and (month=6 or month=8))>0 as preq2_retained
from work.prepost
group by id
;
quit;
As the question evolved here, the name of your input data set keeps changing as do the input data set vales. It makes it hard to refined the answer from previous work. So I'm going to provide both sample data and commented code here to help you apply this to your actual data.
This code creates the input data set:
data prepost;
input id$ year month;
datalines;
1 2018 2
1 2018 3
1 2018 4
1 2019 2
1 2019 3
1 2019 4
2 2018 8
2 2018 9
2 2018 10
3 2018 3
3 2018 4
3 2018 5
3 2018 8
3 2018 9
3 2018 10
;
Which looks like this:
| id | year | month |
|---|---|---|
| 1 | 2018 | 2 |
| 1 | 2018 | 3 |
| 1 | 2018 | 4 |
| 1 | 2019 | 2 |
| 1 | 2019 | 3 |
| 1 | 2019 | 4 |
| 2 | 2018 | 8 |
| 2 | 2018 | 9 |
| 2 | 2018 | 10 |
| 3 | 2018 | 3 |
| 3 | 2018 | 4 |
| 3 | 2018 | 5 |
| 3 | 2018 | 8 |
| 3 | 2018 | 9 |
| 3 | 2018 | 10 |
This code does the summarization:
proc sql;
create table prepost_q1 as
select id
/* Each logical expression produces a 1 if true, 0 if false */
,sum( month in (3,5))=0 /* Not enrolled first */
AND
sum( month in (6,8))>0 /* Enrolled second */ as preq2_newenrol
,sum( month in (3,5))>0 /* Enrolled first */
AND
sum( month in (6,8))=0 /* Not enrolled second */ as preq2_exited
,sum( month in (3,5))>0 /* Enrolled first */
AND
sum( month in (6,8))>0 /* Enrolled second */ as preq2_retained
from work.prepost
where year=2018
group by id
;
quit;
Producing this result:
| id | preq2_newenrol | preq2_exited | preq2_retained |
|---|---|---|---|
| 1 | 0 | 1 | 0 |
| 2 | 1 | 0 | 0 |
| 3 | 0 | 0 | 1 |
Thanks @SASJedi ! The code worked for my data. Just to confirm, although we have multiple observations for each individual, the report is producing the values (1, 0) for unique individuals, right?
In the result table in my example, there is only 1 row produced for each unique ID value. In the SQL query SELECT statement, the only column note defined by a summary function is ID, and ID is the GROUP BY value. So no, this query should never produce more than one row per ID.
Hi @SASJedi,
I wanted to check which months the logic was picking up, by including both year and month variables in the code as:
proc sql;
create table prepost_q1 as
select id, year, month
/* Each logical expression produces a 1 if true, 0 if false */
,sum( month in (3,5))=0 /* Not enrolled first */
AND
sum( month in (6,8))>0 /* Enrolled second */ as preq2_newenrol
,sum( month in (3,5))>0 /* Enrolled first */
AND
sum( month in (6,8))=0 /* Not enrolled second */ as preq2_exited
,sum( month in (3,5))>0 /* Enrolled first */
AND
sum( month in (6,8))>0 /* Enrolled second */ as preq2_retained
from work.prepost
where year=2018
group by id
;
quit;What I saw was the logic was assigning values to unwanted observations as well. For an example, month=2 was assigned 1 for preq2_exited for id=1. Similarly, months 9 and 10 were assigned the value of 1 for id = 2 and 3. We were looking for data between the time periods, May to June 2018 and July to August 2018 and do not want to use other months as they will be present in different time periods like September to Nov 2018 and so on. How could I solve this issue?
The log file was giving this message:
"NOTE: The query requires remerging summary statistics back with the original data.
NOTE: Table WORK.PREPOST_Q1TT created, with 12 rows and 6 columns.
"
The table looks as:
Is there anything that I missed or incorrectly wrote the code?
Many thanks!
Any column in the SELECT list that is not being computed with a summary function must also be included in the GROUP BY clause, or you will get remerged results that are probably not what you desire.
Got it, thanks @SASJedi ! One quick question. Do "sum( month in (3,5))>0" and "sum( month in (3,4,5))" mean the same thing?
>Do"sum( month in (3,4,5))>0" and "sum(month in (3,4,5))" mean the same thing?
No, they don't. The first will produce only a 1 (true) or a 0 (false) no matter how many rows were in the list, the second will provide a count of number of rows in the group that met the criteria: Consider this demo data:
data demo;
ID=1;
month=1; output;
month=2; output;
do ID=2 to 3;
do Month=1 to 6 by ID;
output;
end;
end;
run;
| Obs | ID | month |
|---|---|---|
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 3 | 2 | 1 |
| 4 | 2 | 3 |
| 5 | 2 | 5 |
| 6 | 3 | 1 |
| 7 | 3 | 4 |
This SQL query demonstrates the difference:
| ID | sum(month in (3,4,5)) |
sum(month in (3,4,5))>0 |
|---|---|---|
| 1 | 0 | 0 |
| 2 | 2 | 1 |
| 3 | 1 | 1 |
This table kind of shows what's going on under the covers:
| ID | month | In345 | sum(month in (3,4,5)) | sum(month in (3,4,5))>0 | Comment |
| 1 | 1 | 0 | |||
| 1 | 2 | 0 | 0 | 0 | No rows met the criteria for ID1 |
| 2 | 1 | 0 | |||
| 2 | 3 | 1 | |||
| 2 | 5 | 1 | 2 | 1 | 2 rows met the criteria for ID2 |
| 3 | 1 | 0 | |||
| 3 | 4 | 1 | 1 | 1 | 1 row met the criteria for ID3 |
That was crystal clear! Thanks again, @SASJedi ! I encountered another challenge. I have two time periods: (i) Sep, Oct, Nov 2018 and (ii) Dec 2018, Jan, Feb 2019. I would also like to see the numbers of unique clients newly enrolled, exited, and retained from time period (i) to (ii). I used the following sample data:
data prepost;
input id$ year month;
datalines;
1 2018 10
1 2018 11
1 2018 12
1 2019 1
1 2019 2
2 2019 10
2 2019 11
2 2019 12
3 2018 3
3 2018 4
3 2018 5
3 2019 1
3 2019 12
3 2020 1
3 2020 2
4 2018 9
4 2018 10
4 2018 11
4 2018 12
4 2019 1
4 2019 2
5 2018 12
5 2019 1
5 2019 2
5 2019 3
5 2019 4
5 2019 5
5 2019 6
5 2019 7
5 2019 8
5 2019 9
5 2019 10
5 2019 11
5 2019 12
5 2020 1
5 2020 2
6 2019 1
6 2019 2
6 2019 3
6 2019 4
6 2019 5
6 2019 6
6 2019 7
6 2019 8
6 2019 9
6 2019 10
6 2019 11
6 2019 12
6 2020 1
6 2020 2
6 2020 3
6 2020 4
6 2020 5
6 2020 6
6 2020 7
6 2020 8
6 2020 9
6 2020 10
6 2020 11
6 2020 12
6 2021 1
6 2021 2
6 2021 3
6 2021 4
6 2021 5
6 2021 6
6 2021 7
6 2021 8
6 2021 19
6 2021 10
6 2021 11
6 2021 12
6 2022 1
6 2022 2
;I modified the code you wrote to get the numbers but I could not get the desired result. The code is:
proc sql;
create table prepost_q6 as
select id
/* Each logical expression produces a 1 if true, 0 if false */
,sum( month in (9,10,11))=0 AND year=2018 /* Not enrolled first */
AND
(sum( month in (12))>0 AND year=2018
OR
sum ( month in (1,2))>0 AND year=2019) /* Enrolled second */ as preq6_newenrol
,sum( month in (9,10,11))>0 AND year=2018 /* Enrolled first */
AND
(sum( month in (12))=0 AND year=2018
OR
sum ( month in (1,2))=0 AND year=2019) /* Not enrolled second */ as preq6_exited
,sum( month in (9,19,11))>0 /* Enrolled first */
AND
(sum( month in (12))>0 AND year=2018
OR
sum ( month in (1,2))>0 AND year=2019) /* Enrolled second */ as preq6_retained
from work.prepost
group by id
;
quit; Could you please help me in getting the code correct to have the numbers of unique clients newly enrolled, exited, and retained from time period (i) to (ii)?
Thank you!
@SASJedi, I am waiting on your suggestions. I appreciate your help!
Normally, it is frowned upon to post the same question more than once. Can someone please merge this with @SAS_SB other thread on this matter?
I tried to merge with another thread, but was unable to do so. Is there a way we can merge threads? If you have any idea, please share. It would be helpful. Thanks.
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.