mariko5797
Pyrite | Level 9

I want to count the number of events that occur for each subject, in each dose, for each symptom. An event is defined as a symptom severity > 0 (0 = none). A symptom that occurs on consecutive days is counted as one event.

For example, subject A experiences a headache Day 1, Day 2, and Day 4. In this case, there are two events: Day 1 and Day 2 are one event (symptom is continuous) and Day 4 is one event. Subject A also experiences vomiting on Days 2, 3, and 4. This is one event since the symptom was continuous (i.e., there was no break in between the symptom being experienced). 

data have;
 length id $1. dose $2. dtc1 dtc2 dtc3 dtc4 dtc5 $9.;
 input 	id $ dose $ 
		dtc1 $ dtc2 $ dtc3 $ dtc4 $ dtc5 $ 
		hea1 hea2 hea3 hea4 hea5 vom1 vom2 vom3 vom4 vom5 @@;
 cards;
A	01
01JAN2022	02JAN2022	03JAN2022	04JAN2022	05JAN2022
1	2	0	1	0	0	1	2	3	0
A	02
10JAN2022	11JAN2022	12JAN2022	13JAN2022	14JAN2022
0	0	1	3	1	0	0	2	0	0
B	01
01JAN2022	02JAN2022	03JAN2022	04JAN2022	05JAN2022
0	0	0	0	0	0	0	1	2	1
B	02
10JAN2022	11JAN2022	12JAN2022	13JAN2022	14JAN2022
0	0	0	0	0	0	2	0	0	1
;
run;
/*id=Subject identifier*/
/*dose=Dose number*/
/*dtc=Date (character format)*/
/*hea=Headache symptom*/
/*vom=Vomiting symptom*/
/*Note: DTC1 is the date for HEA1 and VOM1; DTC2 is the date for HEA2 and VOM2; etc.*/

The final table should be something like this:

data want;
 input symp $ n p events @@;
 cards;
hea	1	50	3
vom	2	100	5
;
run;
/*symp=Symptom experienced*/
/*n=Number of subjects experiencing event*/
/*p=Percentage of subjects experiencing event*/
/*events=Total number of events experienced*/

My thought process was to transpose and create a binary flag (1 = symptom, 0 = no symptom). Then I would (a) remove the no-symptom records and (b) check when the next date ^= previous date + 1. However, I am unsure how to make a counter for this. I have about 15 symptoms and 7 days after each dose that I need to go through.
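One possible way to build such a counter without transposing is an array loop over the wide data that starts a new event whenever a symptomatic day follows a symptom-free day. This is only a sketch: it assumes the days within each row are consecutive, it reads the sample data with one record per line, and the names `events`, `sev`, `symname`, `prev`, and `cur` are made up for illustration.

```sas
/* Sample data from the post, one record per line, dates kept as character */
data have;
 input id :$1. dose :$2. (dtc1-dtc5) (:$9.) hea1-hea5 vom1-vom5;
 datalines;
A 01 01JAN2022 02JAN2022 03JAN2022 04JAN2022 05JAN2022 1 2 0 1 0 0 1 2 3 0
A 02 10JAN2022 11JAN2022 12JAN2022 13JAN2022 14JAN2022 0 0 1 3 1 0 0 2 0 0
B 01 01JAN2022 02JAN2022 03JAN2022 04JAN2022 05JAN2022 0 0 0 0 0 0 0 1 2 1
B 02 10JAN2022 11JAN2022 12JAN2022 13JAN2022 14JAN2022 0 0 0 0 0 0 2 0 0 1
;
run;

/* One output row per subject, dose, and symptom */
data events (keep=id dose symp events);
 set have;
 /* 2 symptoms x 5 days; each row of the array holds one symptom */
 array sev[2,5] hea1-hea5 vom1-vom5;
 array symname[2] $3 _temporary_ ('hea' 'vom');
 length symp $3;
 do s = 1 to 2;
   symp = symname[s];
   events = 0;
   prev = 0;                    /* no symptom before day 1 */
   do d = 1 to 5;
     cur = (sev[s,d] > 0);      /* 1 = symptom present; missing counts as 0 */
     /* a new event starts when a symptomatic day follows a symptom-free day */
     if cur and not prev then events = events + 1;
     prev = cur;
   end;
   output;
 end;
run;
```

For subject A, dose 01, this should give 2 headache events and 1 vomiting event. With 15 symptoms and 7 days, the array dimensions, the variable lists, and the name list would grow accordingly.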

 

Edit: Made Subject B dates consecutive for dose 01.

7 REPLIES
PeterClemmensen
Tourmaline | Level 20

Just to start things off, I assume your sample data should be something like this:

 

data have;
 length id $1. dose $2. dtc1 dtc2 dtc3 dtc4 dtc5 $9.;
 input  id $ dose $ 
  dtc1 $ dtc2 $ dtc3 $ dtc4 $ dtc5 $ 
  hea1 hea2 hea3 hea4 hea5 vom1 vom2 vom3 vom4 vom5 @@;
 cards;
A 01 01JAN2022 02JAN2022 03JAN2022 04JAN2022 05JAN2022 1 2 0 1 0 0 1 2 3 0
A 02 10JAN2022 11JAN2022 12JAN2022 13JAN2022 14JAN2022 0 0 1 3 1 0 0 2 0 0
B 01 01JAN2022 02JAN2022 03JAN2022 06JAN2022 07JAN2022 0 0 0 0 0 0 0 1 2 1
B 02 10JAN2022 11JAN2022 12JAN2022 13JAN2022 14JAN2022 0 0 0 0 0 0 2 0 0 1
;
run;
mariko5797
Pyrite | Level 9
Yes. That's correct. The dates are already in date format in my dataset; I just wasn't sure how to do that with INPUT.
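For anyone reproducing the example with numeric dates, one possible way to read them in the INPUT statement is the DATE9. informat with the colon modifier. A minimal sketch, assuming space-delimited records with one record per line (only the two subject A rows are shown):

```sas
data have;
 /* the colon modifier reads each space-delimited field with the given informat */
 input id :$1. dose :$2. (dtc1-dtc5) (:date9.) hea1-hea5 vom1-vom5;
 /* display the numeric dates as 01JAN2022 rather than as day counts */
 format dtc1-dtc5 date9.;
 datalines;
A 01 01JAN2022 02JAN2022 03JAN2022 04JAN2022 05JAN2022 1 2 0 1 0 0 1 2 3 0
A 02 10JAN2022 11JAN2022 12JAN2022 13JAN2022 14JAN2022 0 0 1 3 1 0 0 2 0 0
;
run;
```

With numeric dates, a check such as `dtc2 - dtc1 = 1` can be used to test whether two days are consecutive.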
PaigeMiller
Diamond | Level 26

How do we handle the third record, where the DTC values are not consecutive? Or do the DTC values even matter in this calculation? How do we handle the fact that ID = A appears on two lines?

--
Paige Miller
mariko5797
Pyrite | Level 9

The dates themselves do not matter. It only matters if the dates occur one after the other. 

Edit: I just checked the dataset. All the dates are consecutive within a row, so you can ignore the scenario with Subject B where it was not.
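If the dates could contain gaps, a date-aware version of the rule might look like the sketch below, where a calendar gap ends any run in progress. It assumes numeric SAS dates and uses a single symptom and the non-consecutive subject B row from earlier in the thread; the names `have_dates`, `vom_events`, `prev`, and `cur` are hypothetical.

```sas
/* One row with a date gap between 03JAN2022 and 06JAN2022 */
data have_dates;
 input id :$1. dose :$2. (dtc1-dtc5) (:date9.) vom1-vom5;
 format dtc1-dtc5 date9.;
 datalines;
B 01 01JAN2022 02JAN2022 03JAN2022 06JAN2022 07JAN2022 0 0 1 2 1
;
run;

data vom_events (keep=id dose events);
 set have_dates;
 array dt[5] dtc1-dtc5;
 array sev[5] vom1-vom5;
 events = 0;
 prev = 0;                      /* no symptom before the first day */
 do d = 1 to 5;
   cur = (sev[d] > 0);          /* 1 = symptom present on this day */
   /* a calendar gap before this day breaks any run in progress */
   if d > 1 then do;
     if dt[d] - dt[d-1] ne 1 then prev = 0;
   end;
   /* a new event starts when the previous consecutive day was symptom-free */
   if cur and not prev then events = events + 1;
   prev = cur;
 end;
run;
```

For this row the symptom is present on 03JAN2022, 06JAN2022, and 07JAN2022, so the gap should split it into 2 events rather than 1.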

PaigeMiller
Diamond | Level 26

@mariko5797 wrote:

The dates themselves do not matter. It only matters if the dates occur one after the other. 


I'm afraid I don't understand how this answers my question, which was: "How do we handle the third record, where the DTC values are not consecutive?"

 

I asked two other questions, which you did not answer at all.

--
Paige Miller
mariko5797
Pyrite | Level 9

My apologies. I just verified that the dates are consecutive within a row for a given subject and dose, i.e., symptoms were checked each day for X days post-dose (5 in my sample code).

The updated code is posted below:

 

data have;
 length id $1. dose $2. dtc1 dtc2 dtc3 dtc4 dtc5 $9.;
 input 	id $ dose $ 
		dtc1 $ dtc2 $ dtc3 $ dtc4 $ dtc5 $ 
		hea1 hea2 hea3 hea4 hea5 vom1 vom2 vom3 vom4 vom5 @@;
 cards;
A	01
01JAN2022	02JAN2022	03JAN2022	04JAN2022	05JAN2022
1	2	0	1	0	0	1	2	3	0
A	02
10JAN2022	11JAN2022	12JAN2022	13JAN2022	14JAN2022
0	0	1	3	1	0	0	2	0	0
B	01
01JAN2022	02JAN2022	03JAN2022	04JAN2022	05JAN2022
0	0	0	0	0	0	0	1	2	1
B	02
10JAN2022	11JAN2022	12JAN2022	13JAN2022	14JAN2022
0	0	0	0	0	0	2	0	0	1
;
run;
/*id=Subject identifier*/
/*dose=Dose number*/
/*dtc=Date (character format)*/
/*hea=Headache symptom*/
/*vom=Vomiting symptom*/
/*Note: DTC1 is the date for HEA1 and VOM1; DTC2 is the date for HEA2 and VOM2; etc.*/

*Final Desired Table - Any Doses;
data want;
 input symp $ n p events @@;
 cards;
hea	1	50	3
vom	2	100	5
;
run;
/*symp=Symptom experienced*/
/*n=Number of subjects experiencing event*/
/*p=Percentage of subjects experiencing event*/
/*events=Total number of events experienced*/
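Given one event count per subject, dose, and symptom, a table of this shape could be produced with PROC SQL. This is a sketch under assumptions: the dataset `event_counts`, its column `n_events`, and the macro variable `nsubj` are hypothetical names, and the counts are worked out by hand from the sample data above.

```sas
/* Hypothetical per-subject, per-dose event counts derived by hand from HAVE */
data event_counts;
 input id :$1. dose :$2. symp :$3. n_events;
 datalines;
A 01 hea 2
A 01 vom 1
A 02 hea 1
A 02 vom 1
B 01 hea 0
B 01 vom 1
B 02 hea 0
B 02 vom 2
;
run;

proc sql noprint;
 /* total number of subjects, used as the percentage denominator */
 select count(distinct id) into :nsubj trimmed from event_counts;

 create table want as
 select symp,
        /* subjects with at least one event; blank ids are ignored by COUNT */
        count(distinct case when n_events > 0 then id else ' ' end) as n,
        100 * calculated n / &nsubj as p,
        sum(n_events) as events
 from event_counts
 group by symp;
quit;
```

On these counts the result should match the desired table: hea with n=1, p=50, events=3 and vom with n=2, p=100, events=5. Adding `dose` to the SELECT and GROUP BY clauses would give the same summary per dose.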

 

You asked: "How do we handle the third record, where the DTC values are not consecutive? Or do the DTC values even matter in this calculation? How do we handle the fact that ID = A appears on two lines?"

1. The third record, which had non-consecutive DTC values, has been removed.

2. DTC values only matter to tell us whether the days are consecutive or not. If a subject were to experience a symptom for Day 1 thru 4, then that is one event. However, if the subject were to experience a symptom for Day 1 thru 2, nothing on Day 3, and again on Day 4, then that would be counted as two events. Does that make sense?

3. Subject A appears twice because each row is a different dose being measured. There are only two doses, so you can ignore that variable if it's easier; I can subset by dose and run each subset through a macro of sorts.

PaigeMiller
Diamond | Level 26

@mariko5797 wrote:

My apologies. I just verified that the dates are consecutive within a row for a given subject and dose, i.e., symptoms were checked each day for X days post-dose (5 in my pseudo code).

1. Third record of non-consecutive DTC removed.

2. DTC values only matter to tell us whether the days are consecutive or not. If a subject were to experience a symptom for Day 1 thru 4, then that is one event. However, if the subject were to experience a symptom for Day 1 thru 2, nothing on Day 3, and again on Day 4, then that would be counted as two events. Does that make sense?

3. Subject A appears twice because they have different doses we are measuring. There are only two doses, so you can ignore that variable if it's easier. I can subset and run through a macro of sorts.


So #1 implies that only consecutive dates will appear in the data from now on. However, #2 still seems to imply that the days in DTC may not be consecutive, so I am still unsure about all of this. #3 is easy to program, since I can ignore it, but then why include a subject twice in your data and tell me to ignore the second instance? I think there's something here I don't understand, too.

--
Paige Miller
