Hello,
I am working on a data set where I have variables for patient id, cycle number (similar to a visit), and pregnancy status. Pregnancy_status = 1 if there was a pregnancy and 0 if no pregnancy occurred. I need to create a variable to indicate the if the pregnancy is the first, second, etc. pregnancy by patient_id. The table below demonstrates what the pregnancy_number variable that I derive should look like.
Patient_id | Cycle | Pregnancy_status | Pregnancy_number |
1 | 1 | 0 | . |
1 | 2 | 1 | 1 |
1 | 3 | 1 | 2 |
2 | 1 | 1 | 1 |
2 | 2 | 0 | . |
3 | 1 | 0 | . |
4 | 1 | 1 | 1 |
5 | 1 | 1 | 1 |
5 | 2 | 1 | 2 |
5 | 3 | 0 | . |
5 | 4 | 1 | 3 |
I would prefer to avoid doing this in SQL, I was thinking that I could use a by statement in a data step, but I don't know how to write a conditional to capture what I need. Any help would be greatly appreciated. Thank you so much for your time
To do that, just separate the counter into a separate variable like this.
proc sort data=have;
by patient_id cycle;
run;
data wanted(drop = Pregnancy_counter);
set have;
by patient_id cycle;
retain Pregnancy_counter .;
if first.Patient_id then Pregnancy_counter = .;
if Pregnancy_status = 1 then do;
Pregnancy_counter = sum(Pregnancy_counter,1);
Pregnancy_number = Pregnancy_counter;
end;
run;
You didn't provide the data step with datalines to create this so I didn't test it. But this is close.
proc sort data=have;
by patient_id cycle;
run;
data wanted;
set have;
by patient_id cycle;
retain Pregnancy_number .;
if first.Patient_id then Pregnancy_number = .;
if Pregnancy_status = 1 then Pregnancy_number = sum(Pregnancy_number,1);
run;
I tested your code, and used the following data lines.
DATA HAVE;
INPUT PATIENT_ID CYCLE PREGNANCY_STATUS;
DATALINES;
1 1 0
1 2 1
1 3 1
2 1 1
2 2 0
3 1 0
4 1 1
5 1 1
5 2 1
5 3 0
5 4 1
;
RUN;
The output produced was close; however, if there was a pregnancy on the first visit, and no pregnancy on the second visit, the value for pregnancy_number would be 1, where it should be missing.
Obs | ID | CYCLE | PREG | Pregnancy_number |
1 | 1 | 1 | 0 | . |
2 | 1 | 2 | 1 | 1 |
3 | 1 | 3 | 1 | 2 |
4 | 2 | 1 | 1 | 1 |
5 | 2 | 2 | 0 | 1 |
6 | 3 | 1 | 0 | . |
7 | 4 | 1 | 1 | 1 |
8 | 5 | 1 | 1 | 1 |
9 | 5 | 2 | 1 | 2 |
10 | 5 | 3 | 0 | 2 |
11 | 5 | 4 | 1 | 3 |
In an attempt to remedy this I added this to your code:
data want;
set have;
by patient_id cycle;
retain Pregnancy_number .;
if first.patient_id then Pregnancy_number = .;
if Pregnancy_status = 1 then Pregnancy_number = sum(Pregnancy_number,1);
if Pregnancy_status = 0 then pregnancy_number = .;
run;
However, this does not work because the count restarts for patient number 5, when it should not for the last observation.
Obs | ID | CYCLE | PREG | Pregnancy_number |
1 | 1 | 1 | 0 | . |
2 | 1 | 2 | 1 | 1 |
3 | 1 | 3 | 1 | 2 |
4 | 2 | 1 | 1 | 1 |
5 | 2 | 2 | 0 | . |
6 | 3 | 1 | 0 | . |
7 | 4 | 1 | 1 | 1 |
8 | 5 | 1 | 1 | 1 |
9 | 5 | 2 | 1 | 2 |
10 | 5 | 3 | 0 | . |
11 | 5 | 4 | 1 | 1 |
Thank you so much for your response. I appreciate it tremendously.
To do that, just separate the counter into a separate variable like this.
proc sort data=have;
by patient_id cycle;
run;
data wanted(drop = Pregnancy_counter);
set have;
by patient_id cycle;
retain Pregnancy_counter .;
if first.Patient_id then Pregnancy_counter = .;
if Pregnancy_status = 1 then do;
Pregnancy_counter = sum(Pregnancy_counter,1);
Pregnancy_number = Pregnancy_counter;
end;
run;
Thank you so much!!!
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.