Hello,
I am working on a data set where I have variables for patient id, cycle number (similar to a visit), and pregnancy status. Pregnancy_status = 1 if there was a pregnancy and 0 if no pregnancy occurred. I need to create a variable to indicate the if the pregnancy is the first, second, etc. pregnancy by patient_id. The table below demonstrates what the pregnancy_number variable that I derive should look like.
Patient_id | Cycle | Pregnancy_status | Pregnancy_number |
1 | 1 | 0 | . |
1 | 2 | 1 | 1 |
1 | 3 | 1 | 2 |
2 | 1 | 1 | 1 |
2 | 2 | 0 | . |
3 | 1 | 0 | . |
4 | 1 | 1 | 1 |
5 | 1 | 1 | 1 |
5 | 2 | 1 | 2 |
5 | 3 | 0 | . |
5 | 4 | 1 | 3 |
I would prefer to avoid doing this in SQL, I was thinking that I could use a by statement in a data step, but I don't know how to write a conditional to capture what I need. Any help would be greatly appreciated. Thank you so much for your time
To do that, just separate the counter into a separate variable like this.
proc sort data=have;
by patient_id cycle;
run;
data wanted(drop = Pregnancy_counter);
set have;
by patient_id cycle;
retain Pregnancy_counter .;
if first.Patient_id then Pregnancy_counter = .;
if Pregnancy_status = 1 then do;
Pregnancy_counter = sum(Pregnancy_counter,1);
Pregnancy_number = Pregnancy_counter;
end;
run;
You didn't provide the data step with datalines to create this so I didn't test it. But this is close.
proc sort data=have;
by patient_id cycle;
run;
data wanted;
set have;
by patient_id cycle;
retain Pregnancy_number .;
if first.Patient_id then Pregnancy_number = .;
if Pregnancy_status = 1 then Pregnancy_number = sum(Pregnancy_number,1);
run;
I tested your code, and used the following data lines.
DATA HAVE;
INPUT PATIENT_ID CYCLE PREGNANCY_STATUS;
DATALINES;
1 1 0
1 2 1
1 3 1
2 1 1
2 2 0
3 1 0
4 1 1
5 1 1
5 2 1
5 3 0
5 4 1
;
RUN;
The output produced was close; however, if there was a pregnancy on the first visit, and no pregnancy on the second visit, the value for pregnancy_number would be 1, where it should be missing.
Obs | ID | CYCLE | PREG | Pregnancy_number |
1 | 1 | 1 | 0 | . |
2 | 1 | 2 | 1 | 1 |
3 | 1 | 3 | 1 | 2 |
4 | 2 | 1 | 1 | 1 |
5 | 2 | 2 | 0 | 1 |
6 | 3 | 1 | 0 | . |
7 | 4 | 1 | 1 | 1 |
8 | 5 | 1 | 1 | 1 |
9 | 5 | 2 | 1 | 2 |
10 | 5 | 3 | 0 | 2 |
11 | 5 | 4 | 1 | 3 |
In an attempt to remedy this I added this to your code:
data want;
set have;
by patient_id cycle;
retain Pregnancy_number .;
if first.patient_id then Pregnancy_number = .;
if Pregnancy_status = 1 then Pregnancy_number = sum(Pregnancy_number,1);
if Pregnancy_status = 0 then pregnancy_number = .;
run;
However, this does not work because the count restarts for patient number 5, when it should not for the last observation.
Obs | ID | CYCLE | PREG | Pregnancy_number |
1 | 1 | 1 | 0 | . |
2 | 1 | 2 | 1 | 1 |
3 | 1 | 3 | 1 | 2 |
4 | 2 | 1 | 1 | 1 |
5 | 2 | 2 | 0 | . |
6 | 3 | 1 | 0 | . |
7 | 4 | 1 | 1 | 1 |
8 | 5 | 1 | 1 | 1 |
9 | 5 | 2 | 1 | 2 |
10 | 5 | 3 | 0 | . |
11 | 5 | 4 | 1 | 1 |
Thank you so much for your response. I appreciate it tremendously.
To do that, just separate the counter into a separate variable like this.
proc sort data=have;
by patient_id cycle;
run;
data wanted(drop = Pregnancy_counter);
set have;
by patient_id cycle;
retain Pregnancy_counter .;
if first.Patient_id then Pregnancy_counter = .;
if Pregnancy_status = 1 then do;
Pregnancy_counter = sum(Pregnancy_counter,1);
Pregnancy_number = Pregnancy_counter;
end;
run;
Thank you so much!!!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.