Hello. I'm struggling with code to do the following:
data have:
visit_id study_id delivery_visit pregnancynum_rank gestational_diabetes
1 1 1 1 1
2 1 0 1 1
3 1 1 2 1
4 2 0 5 1
5 2 1 5 1
6 3 0 2 1
7 3 1 2 1
8 3 1 3 1
9 4 0 3 1
10 4 1 3 1
11 4 0 6 1
12 4 1 6 1
data want:
visit_id study_id delivery_visit pregnancynum_rank gestational_diabetes
1 1 1 1 1
2 1 0 1 1
3 1 1 2 1
4 2 0 5 1
5 2 1 5 1
6 3 0 2 1
7 3 1 2 1
8 3 1 3 1
9 4 0 3 1
10 4 1 3 1
11 4 0 6 1
12 4 1 6 1
For moms with gestational diabetes, I need to output the first pregnancy in the dataset for which they had gest. diabetes. For example, study_id had 2 pregnancies with gest. diabetes but I want to output all observations associated with her pregnancynum=1. Mother study_id 4 has 2 pregnanices with gestatational diabetes (pregnancynum 3 and 6), but I only want to keep all obervations associated with pregnancynum=3 (because its her first preg in the dataset with GDM).
I've been trying some PROC SQL codes and can't make it work. Suggestions? Thank you!
Slice it twice - first get the ID and Preganancy Rank you need and then either use a data step or SQL merge to keep it if you need the first 'group' and not just the first record.
If you've tried it, post your code that you have so far.
I'm a little confused by your example - surely study_id 2 only has 1 pregnancy?
The dataset has moms with between 1-7 pregnancies. A woman with multiple pregnancies does not necessarily have gestatational diabetes during all her pregnancies.
So the dataset I have, contains the pregnancies (pregnancynum 1,2,3,4,5,6,and/or 7) for which the mom had gest. daibetes.
Mom study_id 2, for example, only has 1 pregnancy in the dataset when she had gest. diabetes. Her pregnancy number 1, she did not have the condition. So I need all her observations associated with the pregnancy #2.
Mom study_id 4 has had 6 babies in the dataset, but only had gest. diabetes for pregnancy 3 and 6. I don't want to keep the observations for more than 1 pregnancy with GDM. Therefore, I want her observations associated with her first gest. diabetes pregnancy (or number 3 only).
Does that make sense? I want to output all the data associated with the first gest. diabetes pregnancy in the dataset, and get rid of the 2nd, 3rd.....nth gest.diabetes prenancies.
Try using a DATA step and using the FIRST instead of SQL.
A data step will be much easier than SQL>
I tried that too. But it only output the first observation, not the entire group of observations, for the pregnancy of interest.
Do you have sample code you could provide to show me what you mean ?
thank you
Slice it twice - first get the ID and Preganancy Rank you need and then either use a data step or SQL merge to keep it if you need the first 'group' and not just the first record.
If you've tried it, post your code that you have so far.
Right got it. Get first.study_id then join back to original dataset by study_id and pregnancynum. Thanks!
I followed Reeza's suggestion and did it this way:
data pregnancy_data;
input visit_id study_id delivery_visit pregnancynum_rank gestational_diabetes;
datalines;
1 1 1 1 1
2 1 0 1 1
3 1 1 2 1
4 2 0 5 1
5 2 1 5 1
6 3 0 2 1
7 3 1 2 1
8 3 1 3 1
9 4 0 3 1
10 4 1 3 1
11 4 0 6 1
12 4 1 6 1
;
run;
proc sort data = pregnancy_data;
by study_id pregnancynum_rank;
run;
*grab the first pregnancy with gestational diabetes for each mom in the dataset;
data first_with_gest_diab;
set pregnancy_data;
by study_id pregnancynum_rank;
if first.study_id and first.pregnancynum_rank and gestational_diabetes = 1;
run;
*go back and get all the records associated with those pregnancies;
data first_with_gest_diab_all;
proc sql;
create table first_with_gest_diab_all as
select pregnancy_data.*
from pregnancy_data
inner join first_with_gest_diab on pregnancy_data.study_id = first_with_gest_diab.study_id
and pregnancy_data.pregnancynum_rank = first_with_gest_diab.pregnancynum_rank
order by visit_id, study_id, pregnancynum_rank;
quit;
That gave me
The SAS System visit_id study_id delivery_visit pregnancynum_rank gestational_diabetes 1 1 1 1 1 2 1 0 1 1 4 2 0 5 1 5 2 1 5 1 6 3 0 2 1 7 3 1 2 1 9 4 0 3 1 10 4 1 3 1
Which is the desired result i think.
@HB, thanks! This line "if first.study_id and first.pregnancynum_rank and gestational_diabetes = 1;" is what I was looking for. Didn't realize you could do 2 first. in one if statement. Thanks again!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.