BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
eabc0351
Quartz | Level 8

Hello. I'm struggling with code to do the following:

 

data have:

visit_id       study_id      delivery_visit   pregnancynum_rank   gestational_diabetes

1                      1                     1                        1                                 1

2                      1                     0                        1                                 1

3                      1                     1                        2                                 1

4                      2                     0                        5                                 1

5                      2                     1                        5                                 1

6                      3                     0                        2                                 1

7                      3                     1                        2                                 1

8                      3                     1                        3                                 1

9                      4                     0                        3                                  1

10                    4                     1                        3                                 1

11                    4                     0                        6                                  1

12                    4                     1                        6                                 1

 

 

data want:

visit_id       study_id      delivery_visit   pregnancynum_rank   gestational_diabetes

1                      1                     1                        1                                 1

2                      1                     0                        1                                 1

3                      1                     1                        2                                 1

4                      2                     0                        5                                 1

5                      2                     1                        5                                 1

6                      3                     0                        2                                 1

7                      3                     1                        2                                 1

8                      3                     1                        3                                 1

9                      4                     0                        3                                  1

10                    4                     1                        3                                 1

11                    4                     0                        6                                  1

12                    4                     1                        6                                 1

 

For moms with gestational diabetes, I need to output the first pregnancy in the dataset for which they had gest. diabetes. For example, study_id had 2 pregnancies with gest. diabetes but I want to output all observations associated with her pregnancynum=1. Mother study_id 4 has 2 pregnanices with gestatational diabetes (pregnancynum 3 and 6), but I only want to keep all obervations associated with pregnancynum=3 (because its her first preg in the dataset with GDM).

 

I've been trying some PROC SQL codes and can't make it work. Suggestions? Thank you!

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

Slice it twice - first get the ID and Preganancy Rank you need and then either use a data step or SQL merge to keep it if you need the first 'group' and not just the first record.

 

If you've tried it, post your code that you have so far.

 

View solution in original post

8 REPLIES 8
ChrisBrooks
Ammonite | Level 13

I'm a little confused by your example - surely study_id 2 only has 1 pregnancy?

eabc0351
Quartz | Level 8

The dataset has moms with between 1-7 pregnancies. A woman with multiple pregnancies does not necessarily have gestatational diabetes during all her pregnancies.


So the dataset I have, contains the pregnancies (pregnancynum 1,2,3,4,5,6,and/or 7) for which the mom had gest. daibetes.

Mom study_id 2, for example, only has 1 pregnancy in the dataset when she had gest. diabetes. Her pregnancy number 1, she did not have the condition. So I need all her observations associated with the pregnancy #2.

Mom study_id 4 has had 6 babies in the dataset, but only had gest. diabetes for pregnancy 3 and 6. I don't want to keep the observations for more than 1 pregnancy with GDM. Therefore, I want her observations associated with her first gest. diabetes pregnancy (or number 3 only).

Does that make sense? I want to output all the data associated with the first gest. diabetes pregnancy in the dataset, and get rid of the 2nd, 3rd.....nth gest.diabetes prenancies.

 

Reeza
Super User

Try using a DATA step and using the FIRST instead of SQL.

 

A data step will be much easier than SQL>

eabc0351
Quartz | Level 8

I tried that too.   But it only output the first observation, not the entire group of observations, for the pregnancy of interest. 

 Do you have sample code you could provide to show me what you mean ?

thank you

Reeza
Super User

Slice it twice - first get the ID and Preganancy Rank you need and then either use a data step or SQL merge to keep it if you need the first 'group' and not just the first record.

 

If you've tried it, post your code that you have so far.

 

eabc0351
Quartz | Level 8

Right got it. Get first.study_id then join back to original dataset by study_id and pregnancynum. Thanks!

HB
Barite | Level 11 HB
Barite | Level 11

I followed Reeza's suggestion and did it this way:

 

data pregnancy_data;
   input visit_id study_id delivery_visit pregnancynum_rank gestational_diabetes;
datalines;
1 1 1 1 1
2 1 0 1 1
3 1 1 2 1
4 2 0 5 1
5 2 1 5 1
6 3 0 2 1
7 3 1 2 1
8 3 1 3 1
9 4 0 3 1
10 4 1 3 1
11 4 0 6 1
12 4 1 6 1
;
run;

proc sort data = pregnancy_data;
	by study_id pregnancynum_rank;
run;

*grab the first pregnancy with gestational diabetes for each mom in the dataset;
data first_with_gest_diab;
	set pregnancy_data;
	by study_id pregnancynum_rank;
	if first.study_id and first.pregnancynum_rank and gestational_diabetes = 1;
run;

*go back and get all the records associated with those pregnancies;
data first_with_gest_diab_all;
proc sql;
	create table first_with_gest_diab_all as
	select pregnancy_data.* 
	from pregnancy_data 
	inner join first_with_gest_diab on pregnancy_data.study_id = first_with_gest_diab.study_id
	and pregnancy_data.pregnancynum_rank = first_with_gest_diab.pregnancynum_rank 
	order by visit_id, study_id, pregnancynum_rank;
quit;

That gave me

The SAS System				
				
visit_id	study_id	delivery_visit	pregnancynum_rank	gestational_diabetes
1	1	1	1	1
2	1	0	1	1
4	2	0	5	1
5	2	1	5	1
6	3	0	2	1
7	3	1	2	1
9	4	0	3	1
10	4	1	3	1

Which is the desired result i think.

eabc0351
Quartz | Level 8

@HB, thanks! This line "if first.study_id and first.pregnancynum_rank and gestational_diabetes = 1;"  is what I was looking for. Didn't realize you could do 2 first. in one if statement. Thanks again!

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 2365 views
  • 3 likes
  • 4 in conversation