Quartz | Level 8

## Output all observations within a certain group

Hello. I'm struggling with code to do the following:

data have:

visit_id       study_id      delivery_visit   pregnancynum_rank   gestational_diabetes

1                      1                     1                        1                                 1

2                      1                     0                        1                                 1

3                      1                     1                        2                                 1

4                      2                     0                        5                                 1

5                      2                     1                        5                                 1

6                      3                     0                        2                                 1

7                      3                     1                        2                                 1

8                      3                     1                        3                                 1

9                      4                     0                        3                                  1

10                    4                     1                        3                                 1

11                    4                     0                        6                                  1

12                    4                     1                        6                                 1

data want:

visit_id       study_id      delivery_visit   pregnancynum_rank   gestational_diabetes

1                      1                     1                        1                                 1

2                      1                     0                        1                                 1

3                      1                     1                        2                                 1

4                      2                     0                        5                                 1

5                      2                     1                        5                                 1

6                      3                     0                        2                                 1

7                      3                     1                        2                                 1

8                      3                     1                        3                                 1

9                      4                     0                        3                                  1

10                    4                     1                        3                                 1

11                    4                     0                        6                                  1

12                    4                     1                        6                                 1

For moms with gestational diabetes, I need to output the first pregnancy in the dataset for which they had gest. diabetes. For example, study_id had 2 pregnancies with gest. diabetes but I want to output all observations associated with her pregnancynum=1. Mother study_id 4 has 2 pregnanices with gestatational diabetes (pregnancynum 3 and 6), but I only want to keep all obervations associated with pregnancynum=3 (because its her first preg in the dataset with GDM).

I've been trying some PROC SQL codes and can't make it work. Suggestions? Thank you!

1 ACCEPTED SOLUTION

Accepted Solutions
Super User

## Re: Output all observations within a certain group

Slice it twice - first get the ID and Preganancy Rank you need and then either use a data step or SQL merge to keep it if you need the first 'group' and not just the first record.

If you've tried it, post your code that you have so far.

8 REPLIES 8
Ammonite | Level 13

## Re: Output all observations within a certain group

I'm a little confused by your example - surely study_id 2 only has 1 pregnancy?

Quartz | Level 8

## Re: Output all observations within a certain group

The dataset has moms with between 1-7 pregnancies. A woman with multiple pregnancies does not necessarily have gestatational diabetes during all her pregnancies.

So the dataset I have, contains the pregnancies (pregnancynum 1,2,3,4,5,6,and/or 7) for which the mom had gest. daibetes.

Mom study_id 2, for example, only has 1 pregnancy in the dataset when she had gest. diabetes. Her pregnancy number 1, she did not have the condition. So I need all her observations associated with the pregnancy #2.

Mom study_id 4 has had 6 babies in the dataset, but only had gest. diabetes for pregnancy 3 and 6. I don't want to keep the observations for more than 1 pregnancy with GDM. Therefore, I want her observations associated with her first gest. diabetes pregnancy (or number 3 only).

Does that make sense? I want to output all the data associated with the first gest. diabetes pregnancy in the dataset, and get rid of the 2nd, 3rd.....nth gest.diabetes prenancies.

Super User

## Re: Output all observations within a certain group

Try using a DATA step and using the FIRST instead of SQL.

A data step will be much easier than SQL>

Quartz | Level 8

## Re: Output all observations within a certain group

I tried that too.   But it only output the first observation, not the entire group of observations, for the pregnancy of interest.

Do you have sample code you could provide to show me what you mean ?

thank you

Super User

## Re: Output all observations within a certain group

Slice it twice - first get the ID and Preganancy Rank you need and then either use a data step or SQL merge to keep it if you need the first 'group' and not just the first record.

If you've tried it, post your code that you have so far.

Quartz | Level 8

## Re: Output all observations within a certain group

Right got it. Get first.study_id then join back to original dataset by study_id and pregnancynum. Thanks!

Barite | Level 11

## Re: Output all observations within a certain group

I followed Reeza's suggestion and did it this way:

``````data pregnancy_data;
input visit_id study_id delivery_visit pregnancynum_rank gestational_diabetes;
datalines;
1 1 1 1 1
2 1 0 1 1
3 1 1 2 1
4 2 0 5 1
5 2 1 5 1
6 3 0 2 1
7 3 1 2 1
8 3 1 3 1
9 4 0 3 1
10 4 1 3 1
11 4 0 6 1
12 4 1 6 1
;
run;

proc sort data = pregnancy_data;
by study_id pregnancynum_rank;
run;

*grab the first pregnancy with gestational diabetes for each mom in the dataset;
data first_with_gest_diab;
set pregnancy_data;
by study_id pregnancynum_rank;
if first.study_id and first.pregnancynum_rank and gestational_diabetes = 1;
run;

*go back and get all the records associated with those pregnancies;
data first_with_gest_diab_all;
proc sql;
create table first_with_gest_diab_all as
select pregnancy_data.*
from pregnancy_data
inner join first_with_gest_diab on pregnancy_data.study_id = first_with_gest_diab.study_id
and pregnancy_data.pregnancynum_rank = first_with_gest_diab.pregnancynum_rank
order by visit_id, study_id, pregnancynum_rank;
quit;

``````

That gave me

```The SAS System

visit_id	study_id	delivery_visit	pregnancynum_rank	gestational_diabetes
1	1	1	1	1
2	1	0	1	1
4	2	0	5	1
5	2	1	5	1
6	3	0	2	1
7	3	1	2	1
9	4	0	3	1
10	4	1	3	1
```

Which is the desired result i think.

Quartz | Level 8

## Re: Output all observations within a certain group

@HB, thanks! This line "if first.study_id and first.pregnancynum_rank and gestational_diabetes = 1;"  is what I was looking for. Didn't realize you could do 2 first. in one if statement. Thanks again!

Discussion stats
• 8 replies
• 880 views
• 3 likes
• 4 in conversation