Output dataset by multiple variables or groups

Accepted Solution Solved
Reply
Contributor
Posts: 41
Accepted Solution

Output dataset by multiple variables or groups

 

Hello. I need to output two datasets. Here is what I need to do for each:

 

For dataset 1, if a mother has gestational diabetes in ANY pregnancynum_rank (in pregnancy 1,2...n), then I want to output a dataset of ALL her pregnancy visits (whether or not that pregnancy had gestational diabetes).

 

Dataset 1 -- data have:

visit_id       study_id      delivery_visit   pregnancynum_rank   gestational_diabetes

1                      1                     1                        1                                 0

2                      1                     0                        2                                 1

3                      1                     1                        2                                 1

4                      2                     0                        1                                 0

5                      2                     1                        1                                 0

6                      3                     0                        1                                 0

7                      3                     1                        1                                 1

8                      3                     1                        2                                 1

9                      4                     0                        1                                 0

10                    4                     1                        1                                 0

11                    4                     0                        2                                 1

12                    4                     1                        2                                 1

 

 

data want:

visit_id       study_id      delivery_visit   pregnancynum_rank   gestational_diabetes

1                      1                     1                        1                                 0

2                      1                     0                        2                                 1

3                      1                     1                        2                                 1

4                      2                     0                        1                                 0

5                      2                     1                        1                                 0

6                      3                     0                        1                                 0

7                      3                     1                        1                                 1

8                      3                     1                        2                                 1

9                      4                     0                        1                                 0

10                    4                     1                        1                                 0

11                    4                     0                        2                                 1

12                    4                     1                        2                                 1

 

For the next step, dataset 2, I want to choose the first pregnancy where the mom had gestational diabetes and output all her visits for that pregnancy only.

 

Dataset 2 -- data want:

visit_id       study_id      delivery_visit   pregnancynum_rank   gestational_diabetes

1                      1                     1                        1                                 0

2                      1                     0                        2                                 1

3                      1                     1                        2                                 1

4                      2                     0                        1                                 0

5                      2                     1                        1                                 0

6                      3                     0                        1                                 0

7                      3                     1                        1                                 1

8                      3                     1                        2                                 1

9                      4                     0                        1                                 0

10                    4                     1                        1                                 0

11                    4                     0                        2                                 1

12                    4                     1                        2                                 1

 

Much appreciated if you provide the code.  Thanks for your help!


Accepted Solutions
Solution
‎09-17-2017 04:20 PM
PROC Star
Posts: 826

Re: Output dataset by multiple variables or groups

second one is something you should have tried lol, but anyway:

 

data have;

input visit_id       study_id      delivery_visit   pregnancynum_rank   gestational_diabetes;

datalines;

1                      1                     1                        1                                 0

2                      1                     0                        2                                 1

3                      1                     1                        2                                 1

4                      2                     0                        1                                 0

5                      2                     1                        1                                 0

6                      3                     0                        1                                 0

7                      3                     1                        1                                 1

8                      3                     1                        2                                 1

9                      4                     0                        1                                 0

10                    4                     1                        1                                 0

11                    4                     0                        2                                 1

12                    4                     1                        2                                 1

;

 

proc sql;

create table want1 as

select *

from have

group by study_id

having max(gestational_diabetes)=1

order by visit_id;

quit;

 

proc sql;

create table want2 as

select *

from have

where gestational_diabetes=1;

quit;

 

View solution in original post


All Replies
Solution
‎09-17-2017 04:20 PM
PROC Star
Posts: 826

Re: Output dataset by multiple variables or groups

second one is something you should have tried lol, but anyway:

 

data have;

input visit_id       study_id      delivery_visit   pregnancynum_rank   gestational_diabetes;

datalines;

1                      1                     1                        1                                 0

2                      1                     0                        2                                 1

3                      1                     1                        2                                 1

4                      2                     0                        1                                 0

5                      2                     1                        1                                 0

6                      3                     0                        1                                 0

7                      3                     1                        1                                 1

8                      3                     1                        2                                 1

9                      4                     0                        1                                 0

10                    4                     1                        1                                 0

11                    4                     0                        2                                 1

12                    4                     1                        2                                 1

;

 

proc sql;

create table want1 as

select *

from have

group by study_id

having max(gestational_diabetes)=1

order by visit_id;

quit;

 

proc sql;

create table want2 as

select *

from have

where gestational_diabetes=1;

quit;

 

Contributor
Posts: 41

Re: Output dataset by multiple variables or groups

Posted in reply to novinosrin

Lol, you're so right, but thanks for the reply. Can I implore you to take a look at the following thread (I'm stuck on the previous step to solve the problem you just answered):

 

1.). https://communities.sas.com/t5/Base-SAS-Programming/Don-t-know-term-or-code/m-p/396658#M95800 (directed as previous person who replied, ignore that)

 

 

Thanks!

 

PROC Star
Posts: 826

Re: Output dataset by multiple variables or groups

Sure, I had a look. How may I help you?

Contributor
Posts: 41

Re: Output dataset by multiple variables or groups

Posted in reply to novinosrin

Thank you. 

 

For any mothers that have multiple_gestation=1 for any of the visits within that pregancy number rank, then multiple_gestation=1 for all visits within the pregnancy number rank.

 

For example, see data below. Mother study_id=1 did not have gest_diab during her first pregnancy, but had gest_diabetes during her second preg. Need to get the gest_diabetes =1 for visit_id 2 & 3 for that mom (but not for 1st preg (visit_id 1)). Same with mom study_id 4. Her 1st pregnancy she did not have gest_diabetes, but had gest_diabetes during her second preg. Visit_id 11 & 12 for mom study_id 4 needs to equal 1, but her visit_id 9 and 10 should stay 0. 

The mothers with multiple_gestation=0 for all pregnancies would be 0 for all visits within all pregnancy ranks.

 

Data have:

visit_id       study_id      delivery_visit   pregnancynum_rank   gestational_diabetes

1                      1                     1                        1                                 0

2                      1                     0                        2                                 1

3                      1                     1                        2                                 0

4                      2                     0                        1                                 0

5                      2                     1                        1                                 0

6                      3                     0                        1                                 0

7                      3                     1                        1                                 1

8                      3                     1                        2                                 1

9                      4                     0                        1                                 0

10                    4                     1                        1                                 0

11                    4                     0                        2                                 1

12                    4                     1                        2                                 0

 

 

 

Data want:

visit_id       study_id      delivery_visit   pregnancynum_rank   gestational_diabetes

1                      1                     1                        1                                 0

2                      1                     0                        2                                 1

3                      1                     1                        2                                 1

4                      2                     0                        1                                 0

5                      2                     1                        1                                 0

6                      3                     0                        1                                 0

7                      3                     1                        1                                 1

8                      3                     1                        2                                 1

9                      4                     0                        1                                 0

10                    4                     1                        1                                 0

11                    4                     0                        2                                 1

12                    4                     1                        2                                 1

 

Hope that makes sense. Thanks  for your help!

PROC Star
Posts: 826

Re: Output dataset by multiple variables or groups

Data have;
input visit_id study_id delivery_visit pregnancynum_rank gestational_diabetes;
datalines;
1 1 1 1 0
2 1 0 2 1
3 1 1 2 0
4 2 0 1 0
5 2 1 1 0
6 3 0 1 0
7 3 1 1 1
8 3 1 2 1
9 4 0 1 0
10 4 1 1 0
11 4 0 2 1
12 4 1 2 0
;

proc sql;
create table want(drop=gestational_diabetes) as
select *,max(gestational_diabetes) as _gestational_diabetes
from have
group by study_id,pregnancynum_rank;
quit;

Contributor
Posts: 41

Re: Output dataset by multiple variables or groups

Posted in reply to novinosrin

Okay, I understand the PROC SQL statment. But what is the first part (below)?
Can I apply the PROC SQL to the data I have? Or do I need to use the following code somehow? 

 

Data have;
input visit_id study_id delivery_visit pregnancynum_rank gestational_diabetes;
datalines;
1 1 1 1 0
2 1 0 2 1
3 1 1 2 0
4 2 0 1 0
5 2 1 1 0
6 3 0 1 0
7 3 1 1 1
8 3 1 2 1
9 4 0 1 0
10 4 1 1 0
11 4 0 2 1
12 4 1 2 0
;

PROC Star
Posts: 826

Re: Output dataset by multiple variables or groups

LoL the data have is merely the sample that you gave me as a input and I trust your sample is representative of the data you have. If it is the code should comfortably work. I really hope every OP(Original poster)'s sample is a good representative of their real data so that we the community(our) solutions work for them. 

Contributor
Posts: 41

Re: Output dataset by multiple variables or groups

So glad I could make you laugh todaySmiley Wink Thanks for all your help!
PROC Star
Posts: 826

Re: Output dataset by multiple variables or groups

@eabc0351 You're welcome. My apologies for the laugh, but I didn't mean to. Have a good one!

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 265 views
  • 1 like
  • 2 in conversation