BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
MBlack732
Fluorite | Level 6

Hello!

For an assignment, we merged two data sets "bp" (which include subject, date of visit, SBP and DBP; there were multiple observations per subject) and "demog" (which include variables like date of birth, date of entering the study, marital status, education, smoking, and drinking). I need to make sure that the subjects are all under the age of 55 and the blood pressure readings considered are only from the first visit date. So far in my code I did:

 

LIBNAME BIST0535 "C:\Users\Paula\Desktop\BIST0535";

 

*Narrowing down blood pressure data to first visit only*

DATA BIST0535.bp_hw3_1;

                SET BIST0535.bp_hw3;

RUN;

 

PROC SORT Data=BIST0535.bp_hw3_1;

                BY subject;

RUN;

 

DATA BIST0535.bp_hw3_2;

                SET BIST0535.bp_hw3_1;

                BY subject;

                IF FIRST.subject;

RUN;

 

*Merging of Datasets and narrowing subject list to people younger than 55*

DATA BIST0535.HW3;

                SET BIST0535.demog_hw3 BIST0535.bp_hw3_2;

                age = (doe-dob)/365.25;

                IF age GT 55 THEN DELETE;

RUN;

 

*Looking at relationship between marital status and smoking/drinking*

PROC SORT DATA= BIST0535.HW3;

                BY marital;

RUN;

 

PROC FREQ DATA= BIST0535.HW3;

                TABLES smoke drink smoke*drink;

                BY marital;

                TITLE "HW3 Frequencies of smoke, drink, and smoke*drink by marital status";

RUN;

 

Now I want to look at the relationship between educational attainment (education) and blood pressure readings (SBP and DBP), by looking at the break down of frequencies, as I did above with marital and smoking/drinking. However it is not working and I believe it is because the way the data was merged. Even if the subject is the same, the variables from the two different original data sets are appearing on separate observations. Here is a section of the proc print (I italicized subject 1, as it exemplifies this issue):

 

Obs subject site gender dob doe smoke drink marital education dov dbp sbp age
1. ......14145107134.
15F-1092140690233...41.5086
2. ......14447115143.
3. ......14082103148.
4. ......1463287110.
41F892146320523...37.6181
5. ......1465066108.
52F259145580212...39.1485
6. ......1452075133.
64F1809144291233...34.5517
7. ......1481387131.
73M-3979146370232...50.9678
8. ......14285112149.
85M-3669142850423...49.1554
9. ......1486584123.
93F2438146180333...33.3470
10. ......1396788120

 

HOW DO I FIX THIS PLEASE?

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

Always add comments so it's clear what you're trying to do in each step.

LIBNAME BIST0535 "C:\Users\Paula\Desktop\BIST0535";

 

*Narrowing down blood pressure data to first visit only*

DATA BIST0535.bp_hw3_1;

                SET BIST0535.bp_hw3;

RUN;

 

PROC SORT Data=BIST0535.bp_hw3_1;

                BY subject;

RUN;

 

DATA BIST0535.bp_hw3_2;

                SET BIST0535.bp_hw3_1;

                BY subject;
                 *this takes the first subject, but you likely want a sort by date to pick the first date. Since you didn't sort by date, it could be any of the dates not necessarily the first record. Visually verify the data to ensure you get the data correct;
                IF FIRST.subject;

RUN;

 

*Merging of Datasets and narrowing subject list to people younger than 55*

DATA BIST0535.HW3;
                *SET APPENDS, it does not merge. You need to change SET to MERGE;
                SET BIST0535.demog_hw3 BIST0535.bp_hw3_2;

                *No BY statement to merge by ID  - add one;

                age = (doe-dob)/365.25; * Use YRDIF() instead to calculate the age;

                IF age GT 55 THEN DELETE;

RUN;

 

 

Once you have those issues fixed, you should be closer to what you need.

 


@MBlack732 wrote:

Hello!

For an assignment, we merged two data sets "bp" (which include subject, date of visit, SBP and DBP; there were multiple observations per subject) and "demog" (which include variables like date of birth, date of entering the study, marital status, education, smoking, and drinking). I need to make sure that the subjects are all under the age of 55 and the blood pressure readings considered are only from the first visit date. So far in my code I did:

 

LIBNAME BIST0535 "C:\Users\Paula\Desktop\BIST0535";

 

*Narrowing down blood pressure data to first visit only*

DATA BIST0535.bp_hw3_1;

                SET BIST0535.bp_hw3;

RUN;

 

PROC SORT Data=BIST0535.bp_hw3_1;

                BY subject;

RUN;

 

DATA BIST0535.bp_hw3_2;

                SET BIST0535.bp_hw3_1;

                BY subject;

                IF FIRST.subject;

RUN;

 

*Merging of Datasets and narrowing subject list to people younger than 55*

DATA BIST0535.HW3;

                SET BIST0535.demog_hw3 BIST0535.bp_hw3_2;

                age = (doe-dob)/365.25;

                IF age GT 55 THEN DELETE;

RUN;

 

*Looking at relationship between marital status and smoking/drinking*

PROC SORT DATA= BIST0535.HW3;

                BY marital;

RUN;

 

PROC FREQ DATA= BIST0535.HW3;

                TABLES smoke drink smoke*drink;

                BY marital;

                TITLE "HW3 Frequencies of smoke, drink, and smoke*drink by marital status";

RUN;

 

Now I want to look at the relationship between educational attainment (education) and blood pressure readings (SBP and DBP), by looking at the break down of frequencies, as I did above with marital and smoking/drinking. However it is not working and I believe it is because the way the data was merged. Even if the subject is the same, the variables from the two different original data sets are appearing on separate observations. Here is a section of the proc print (I italicized subject 1, as it exemplifies this issue):

 

Obs subject site gender dob doe smoke drink marital education dov dbp sbp age
1 .   . . . . . . 14145 107 134 .
1 5 F -1092 14069 0 2 3 3 . . . 41.5086
2 .   . . . . . . 14447 115 143 .
3 .   . . . . . . 14082 103 148 .
4 .   . . . . . . 14632 87 110 .
4 1 F 892 14632 0 5 2 3 . . . 37.6181
5 .   . . . . . . 14650 66 108 .
5 2 F 259 14558 0 2 1 2 . . . 39.1485
6 .   . . . . . . 14520 75 133 .
6 4 F 1809 14429 1 2 3 3 . . . 34.5517
7 .   . . . . . . 14813 87 131 .
7 3 M -3979 14637 0 2 3 2 . . . 50.9678
8 .   . . . . . . 14285 112 149 .
8 5 M -3669 14285 0 4 2 3 . . . 49.1554
9 .   . . . . . . 14865 84 123 .
9 3 F 2438 14618 0 3 3 3 . . . 33.3470
10 .   . . . . . . 13967 88 120

 

HOW DO I FIX THIS PLEASE?


 

View solution in original post

2 REPLIES 2
Reeza
Super User

Always add comments so it's clear what you're trying to do in each step.

LIBNAME BIST0535 "C:\Users\Paula\Desktop\BIST0535";

 

*Narrowing down blood pressure data to first visit only*

DATA BIST0535.bp_hw3_1;

                SET BIST0535.bp_hw3;

RUN;

 

PROC SORT Data=BIST0535.bp_hw3_1;

                BY subject;

RUN;

 

DATA BIST0535.bp_hw3_2;

                SET BIST0535.bp_hw3_1;

                BY subject;
                 *this takes the first subject, but you likely want a sort by date to pick the first date. Since you didn't sort by date, it could be any of the dates not necessarily the first record. Visually verify the data to ensure you get the data correct;
                IF FIRST.subject;

RUN;

 

*Merging of Datasets and narrowing subject list to people younger than 55*

DATA BIST0535.HW3;
                *SET APPENDS, it does not merge. You need to change SET to MERGE;
                SET BIST0535.demog_hw3 BIST0535.bp_hw3_2;

                *No BY statement to merge by ID  - add one;

                age = (doe-dob)/365.25; * Use YRDIF() instead to calculate the age;

                IF age GT 55 THEN DELETE;

RUN;

 

 

Once you have those issues fixed, you should be closer to what you need.

 


@MBlack732 wrote:

Hello!

For an assignment, we merged two data sets "bp" (which include subject, date of visit, SBP and DBP; there were multiple observations per subject) and "demog" (which include variables like date of birth, date of entering the study, marital status, education, smoking, and drinking). I need to make sure that the subjects are all under the age of 55 and the blood pressure readings considered are only from the first visit date. So far in my code I did:

 

LIBNAME BIST0535 "C:\Users\Paula\Desktop\BIST0535";

 

*Narrowing down blood pressure data to first visit only*

DATA BIST0535.bp_hw3_1;

                SET BIST0535.bp_hw3;

RUN;

 

PROC SORT Data=BIST0535.bp_hw3_1;

                BY subject;

RUN;

 

DATA BIST0535.bp_hw3_2;

                SET BIST0535.bp_hw3_1;

                BY subject;

                IF FIRST.subject;

RUN;

 

*Merging of Datasets and narrowing subject list to people younger than 55*

DATA BIST0535.HW3;

                SET BIST0535.demog_hw3 BIST0535.bp_hw3_2;

                age = (doe-dob)/365.25;

                IF age GT 55 THEN DELETE;

RUN;

 

*Looking at relationship between marital status and smoking/drinking*

PROC SORT DATA= BIST0535.HW3;

                BY marital;

RUN;

 

PROC FREQ DATA= BIST0535.HW3;

                TABLES smoke drink smoke*drink;

                BY marital;

                TITLE "HW3 Frequencies of smoke, drink, and smoke*drink by marital status";

RUN;

 

Now I want to look at the relationship between educational attainment (education) and blood pressure readings (SBP and DBP), by looking at the break down of frequencies, as I did above with marital and smoking/drinking. However it is not working and I believe it is because the way the data was merged. Even if the subject is the same, the variables from the two different original data sets are appearing on separate observations. Here is a section of the proc print (I italicized subject 1, as it exemplifies this issue):

 

Obs subject site gender dob doe smoke drink marital education dov dbp sbp age
1 .   . . . . . . 14145 107 134 .
1 5 F -1092 14069 0 2 3 3 . . . 41.5086
2 .   . . . . . . 14447 115 143 .
3 .   . . . . . . 14082 103 148 .
4 .   . . . . . . 14632 87 110 .
4 1 F 892 14632 0 5 2 3 . . . 37.6181
5 .   . . . . . . 14650 66 108 .
5 2 F 259 14558 0 2 1 2 . . . 39.1485
6 .   . . . . . . 14520 75 133 .
6 4 F 1809 14429 1 2 3 3 . . . 34.5517
7 .   . . . . . . 14813 87 131 .
7 3 M -3979 14637 0 2 3 2 . . . 50.9678
8 .   . . . . . . 14285 112 149 .
8 5 M -3669 14285 0 4 2 3 . . . 49.1554
9 .   . . . . . . 14865 84 123 .
9 3 F 2438 14618 0 3 3 3 . . . 33.3470
10 .   . . . . . . 13967 88 120

 

HOW DO I FIX THIS PLEASE?


 

MBlack732
Fluorite | Level 6

Perfect! Thank you so much! Such a simple fix, I feel foolish that I missed it! Thanks again for the help!

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 680 views
  • 2 likes
  • 2 in conversation