BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
MBlack732
Fluorite | Level 6

Hello!

For an assignment, we merged two data sets "bp" (which include subject, date of visit, SBP and DBP; there were multiple observations per subject) and "demog" (which include variables like date of birth, date of entering the study, marital status, education, smoking, and drinking). I need to make sure that the subjects are all under the age of 55 and the blood pressure readings considered are only from the first visit date. So far in my code I did:

 

LIBNAME BIST0535 "C:\Users\Paula\Desktop\BIST0535";

 

*Narrowing down blood pressure data to first visit only*

DATA BIST0535.bp_hw3_1;

                SET BIST0535.bp_hw3;

RUN;

 

PROC SORT Data=BIST0535.bp_hw3_1;

                BY subject;

RUN;

 

DATA BIST0535.bp_hw3_2;

                SET BIST0535.bp_hw3_1;

                BY subject;

                IF FIRST.subject;

RUN;

 

*Merging of Datasets and narrowing subject list to people younger than 55*

DATA BIST0535.HW3;

                SET BIST0535.demog_hw3 BIST0535.bp_hw3_2;

                age = (doe-dob)/365.25;

                IF age GT 55 THEN DELETE;

RUN;

 

*Looking at relationship between marital status and smoking/drinking*

PROC SORT DATA= BIST0535.HW3;

                BY marital;

RUN;

 

PROC FREQ DATA= BIST0535.HW3;

                TABLES smoke drink smoke*drink;

                BY marital;

                TITLE "HW3 Frequencies of smoke, drink, and smoke*drink by marital status";

RUN;

 

Now I want to look at the relationship between educational attainment (education) and blood pressure readings (SBP and DBP), by looking at the break down of frequencies, as I did above with marital and smoking/drinking. However it is not working and I believe it is because the way the data was merged. Even if the subject is the same, the variables from the two different original data sets are appearing on separate observations. Here is a section of the proc print (I italicized subject 1, as it exemplifies this issue):

 

Obs subject site gender dob doe smoke drink marital education dov dbp sbp age
1. ......14145107134.
15F-1092140690233...41.5086
2. ......14447115143.
3. ......14082103148.
4. ......1463287110.
41F892146320523...37.6181
5. ......1465066108.
52F259145580212...39.1485
6. ......1452075133.
64F1809144291233...34.5517
7. ......1481387131.
73M-3979146370232...50.9678
8. ......14285112149.
85M-3669142850423...49.1554
9. ......1486584123.
93F2438146180333...33.3470
10. ......1396788120

 

HOW DO I FIX THIS PLEASE?

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

Always add comments so it's clear what you're trying to do in each step.

LIBNAME BIST0535 "C:\Users\Paula\Desktop\BIST0535";

 

*Narrowing down blood pressure data to first visit only*

DATA BIST0535.bp_hw3_1;

                SET BIST0535.bp_hw3;

RUN;

 

PROC SORT Data=BIST0535.bp_hw3_1;

                BY subject;

RUN;

 

DATA BIST0535.bp_hw3_2;

                SET BIST0535.bp_hw3_1;

                BY subject;
                 *this takes the first subject, but you likely want a sort by date to pick the first date. Since you didn't sort by date, it could be any of the dates not necessarily the first record. Visually verify the data to ensure you get the data correct;
                IF FIRST.subject;

RUN;

 

*Merging of Datasets and narrowing subject list to people younger than 55*

DATA BIST0535.HW3;
                *SET APPENDS, it does not merge. You need to change SET to MERGE;
                SET BIST0535.demog_hw3 BIST0535.bp_hw3_2;

                *No BY statement to merge by ID  - add one;

                age = (doe-dob)/365.25; * Use YRDIF() instead to calculate the age;

                IF age GT 55 THEN DELETE;

RUN;

 

 

Once you have those issues fixed, you should be closer to what you need.

 


@MBlack732 wrote:

Hello!

For an assignment, we merged two data sets "bp" (which include subject, date of visit, SBP and DBP; there were multiple observations per subject) and "demog" (which include variables like date of birth, date of entering the study, marital status, education, smoking, and drinking). I need to make sure that the subjects are all under the age of 55 and the blood pressure readings considered are only from the first visit date. So far in my code I did:

 

LIBNAME BIST0535 "C:\Users\Paula\Desktop\BIST0535";

 

*Narrowing down blood pressure data to first visit only*

DATA BIST0535.bp_hw3_1;

                SET BIST0535.bp_hw3;

RUN;

 

PROC SORT Data=BIST0535.bp_hw3_1;

                BY subject;

RUN;

 

DATA BIST0535.bp_hw3_2;

                SET BIST0535.bp_hw3_1;

                BY subject;

                IF FIRST.subject;

RUN;

 

*Merging of Datasets and narrowing subject list to people younger than 55*

DATA BIST0535.HW3;

                SET BIST0535.demog_hw3 BIST0535.bp_hw3_2;

                age = (doe-dob)/365.25;

                IF age GT 55 THEN DELETE;

RUN;

 

*Looking at relationship between marital status and smoking/drinking*

PROC SORT DATA= BIST0535.HW3;

                BY marital;

RUN;

 

PROC FREQ DATA= BIST0535.HW3;

                TABLES smoke drink smoke*drink;

                BY marital;

                TITLE "HW3 Frequencies of smoke, drink, and smoke*drink by marital status";

RUN;

 

Now I want to look at the relationship between educational attainment (education) and blood pressure readings (SBP and DBP), by looking at the break down of frequencies, as I did above with marital and smoking/drinking. However it is not working and I believe it is because the way the data was merged. Even if the subject is the same, the variables from the two different original data sets are appearing on separate observations. Here is a section of the proc print (I italicized subject 1, as it exemplifies this issue):

 

Obs subject site gender dob doe smoke drink marital education dov dbp sbp age
1 .   . . . . . . 14145 107 134 .
1 5 F -1092 14069 0 2 3 3 . . . 41.5086
2 .   . . . . . . 14447 115 143 .
3 .   . . . . . . 14082 103 148 .
4 .   . . . . . . 14632 87 110 .
4 1 F 892 14632 0 5 2 3 . . . 37.6181
5 .   . . . . . . 14650 66 108 .
5 2 F 259 14558 0 2 1 2 . . . 39.1485
6 .   . . . . . . 14520 75 133 .
6 4 F 1809 14429 1 2 3 3 . . . 34.5517
7 .   . . . . . . 14813 87 131 .
7 3 M -3979 14637 0 2 3 2 . . . 50.9678
8 .   . . . . . . 14285 112 149 .
8 5 M -3669 14285 0 4 2 3 . . . 49.1554
9 .   . . . . . . 14865 84 123 .
9 3 F 2438 14618 0 3 3 3 . . . 33.3470
10 .   . . . . . . 13967 88 120

 

HOW DO I FIX THIS PLEASE?


 

View solution in original post

2 REPLIES 2
Reeza
Super User

Always add comments so it's clear what you're trying to do in each step.

LIBNAME BIST0535 "C:\Users\Paula\Desktop\BIST0535";

 

*Narrowing down blood pressure data to first visit only*

DATA BIST0535.bp_hw3_1;

                SET BIST0535.bp_hw3;

RUN;

 

PROC SORT Data=BIST0535.bp_hw3_1;

                BY subject;

RUN;

 

DATA BIST0535.bp_hw3_2;

                SET BIST0535.bp_hw3_1;

                BY subject;
                 *this takes the first subject, but you likely want a sort by date to pick the first date. Since you didn't sort by date, it could be any of the dates not necessarily the first record. Visually verify the data to ensure you get the data correct;
                IF FIRST.subject;

RUN;

 

*Merging of Datasets and narrowing subject list to people younger than 55*

DATA BIST0535.HW3;
                *SET APPENDS, it does not merge. You need to change SET to MERGE;
                SET BIST0535.demog_hw3 BIST0535.bp_hw3_2;

                *No BY statement to merge by ID  - add one;

                age = (doe-dob)/365.25; * Use YRDIF() instead to calculate the age;

                IF age GT 55 THEN DELETE;

RUN;

 

 

Once you have those issues fixed, you should be closer to what you need.

 


@MBlack732 wrote:

Hello!

For an assignment, we merged two data sets "bp" (which include subject, date of visit, SBP and DBP; there were multiple observations per subject) and "demog" (which include variables like date of birth, date of entering the study, marital status, education, smoking, and drinking). I need to make sure that the subjects are all under the age of 55 and the blood pressure readings considered are only from the first visit date. So far in my code I did:

 

LIBNAME BIST0535 "C:\Users\Paula\Desktop\BIST0535";

 

*Narrowing down blood pressure data to first visit only*

DATA BIST0535.bp_hw3_1;

                SET BIST0535.bp_hw3;

RUN;

 

PROC SORT Data=BIST0535.bp_hw3_1;

                BY subject;

RUN;

 

DATA BIST0535.bp_hw3_2;

                SET BIST0535.bp_hw3_1;

                BY subject;

                IF FIRST.subject;

RUN;

 

*Merging of Datasets and narrowing subject list to people younger than 55*

DATA BIST0535.HW3;

                SET BIST0535.demog_hw3 BIST0535.bp_hw3_2;

                age = (doe-dob)/365.25;

                IF age GT 55 THEN DELETE;

RUN;

 

*Looking at relationship between marital status and smoking/drinking*

PROC SORT DATA= BIST0535.HW3;

                BY marital;

RUN;

 

PROC FREQ DATA= BIST0535.HW3;

                TABLES smoke drink smoke*drink;

                BY marital;

                TITLE "HW3 Frequencies of smoke, drink, and smoke*drink by marital status";

RUN;

 

Now I want to look at the relationship between educational attainment (education) and blood pressure readings (SBP and DBP), by looking at the break down of frequencies, as I did above with marital and smoking/drinking. However it is not working and I believe it is because the way the data was merged. Even if the subject is the same, the variables from the two different original data sets are appearing on separate observations. Here is a section of the proc print (I italicized subject 1, as it exemplifies this issue):

 

Obs subject site gender dob doe smoke drink marital education dov dbp sbp age
1 .   . . . . . . 14145 107 134 .
1 5 F -1092 14069 0 2 3 3 . . . 41.5086
2 .   . . . . . . 14447 115 143 .
3 .   . . . . . . 14082 103 148 .
4 .   . . . . . . 14632 87 110 .
4 1 F 892 14632 0 5 2 3 . . . 37.6181
5 .   . . . . . . 14650 66 108 .
5 2 F 259 14558 0 2 1 2 . . . 39.1485
6 .   . . . . . . 14520 75 133 .
6 4 F 1809 14429 1 2 3 3 . . . 34.5517
7 .   . . . . . . 14813 87 131 .
7 3 M -3979 14637 0 2 3 2 . . . 50.9678
8 .   . . . . . . 14285 112 149 .
8 5 M -3669 14285 0 4 2 3 . . . 49.1554
9 .   . . . . . . 14865 84 123 .
9 3 F 2438 14618 0 3 3 3 . . . 33.3470
10 .   . . . . . . 13967 88 120

 

HOW DO I FIX THIS PLEASE?


 

MBlack732
Fluorite | Level 6

Perfect! Thank you so much! Such a simple fix, I feel foolish that I missed it! Thanks again for the help!

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 448 views
  • 2 likes
  • 2 in conversation