Hi everyone, I was curious if someone could help me figure out how to delete the circled values in my table so that the "normal weight, overweight, obese, and morbidity" columns are the only ones present. I have attached screenshots of my code and output below. Please let me know if you have any questions, and thank you in advance for helping me out. I really appreciate it.
Your screenshots are barely legible, and I can't quite read the circled output values.
BUT I am going to go out on a limb and guess that you created your BMI values of 1 to 4 with code like
if BMI le 24.9 then BMI=1;
else if 25 le BMI le 29.9 then BMI=2;
and so on. That leaves gaps: a value like 24.92 is in neither the interval for BMI class 1 nor the one for class 2. So you get odd values outside of the explicit ranges that you coded for.
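To make the gap concrete, here is a minimal sketch with made-up BMI values. The data set name gap_demo and the variable bmi_class are illustrative and not taken from the original code. A value such as 24.92 matches neither condition, so no class is assigned to it; if the result were written back into BMI itself, as in the guessed code, the raw value would simply be left in place.

```sas
/* Toy data: invented BMI values, not from the poster's data set */
data gap_demo;
    input bmi;
    /* Same style of recode as guessed above, written to a new variable */
    if bmi le 24.9 then bmi_class = 1;
    else if 25 le bmi le 29.9 then bmi_class = 2;
    /* 24.92 and 24.95 match neither condition, so bmi_class stays missing */
    datalines;
22.0
24.9
24.92
24.95
25.0
27.3
;
run;

/* List the rows to see which values were left unclassified */
proc print data=gap_demo;
run;
```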
Having worked with BMI more than once, I can tell you that, depending on your source, you may be provided with up to 4 decimal places, or, if BMI is calculated directly from height and weight, have many more decimals in the result. So either round your values to fit your code or use a proper format that includes the decimal values. (The format is my preference, as it then handles any "rounding" needed regardless of the precision of the BMI values provided.)
Instead of creating a category like that, with the possibility of having to chase down ever smaller differences in decimal values, you should apply a format with proper ranges to the actual values.
Proc format;
Value BMI
18.5< - 24.9= 'Normal Weight, BMI 18.5-24.9'
24.9< - 29.9= 'Overweight, BMI 25-29.9'
29.9< - 34.9= 'Obese, BMI 30-34.9'
34.9< - high= 'Morbidly Obese, BMI ge 35'
0 < - 18.5= 'Underweight BMI<18.5';
Run;
Yes, BMI has categories for underweight and you should consider them.
A < on either side of the - in the value list makes that end of the interval open: 18.5< means "any value greater than, but not exactly equal to, 18.5" falls in this interval.
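As a rough sketch of how such a range format can be applied to the unrecoded values, the example below uses invented BMI values and a hypothetical format name, bmicat, so that it does not overwrite any existing BMI format. PROC FREQ groups by the formatted value, so no recoding step is needed.

```sas
/* Range format with the same cut points as above; bmicat is a hypothetical name */
proc format;
    value bmicat
        0    <- 18.5 = 'Underweight'
        18.5 <- 24.9 = 'Normal weight'
        24.9 <- 29.9 = 'Overweight'
        29.9 <- 34.9 = 'Obese'
        34.9 <- high = 'Morbidly obese';
run;

/* Toy data: invented values, including ones that fall between the old cut points */
data bmi_demo;
    input bmi;
    datalines;
17.2
18.5
21.0
24.92
29.95
31.4
36.8
;
run;

/* Counts are reported per formatted category; 24.92 lies in 24.9<-29.9 */
proc freq data=bmi_demo;
    tables bmi;
    format bmi bmicat.;
run;
```

Because the format is attached only for reporting, the stored BMI values stay untouched and the cut points can be changed later without rebuilding the data set.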
I don't read screenshots, so please provide code and log as text using the "insert SAS code" and "insert code" buttons. To be able to help, we will most likely need an excerpt of the data you are using, posted as a data step with datalines.
And you will want to change the title of the topic to actually provide information about the problem you have.
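For the data excerpt requested above, a data step with datalines could look roughly like the sketch below. The data set name have and all values are invented for illustration; only the variable names seqn, bmi and bpxsar come from the code in this thread.

```sas
/* Hypothetical excerpt: every value here is made up for illustration */
data have;
    input seqn bmi bpxsar;
    datalines;
1 22.4 142
2 24.92 150
3 31.7 160
4 . 145
;
run;
```

Posting data in this form lets others run the code exactly as written instead of guessing at variable types and values.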
DM 'LOG;CLEAR;OUTPUT;CLEAR;';
LIBNAME Final '/home/u49589061/MPBH 423/Data';
Proc format;
Value eversmk100cigs 1= '<100 total';
Value BMI
1= 'Normal Weight, BMI 18.5-24.9'
2= 'Overweight, BMI 25-29.9'
3= 'Obese, BMI 30-34.9'
4= 'Morbidly Obese, BM ge 35';
Run;
Proc sort data=final.projectdata2021;
By seqn;
Run;
Data final.p1 (Keep= bmi mortstat ageatinterview gender maritalstatus education smokenow
eversmk100cigs hichol_hx diabetes_hx liver_hx thyroid_hx bpxsar);
Set final.projectdata2021;
By seqn;
Run;
Data final.p2;
set final.p1;
Rename mortstat=TenYearMortality;
Rename ageatinterview=Age;
Rename gender=male;
Rename maritalstatus=Married;
Rename education=Education_Status;
Rename smokenow=CurrentSmoker;
Rename eversmk100cigs=Smoking_History;
Rename hichol_hx=Hypercholesterolemia;
Rename diabetes_hx=Diabetes_mellitus;
Rename liver_hx=Liver_disease;
Rename thyroid_hx=Thyroid_Problem;
Rename Bpxsar=HighSystolic_BloodPressure;
If maritalstatus=1 then maritalstatus=1;
Else if maritalstatus in (2,3,4,5,6) then maritalstatus=.;
If education in (4,5) then education=1;
Else if education in (1,2,3,7,9) then education=.;
If smokenow in (1,2) then smokenow=1;
Else if smokenow=3 then smokenow=.;
If bpxsar>=140;
If BMI=<18.5 then BMI=.;
Else if BMI=>18.5 and BMI=<24.9 then BMI=1;
Else if BMI=>25 and BMI=<29.9 then BMI=2;
Else if BMI=>30 and BMI=<34.9 then BMI=3;
Else if BMI=>35 then BMI=4;
Else BMI=.;
Label
mortstat="Ten Year Morality"
Ageatinterview="Age"
Gender="Male"
Maritalstatus="Married"
Education="Greater than High School Education"
Smokenow="Current Smoker"
eversmk100cigs="Smoked >100 cigarettes"
hichol_hx="Hypercholesterolemia"
diabetes_hx="Diabetes mellitus"
liver_hx="Liver disease"
thyroid_hx="Thyroid Problem"
Bpxsar="High Systolic Blood Pressure (>140)";
Format eversmk100cigs eversmk100cigs. bmi bmi.;
Run;
Proc contents data=final.p2;
Run;
Proc print data=final.p2 (firstobs=1 obs=50);
Run;
Proc tabulate data=final.p2;
Class bmi;
Var age male married education_status CurrentSmoker smoking_history hypercholesterolemia diabetes_mellitus liver_disease
Thyroid_problem HighSystolic_BloodPressure;
Table (age male married education_status CurrentSmoker smoking_history hypercholesterolemia diabetes_mellitus liver_disease
Thyroid_problem HighSystolic_BloodPressure) * (N Mean std colpctn rowpctn), bmi;
Format bmi bmi.;
Run;
OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
72
73 DM 'LOG;CLEAR;OUTPUT;CLEAR;';
74 LIBNAME Final '/home/u49589061/MPBH 423/Data';
NOTE: Libref FINAL refers to the same physical library as _TEMP0.
NOTE: Libref FINAL was successfully assigned as follows:
      Engine: V9
      Physical Name: /home/u49589061/MPBH 423/Data
75
76 Proc format;
77 Value eversmk100cigs 1= '<100 total';
NOTE: Format EVERSMK100CIGS is already on the library WORK.FORMATS.
NOTE: Format EVERSMK100CIGS has been output.
78 Value BMI
79 1= 'Normal Weight, BMI 18.5-24.9'
80 2= 'Overweight, BMI 25-29.9'
81 3= 'Obese, BMI 30-34.9'
82 4= 'Morbidly Obese, BM ge 35';
NOTE: Format BMI is already on the library WORK.FORMATS.
NOTE: Format BMI has been output.
83 Run;
NOTE: PROCEDURE FORMAT used (Total process time):
      real time 0.00 seconds, user cpu time 0.00 seconds, system cpu time 0.00 seconds
      memory 243.65k, OS Memory 36516.00k
      Timestamp 04/15/2021 05:22:22 AM
      Step Count 185, Switch Count 0
      Page Faults 0, Page Reclaims 26, Page Swaps 0
      Voluntary Context Switches 0, Involuntary Context Switches 0
      Block Input Operations 0, Block Output Operations 32
84
85 Proc sort data=final.projectdata2021;
NOTE: Data file FINAL.PROJECTDATA2021.DATA is in a format that is native to another host, or the file encoding does not match the session encoding. Cross Environment Data Access will be used, which might require additional CPU resources and might reduce performance.
86 By seqn;
87 Run;
NOTE: Input data set is already sorted, no sorting done.
NOTE: PROCEDURE SORT used (Total process time):
      real time 0.00 seconds, user cpu time 0.00 seconds, system cpu time 0.00 seconds
      memory 1300.75k, OS Memory 37544.00k
      Timestamp 04/15/2021 05:22:22 AM
      Step Count 186, Switch Count 0
      Page Faults 0, Page Reclaims 254, Page Swaps 0
      Voluntary Context Switches 6, Involuntary Context Switches 0
      Block Input Operations 0, Block Output Operations 0
88
89
90 Data final.p1 (Keep= bmi mortstat ageatinterview gender maritalstatus education smokenow
91 eversmk100cigs hichol_hx diabetes_hx liver_hx thyroid_hx bpxsar);
92 Set final.projectdata2021;
NOTE: Data file FINAL.PROJECTDATA2021.DATA is in a format that is native to another host, or the file encoding does not match the session encoding. Cross Environment Data Access will be used, which might require additional CPU resources and might reduce performance.
93 By seqn;
94 Run;
NOTE: There were 55499 observations read from the data set FINAL.PROJECTDATA2021.
NOTE: The data set FINAL.P1 has 55499 observations and 13 variables.
NOTE: DATA statement used (Total process time):
      real time 0.18 seconds, user cpu time 0.15 seconds, system cpu time 0.01 seconds
      memory 3137.56k, OS Memory 38828.00k
      Timestamp 04/15/2021 05:22:23 AM
      Step Count 187, Switch Count 6
      Page Faults 0, Page Reclaims 375, Page Swaps 0
      Voluntary Context Switches 150, Involuntary Context Switches 0
      Block Input Operations 0, Block Output Operations 11536
95
96
97 Data final.p2;
98 set final.p1;
99 Rename mortstat=TenYearMortality;
100 Rename ageatinterview=Age;
101 Rename gender=male;
102 Rename maritalstatus=Married;
103 Rename education=Education_Status;
104 Rename smokenow=CurrentSmoker;
105 Rename eversmk100cigs=Smoking_History;
106 Rename hichol_hx=Hypercholesterolemia;
107 Rename diabetes_hx=Diabetes_mellitus;
108 Rename liver_hx=Liver_disease;
109 Rename thyroid_hx=Thyroid_Problem;
110 Rename Bpxsar=HighSystolic_BloodPressure;
111
112 If maritalstatus=1 then maritalstatus=1;
113 Else if maritalstatus in (2,3,4,5,6) then maritalstatus=.;
114
115 If education in (4,5) then education=1;
116 Else if education in (1,2,3,7,9) then education=.;
117
118 If smokenow in (1,2) then smokenow=1;
119 Else if smokenow=3 then smokenow=.;
120
121 If bpxsar>=140;
122
123 If BMI=<18.5 then BMI=.;
124 Else if BMI=>18.5 and BMI=<24.9 then BMI=1;
125 Else if BMI=>25 and BMI=<29.9 then BMI=2;
126 Else if BMI=>30 and BMI=<34.9 then BMI=3;
127 Else if BMI=>35 then BMI=4;
128 Else BMI=.;
129
130 Label
131 mortstat="Ten Year Morality"
132 Ageatinterview="Age"
133 Gender="Male"
134 Maritalstatus="Married"
135 Education="Greater than High School Education"
136 Smokenow="Current Smoker"
137 eversmk100cigs="Smoked >100 cigarettes"
138 hichol_hx="Hypercholesterolemia"
139 diabetes_hx="Diabetes mellitus"
140 liver_hx="Liver disease"
141 thyroid_hx="Thyroid Problem"
142 Bpxsar="High Systolic Blood Pressure (>140)";
143
144 Format eversmk100cigs eversmk100cigs. bmi bmi.;
145 Run;
NOTE: There were 55499 observations read from the data set FINAL.P1.
NOTE: The data set FINAL.P2 has 9854 observations and 13 variables.
NOTE: DATA statement used (Total process time):
      real time 0.04 seconds, user cpu time 0.01 seconds, system cpu time 0.01 seconds
      memory 3311.71k, OS Memory 39852.00k
      Timestamp 04/15/2021 05:22:23 AM
      Step Count 188, Switch Count 3
      Page Faults 0, Page Reclaims 468, Page Swaps 0
      Voluntary Context Switches 100, Involuntary Context Switches 0
      Block Input Operations 11552, Block Output Operations 2056
146
147 Proc contents data=final.p2;
148 Run;
NOTE: PROCEDURE CONTENTS used (Total process time):
      real time 0.08 seconds, user cpu time 0.08 seconds, system cpu time 0.01 seconds
      memory 4090.12k, OS Memory 38828.00k
      Timestamp 04/15/2021 05:22:23 AM
      Step Count 189, Switch Count 0
      Page Faults 0, Page Reclaims 356, Page Swaps 0
      Voluntary Context Switches 9, Involuntary Context Switches 0
      Block Input Operations 288, Block Output Operations 24
149
150 Proc print data=final.p2 (firstobs=1 obs=50);
151 Run;
NOTE: There were 50 observations read from the data set FINAL.P2.
NOTE: PROCEDURE PRINT used (Total process time):
      real time 0.19 seconds, user cpu time 0.19 seconds, system cpu time 0.00 seconds
      memory 3249.59k, OS Memory 38312.00k
      Timestamp 04/15/2021 05:22:23 AM
      Step Count 190, Switch Count 0
      Page Faults 0, Page Reclaims 280, Page Swaps 0
      Voluntary Context Switches 4, Involuntary Context Switches 0
      Block Input Operations 0, Block Output Operations 72
152
153
154
155 Proc tabulate data=final.p2;
156 Class bmi;
157 Var age male married education_status CurrentSmoker smoking_history hypercholesterolemia diabetes_mellitus liver_disease
158 Thyroid_problem HighSystolic_BloodPressure;
159 Table (age male married education_status CurrentSmoker smoking_history hypercholesterolemia diabetes_mellitus
159 ! liver_disease
160 Thyroid_problem HighSystolic_BloodPressure) * (N Mean std colpctn rowpctn), bmi;
161 Format bmi bmi.;
162 Run;
NOTE: There were 9854 observations read from the data set FINAL.P2.
NOTE: PROCEDURE TABULATE used (Total process time):
      real time 0.09 seconds, user cpu time 0.07 seconds, system cpu time 0.01 seconds
      memory 9368.90k, OS Memory 45780.00k
      Timestamp 04/15/2021 05:22:23 AM
      Step Count 191, Switch Count 13
      Page Faults 0, Page Reclaims 2128, Page Swaps 0
      Voluntary Context Switches 105, Involuntary Context Switches 1
      Block Input Operations 1792, Block Output Operations 1680
163
164
165
166 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
178
So I was able to run my code without any problems, but I need help grouping my variables into general categories and putting a descriptive title on these categories using proc tabulate. Specifically, grouping the variables "Age, Male, Married, and Greater than high school education" together so that they are all under the Demographics group. Then I want to make a Risk factors and medical hx group that contains "current smoker, smoke >100 cigarettes, hypercholesterolemia, diabetes mellitus, liver disease, and thyroid problem." Then lastly, I want to make a Clinical presentation group that only contains "High systolic blood pressure (>140)" in it.
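One possible way to get such group headings, sketched here under assumptions and not confirmed anywhere in this thread, is to add a constant class variable per group and cross it with the analysis variables in the row dimension. The data set tab_demo, the variables demog, risk, clin and sbp, and all format names below are hypothetical, and the data values are invented.

```sas
/* Toy data: invented values; demog, risk and clin are constant grouping variables */
data tab_demo;
    input bmi age male currentsmoker sbp;
    demog = 1;
    risk  = 1;
    clin  = 1;
    datalines;
1 45 1 0 150
1 52 0 1 142
2 61 1 1 160
2 38 0 0 145
3 57 1 0 170
4 49 0 1 155
;
run;

/* The single level of each grouping variable is displayed as the group title */
proc format;
    value bmiclass 1 = 'Normal Weight'
                   2 = 'Overweight'
                   3 = 'Obese'
                   4 = 'Morbidly Obese';
    value demogf 1 = 'Demographics';
    value riskf  1 = 'Risk factors and medical hx';
    value clinf  1 = 'Clinical presentation';
run;

proc tabulate data=tab_demo;
    class bmi demog risk clin;
    var age male currentsmoker sbp;
    /* Blank variable labels (=' ') leave only the formatted group title as the header */
    table (demog=' ' * (age male)
           risk=' '  * (currentsmoker)
           clin=' '  * (sbp)) * (n mean std),
          bmi;
    format bmi bmiclass. demog demogf. risk riskf. clin clinf.;
run;
```

The same pattern should extend to the full variable list by placing each variable inside the parentheses of the group it belongs to.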
You have a data quality problem, with rogue values for variable BMI.
You really should clean that. A poorer alternative is to add a where statement such as
where BMI < 5;
in your proc tabulate.
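A minimal sketch of that workaround, using invented data in which two unrecoded values sit alongside the class codes. The data set name where_demo is hypothetical. The stricter condition bmi in (1, 2, 3, 4) is used here as a variation on BMI < 5, on the assumption that only the four class codes should be tabulated.

```sas
/* Toy data: class codes 1-4 plus two rogue, unrecoded BMI values */
data where_demo;
    input bmi age;
    datalines;
1 45
2 61
24.92 50
3 57
4 49
29.95 66
;
run;

proc tabulate data=where_demo;
    /* Keep only rows whose BMI is one of the four class codes */
    where bmi in (1, 2, 3, 4);
    class bmi;
    var age;
    table age * (n mean), bmi;
run;
```

This only hides the rogue rows from the table; the underlying values still need to be cleaned at the source.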