Hi everyone,
I cannot seem to get this code to change the variables from individual observations to ranges (using SAS Studio 9.4 M6). The character variable RFDiabGest reports as either TRUE or is blank. The numeric variables EstGestNum reports a range of values between 0 and at least 41. The first goal is to have RFDiabGest report as GestDiab - Y if the TRUE variable is present and GestDiab - N if the observation is blank. The second goal is to have the EstGestNum values between 0 and 36 to report preterm, and the values rom 37 and higher to report as full-term. Once this format applies, I want to run a contingency table (i.e. cross-tab) of RFDiabGest and EstGestNum.
Instead, what is happening is that the RFDiabGest row is displaying 'TRUE' (instead of the Y/N above) and the EstGestNum columns are individually displaying applicable values 25 through 42 (instead of just the two ranges above). Thoughts?
-Dennis II
One issue I'm already aware of is that the "$4" in the PROC FREQ will need to be adjusted, but I can adjust that once the formatting is correct (although I'm happy to take suggestions ahead of time as well). According to what I could find in the SAS documentation, the code appears correct, but clear it is not.
PROC FORMAT;
VALUE $RFDiabGest
'TRUE' = "GestDiab - Y"
' ' = "GestDiab - N";
VALUE EstGestNum
0 - < 37 = "Preterm"
37 < - HIGH = "Full-term";
RUN;
PROC FREQ
DATA = WORK.STUDY_COHORT_2;
FORMAT
RFDiabGest $4.
EstGestNum 2.;
TABLES RFDiabGest * EstGestNum / NOROW NOFREQ NOCOL;
RUN;
See below for the log. There were no errors, which tells me that the code is sound, but what I am asking SAS for is different than what my code is asking.
1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
70
71 PROC FORMAT;
72
72 ! VALUE $RFDiabGest
73 'TRUE'="GestDiab - Y"
74 ' '="GestDiab - N";
NOTE: Format $RFDIABGEST is already on the library WORK.FORMATS.
NOTE: Format $RFDIABGEST has been output.
75
75 ! VALUE EstGestNum
76 0 - < 37="Preterm"
77 37 < - HIGH="Full-term";
NOTE: Format ESTGESTNUM is already on the library WORK.FORMATS.
NOTE: Format ESTGESTNUM has been output.
78 RUN;
NOTE: PROCEDURE FORMAT used (Total process time):
real time 0.00 seconds
user cpu time 0.01 seconds
system cpu time 0.00 seconds
memory 355.71k
OS Memory 31908.00k
Timestamp 10/11/2020 11:05:03 PM
Step Count 467 Switch Count 0
Page Faults 0
Page Reclaims 23
Page Swaps 0
Voluntary Context Switches 0
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 32
79
80
81 PROC FREQ
82 DATA=WORK.STUDY_COHORT_2;
83 FORMAT
84 RFDiabGest $4.
85 EstGestNum 2.;
86 TABLES RFDiabGest*EstGestNum / NOROW NOFREQ NOCOL;
87 RUN;
NOTE: There were 300 observations read from the data set WORK.STUDY_COHORT_2.
NOTE: PROCEDURE FREQ used (Total process time):
real time 0.05 seconds
user cpu time 0.04 seconds
system cpu time 0.00 seconds
memory 2922.12k
OS Memory 33456.00k
Timestamp 10/11/2020 11:05:03 PM
Step Count 468 Switch Count 4
Page Faults 0
Page Reclaims 367
Page Swaps 0
Voluntary Context Switches 21
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 536
88
89 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
100
Your FORMAT is looking for values that are TRUE.
Your data doesn't contain TRUE, it contains True.
Thank you Paige; I added the Missing option. Your explanation makes sense - what is the best way to apply my formats here? I thought the two statement in the PROC FREQ was supposed to do that. Clearly it didn't, which leaves two problems. 1) the correct syntax is missing and 2) the syntax that's here might be doing something unintended (if anything at all).
-Dennis II
Your FORMAT statement apply standard formats $4. and 2.
So your custom formats are not applied. Use
format variablename customformatname.;
Thank you again for your help; it's really appreciated as I learn this program. I see what I was missing and it's corrected now. Most of the code ran great, but part of it is still not reading in correctly. Even though the "GestDiab - N" displays correctly, the value that should read "GestDiab - Y" is still displaying as TRUE. Any ideas here?
PROC FORMAT;
VALUE $RFDiabGest
'TRUE' = "GestDiab - Y"
' ' = "GestDiab - N";
VALUE EstGestNum
0 - 36 = "Preterm"
37 - HIGH = "Full-term";
RUN;
PROC FREQ
DATA = WORK.STUDY_COHORT_2;
FORMAT
RFDiabGest $RFDiabGest.
EstGestNum EstGestNum.;
TABLES RFDiabGest * EstGestNum / MISSING NOROW NOFREQ NOCOL;
RUN;
Here is the log below:
1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
70
71
72 PROC FORMAT;
73
73 ! VALUE $RFDiabGest
74 'TRUE'="GestDiab - Y"
75 ' '="GestDiab - N";
NOTE: Format $RFDIABGEST is already on the library WORK.FORMATS.
NOTE: Format $RFDIABGEST has been output.
76
76 ! VALUE EstGestNum
77 0 - 36="Preterm"
78 37 - HIGH="Full-term";
NOTE: Format ESTGESTNUM is already on the library WORK.FORMATS.
NOTE: Format ESTGESTNUM has been output.
79 RUN;
NOTE: PROCEDURE FORMAT used (Total process time):
real time 0.00 seconds
user cpu time 0.00 seconds
system cpu time 0.00 seconds
memory 355.81k
OS Memory 31908.00k
Timestamp 10/11/2020 11:52:57 PM
Step Count 681 Switch Count 0
Page Faults 0
Page Reclaims 20
Page Swaps 0
Voluntary Context Switches 0
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 0
80
81
82 PROC FREQ
83 DATA=WORK.STUDY_COHORT_2;
84 FORMAT
85 RFDiabGest $RFDiabGest.
86 EstGestNum EstGestNum.;
87 TABLES RFDiabGest*EstGestNum / MISSING NOROW NOFREQ NOCOL;
88 RUN;
NOTE: There were 300 observations read from the data set WORK.STUDY_COHORT_2.
NOTE: PROCEDURE FREQ used (Total process time):
real time 0.02 seconds
user cpu time 0.03 seconds
system cpu time 0.00 seconds
memory 2897.21k
OS Memory 33456.00k
Timestamp 10/11/2020 11:52:57 PM
Step Count 682 Switch Count 4
Page Faults 0
Page Reclaims 339
Page Swaps 0
Voluntary Context Switches 18
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 528
89
90
91 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
102
Show us the incorrect output.
Please see below. Forgive me if there's a way to post this other than as a photo...
-Dennis II
Your FORMAT is looking for values that are TRUE.
Your data doesn't contain TRUE, it contains True.
You're welcome to eyeroll at my capitalization oversight - no offense will be taken 🙂 Thank you - this works as it should.
-Dennis II
As a side issue, variables that are binary are better off coded as numeric 0 or 1, rather than with text. This eliminates all capitalization issues, it eliminates misspelling issues, and it is easier to type, plus you can then perform numeric operations such as finding the mean (which is the percent that is 1) and minimums and maximums. So, if you have any control over how the data is stored (you may not have this control), use 0 and 1 for binary variables.
Agreed, and unfortunately I don't have control over this dataset. Still, I'll keep this in mind for more amenable datasets.
-Dennis II
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.