Hello All,
I have the following proc format that is not running correctly. The variable age_first_pres_days is a character variable. I entered spaces before the lesser value as I was getting the ERROR: Start is greater than end: -. I'm not sure if the multiple spaces are screwing something up? The values are entered with 2 decimal places because this is how they appear in the dataset (I've tried with/without the decimal places, it makes no difference). As you can see from the proc freq, only 3 of the formats (Month 25-36, Month 37-48, Month 49-60) seem to work but yet not every time even for those.
proc format; *not working; value $age_first_pres_days " 0.00" -" 3.00" = "Day 0 - Day 3" " 4.00" -" 14.00" = "Day 4 - Day 14" " 15.00" - " 30.00" = "Day 15 - Day 30" " 31.00"- " 179.00" = "Month 2 - Month 6" " 180.00"- " 365.00" = "Month 7-12" " 366.00" -" 730.00" = "Month 13-24" " 731.00" - "1095.00" = "Month 25-36" "1096.00" - "1460.00" = "Month 37-48" "1461.00" - "1825.00" = "Month 49-60"; RUN; Proc print data=work.PDE_by_subjectid label; var age_first_pres_days; format age_first_pres_days $age_first_pres_days.; run;
The FREQ Procedure Age at presentation age_first_pres_days Frequency Percent 486 83.36 Month 25-36 36 6.17 Month 37-48 4 0.69 Month 49-60 4 0.69 2 6 1.03
20 1 0.17 21 1 0.17 211 1 0.17 233 1 0.17 25 2 0.34 264 1 0.17 3 7 1.20 35 1 0.17
By turning actual months (which is a number) into character string, you obtain the incorrect sorting that you see. Had you done this PROC FORMAT with a numeric variable indicating the number of months, everything should work fine. So first, create a new numeric variable from the character value, then assign this numeric (not character) format to the numeric variable.
proc format;
value age_first_pres_days
0-3 = "Day 0 - Day 3"
4-14= "Day 4 - Day 14"
15-30= "Day 15 - Day 30"
31-179 = "Month 2 - Month 6"
180-365 = "Month 7-12"
366-730= "Month 13-24"
731-1095= "Month 25-36"
1096-1460= "Month 37-48"
1461-1825= "Month 49-60";
RUN;
The concept you need to understand here is that numeric variables sort numerically, while character strings sort alphabetically. Since you want a numerically ordered result from PROC FREQ, you must use a numeric variable. In addition, since month is a number, there are only rare situations where there is a benefit to turning it into a character variable — not to be confused with assigning it a format so that it looks like a text string. When you assign a format to a numeric variable so that it looks like a text string, the underlying numeric value is still used for sorting.
Is age_first_pres_days numeric or character variable?
If it's character is it always with two decimal places?
Character sequences don't sort as expected (alphabetical default) instead of numeric so your format may require some more details
Ranges of character values do not compare the underlying numeric value. You have also defined ranges that start with blanks.
ANY character value that starts with 1 will come before any character value starting with 2, 3, 4, ... 9. Character comparisons compare 1st character to first, if that is equal then goes to second character to 2nd an so forth.
So a character value of '1234' will fall between strings of '123' and '245'. See this code:
data _null_; if '123' le '1234' le '234' then put "in range"; else put "out of range"; run;
If you really want to do such is strongly recommend making sure that your "age" variable that you want to apply a range to is numeric.
Also when you define a range with a leading space then the value must have the leading space.
Proc format; value $example ' 1' = '3 spaces and 1' ' 1' = '2 spaces and 1' ' 1' = '1 space and 1' '1' = '1 with no spaces' other = 'something else' ; data _null_; x = '1'; y = ' 1'; z = ' 1';/* 5 spaces*/ put x= $example. y=$example. z=$example.; run;
By turning actual months (which is a number) into character string, you obtain the incorrect sorting that you see. Had you done this PROC FORMAT with a numeric variable indicating the number of months, everything should work fine. So first, create a new numeric variable from the character value, then assign this numeric (not character) format to the numeric variable.
proc format;
value age_first_pres_days
0-3 = "Day 0 - Day 3"
4-14= "Day 4 - Day 14"
15-30= "Day 15 - Day 30"
31-179 = "Month 2 - Month 6"
180-365 = "Month 7-12"
366-730= "Month 13-24"
731-1095= "Month 25-36"
1096-1460= "Month 37-48"
1461-1825= "Month 49-60";
RUN;
The concept you need to understand here is that numeric variables sort numerically, while character strings sort alphabetically. Since you want a numerically ordered result from PROC FREQ, you must use a numeric variable. In addition, since month is a number, there are only rare situations where there is a benefit to turning it into a character variable — not to be confused with assigning it a format so that it looks like a text string. When you assign a format to a numeric variable so that it looks like a text string, the underlying numeric value is still used for sorting.
Hi Paige,
I tried your suggestion, renamed the variable from character to numeric and then ran the format but I'm still not getting the correct format. Also, any idea why the period isn't showing as "missing" as I've stated in the format?
Data want; Set import; age_first_pres_days = (RENAME= (age_first_pres_days=age_first_pres_days_Num)); age_first_pres_days= INPUT(age_first_pres_days_Num, 5.); Drop age_first_pres_days_Num; if _N_ < 2 then delete; Run; proc format; value age_first_pres_days . = "Missing" 0-3 = "Day 0 - Day 3" 4-14= "Day 4 - Day 14" 15-30= "Day 15 - Day 30" 31-179 = "Month 2 - Month 6" 180-365 = "Month 7-12" 366-730= "Month 13-24" 731-1095= "Month 25-36" 1096-1460= "Month 37-48" 1461-1825= "Month 49-60"; run; RESULTS: The FREQ Procedure Age at presentation age_first_pres_days Frequency Percent . 19 16.38 0 30 25.86 1 4 3.45 10 2 1.72 12 2 1.72
We cannot tell from what you posted what code you used to produce the frequency table.
Are you sure you used the format with the PROC FREQ step?
Great catch. I used the character format in the proc freq. I changed it to numeric but now I'm getting an error that i'm using a numeric format with character variable. So now it appears the code to rename isn't working. I have not changed the renaming code listed above.
Corrected proc freq: PROC Freq DATA =genetics.table1demo; Tables age_first_pres_days/ missing nocum; format age_first_pres_days age_first_pres_days.; RUN; Error: 71 format age_first_pres_days age_first_pres_days.; ERROR: You are trying to use the numeric format AGE_FIRS with the character variable age_first_pres_days in data set GENETICS.TABLE1DEMO.
from character to numeric isn't working.
I tried your suggestion, renamed the variable from character to numeric
You can't rename a character variable and make it numeric (and so that was not my suggestion). I specifically said "create a new numeric variable from the character value". You have to create a new variable that is numeric. Something like this:
data new_var;
set import;
age_first_pres_days_num = input(age_first_pres_days,4.);
format age_first_pres_days_num age_first_pres_days.;
run;
which assumes that the format named age_first_pres_days is a numeric format (no dollar sign before the format name) and not a character format, so you will have to re-write your character format code in PROC FORMAT and make it produce a numeric format.
The new variable is created. Both variables show in proc contents. I have tried the following with no success: Dropping both the variable names (separately), formatting for both variables in proc format and using both variable names in the proc freqs (in tables and format statements). I make sure everything matches as far as variable names in all the steps (Data, format, freqs). No matter what I put in the various fields I get the error the variable isn't found.
Data Genetics.Table1demo; Set import; age_first_pres_days_num= INPUT(age_first_pres_days, 4.); if _N_ < 2 then delete; Run; Proc Contents showing as much: Alphabetic List of Variables and Attributes # Variable Type Len Format Informat Label 5 Sex Char 15 $15. $15. Sex 9 age_first_b6_days Char 4 $4. $4. age_first_b6_days 13 age_first_bioch_months Char 42 $42. $42. age_first_bioch_months 8 age_first_pres_days Char 4 $4. $4. age_first_pres_days 64 age_first_pres_days_num Num 8 Proc format: proc format; value age_first_pres_days . = "Missing" 0-3 = "Day 0 - Day 3" 4-14= "Day 4 - Day 14" 15-30= "Day 15 - Day 30" 31-179 = "Month 2 - Month 6" 180-365 = "Month 7-12" 366-730= "Month 13-24" 731-1095= "Month 25-36" 1096-1460= "Month 37-48" 1461-1825= "Month 49-60"; run; PROC Freq DATA =genetics.table1demo; Tables age_first_pres_day_num/ missing nocum; format age_first_pres_days age_first_pres_days.; RUN; *** 70 Tables age_first_pres_day_num/ missing nocum; ERROR: Variable AGE_FIRST_PRES_DAY_NUM not found.
or
Tables age_first_pres_day/ missing nocum; ERROR: Variable AGE_FIRST_PRES_DAY_NUM not found.
Your PROC CONTENTS output does not show the data set name. Please provide the equivalent PROC CONTENTS output to that shown here: https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/proc/n1a5k5u51pvnlhn17j9v82nciokh.htm so I can see what the data set name is, and the list of variables and their attributes. (A screen capture is fine, or text that is copied and pasted into your reply)
Please show us the entire log for this piece of code. Do not show us selected parts of the log, we need to see every single line, every single character.
proc contents data=Genetics.Table1demo; run; 1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK; 68 69 PROC Freq DATA =genetics.table1demo; 70 Tables age_first_pres_day/ missing nocum; ERROR: Variable AGE_FIRST_PRES_DAY not found. 71 format age_first_pres_days age_first_pres_days.; 72 RUN; NOTE: The SAS System stopped processing this step because of errors. NOTE: PROCEDURE FREQ used (Total process time): real time 0.00 seconds user cpu time 0.00 seconds system cpu time 0.00 seconds memory 1356.56k OS Memory 29488.00k Timestamp 08/02/2022 08:12:15 PM Step Count 1284 Switch Count 0 Page Faults 0 Page Reclaims 212 Page Swaps 0 Voluntary Context Switches 5 Involuntary Context Switches 0 Block Input Operations 0 Block Output Operations 0 73 74 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK; 84
The only way to get this message:
70 Tables age_first_pres_day_num/ missing nocum; ERROR: Variable AGE_FIRST_PRES_DAY_NUM not found.
Is if you ran the PROC FREQ on a different dataset than the PROC CONTENTS.
Also you are applying the format to the wrong variable.
PROC Freq DATA =genetics.table1demo;
Tables age_first_pres_day_num/ missing nocum;
format age_first_pres_days_num age_first_pres_days.;
RUN;
Also do not insert random * characters into your code.
RUN; ***
The * character starts a COMMENT STATEMENT so everything after it until the next semicolon is a comment.
*convert Age to a numeric variable and delete first row;
Data Genetics.Table1demo;
Set import;
age_first_pres_days_num= INPUT(age_first_pres_days, 4.);
if _N_ < 2 then delete;
Run;
*check conversion;
proc contents data=genetics.table1demo;
run;
*create format for age category;
proc format;
value age_first_pres_days
. = "Missing"
0-3 = "Day 0 - Day 3"
4-14= "Day 4 - Day 14"
15-30= "Day 15 - Day 30"
31-179 = "Month 2 - Month 6"
180-365 = "Month 7-12"
366-730= "Month 13-24"
731-1095= "Month 25-36"
1096-1460= "Month 37-48"
1461-1825= "Month 49-60";
run;
*summary table;
PROC Freq DATA =genetics.table1demo;
Tables age_first_pres_days_num/ missing nocum;
format age_first_pres_days_num age_first_pres_days.;
RUN;
Please run the exact code above, assuming your import table exists, and post the FULL LOG.
Well now I'm completely confused. I copy and pasted your code underneath my code. They are EXACTLY the same; I verified letter by letter. For some reason, your code works and mine doesn't (error log below).
My code: Data Genetics.Table1demo; Set import; age_first_pres_days_num= INPUT(age_first_pres_days, 4.); if _N_ < 2 then delete; Run;
proc contents data=Genetics.Table1demo; run;
PROC Freq DATA =genetics.table1demo; Tables age_first_pres_day_num/ missing nocum; format age_first_pres_days_num age_first_pres_days.; RUN;
My log report: 1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK; 68 69 PROC Freq DATA =genetics.table1demo; 70 Tables age_first_pres_day_num/ missing nocum; ERROR: Variable AGE_FIRST_PRES_DAY_NUM not found. 71 format age_first_pres_days_num age_first_pres_days.; 72 RUN; NOTE: The SAS System stopped processing this step because of errors. NOTE: PROCEDURE FREQ used (Total process time): real time 0.00 seconds user cpu time 0.01 seconds system cpu time 0.00 seconds memory 1468.68k OS Memory 29488.00k Timestamp 08/02/2022 08:20:48 PM Step Count 1308 Switch Count 0 Page Faults 0 Page Reclaims 212 Page Swaps 0 Voluntary Context Switches 2 Involuntary Context Switches 0 Block Input Operations 0 Block Output Operations 0 73 74 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK; 84 Your code: Data Genetics.Table1demo; Set import; age_first_pres_days_num= INPUT(age_first_pres_days, 4.); if _N_ < 2 then delete; Run; proc contents data=genetics.table1demo; run; proc format; value age_first_pres_days . = "Missing" 0-3 = "Day 0 - Day 3" 4-14= "Day 4 - Day 14" 15-30= "Day 15 - Day 30" 31-179 = "Month 2 - Month 6" 180-365 = "Month 7-12" 366-730= "Month 13-24" 731-1095= "Month 25-36" 1096-1460= "Month 37-48" 1461-1825= "Month 49-60"; run; PROC Freq DATA =genetics.table1demo; Tables age_first_pres_days_num/ missing nocum; format age_first_pres_days_num age_first_pres_days.; RUN;
e works. What would explain this?
You aren't pasting your full code and log so it's hard to say. One guess - you aren't running things in order step by step.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.