BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
kristiepauly
Obsidian | Level 7

Hello All, 

I have the following proc format that is not running correctly. The variable age_first_pres_days is a character variable. I entered spaces before the lesser value as I was getting the ERROR: Start is greater than end: -. I'm not sure if the multiple spaces are screwing something up? The values are entered with 2 decimal places because this is how they appear in the dataset (I've tried with/without the decimal places, it makes no difference). As you can see from the proc freq, only 3 of the formats (Month 25-36, Month 37-48, Month 49-60) seem to work but yet not every time even for those. 

proc format;  *not working;
	value $age_first_pres_days  
							"   0.00" -"   3.00" = "Day 0 - Day 3"
							"   4.00" -"  14.00" = "Day 4 - Day 14"
							"  15.00" - "  30.00" = "Day 15 - Day 30"
							"  31.00"- " 179.00" = "Month 2 - Month 6"
							" 180.00"- " 365.00" = "Month 7-12"
							" 366.00" -" 730.00" = "Month 13-24"
							" 731.00" - "1095.00" = "Month 25-36"
							"1096.00" - "1460.00" = "Month 37-48"
							"1461.00" - "1825.00" = "Month 49-60";
RUN;
Proc print data=work.PDE_by_subjectid label;
	var age_first_pres_days;
	format age_first_pres_days $age_first_pres_days.;
	run;

 

The FREQ Procedure

Age at presentation
age_first_pres_days	Frequency	Percent
 	                   486	       83.36
Month 25-36	           36	        6.17
Month 37-48	            4	       0.69
Month 49-60	            4	       0.69
2	                    6	       1.03
20 1 0.17 21 1 0.17 211 1 0.17 233 1 0.17 25 2 0.34 264 1 0.17 3 7 1.20 35 1 0.17

 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26

By turning actual months (which is a number) into character string, you obtain the incorrect sorting that you see. Had you done this PROC FORMAT with a numeric variable indicating the number of months, everything should work fine. So first, create a new numeric variable from the character value, then assign this numeric (not character) format to the numeric variable.

 

proc format; 
	value age_first_pres_days  
							0-3 = "Day 0 - Day 3"
							4-14= "Day 4 - Day 14"
							15-30= "Day 15 - Day 30"
							31-179 = "Month 2 - Month 6"
							180-365 = "Month 7-12"
							366-730= "Month 13-24"
							731-1095= "Month 25-36"
							1096-1460= "Month 37-48"
							1461-1825= "Month 49-60";
RUN;

 

The concept you need to understand here is that numeric variables sort numerically, while character strings sort alphabetically. Since you want a numerically ordered result from PROC FREQ, you must use a numeric variable. In addition, since month is a number, there are only rare situations where there is a benefit to turning it into a character variable — not to be confused with assigning it a format so that it looks like a text string. When you assign a format to a numeric variable so that it looks like a text string, the underlying numeric value is still used for sorting.

--
Paige Miller

View solution in original post

25 REPLIES 25
Reeza
Super User

Is age_first_pres_days numeric or character variable?

If it's character is it always with two decimal places?

Character sequences don't sort as expected (alphabetical default) instead of numeric so your format may require some more details 

 

ballardw
Super User

Ranges of character values do not compare the underlying numeric value. You have also defined ranges that start with blanks.

ANY character value that starts with 1 will come before any character value starting with 2, 3, 4, ...  9. Character comparisons compare 1st character to first, if that is equal then goes to second character to 2nd an so forth.

So a character value of '1234' will fall between strings of '123' and '245'. See this code:

data _null_;
   if '123' le '1234' le '234' then put "in range";
   else put "out of range";
run;
    

If you really want to do such is strongly recommend making sure that your "age" variable that you want to apply a range to is numeric.

 

Also when you define a range with a leading space then the value must have the leading space.

Proc format;
value $example
'   1' = '3 spaces and 1'
'  1'  = '2 spaces and 1'
' 1'   = '1 space  and 1'
'1'    = '1 with no spaces'
other  = 'something else'
;

data _null_;
   x = '1';
   y = '  1';
   z = '     1';/* 5 spaces*/
   put x= $example. y=$example. z=$example.;
run;
    

 

 

PaigeMiller
Diamond | Level 26

By turning actual months (which is a number) into character string, you obtain the incorrect sorting that you see. Had you done this PROC FORMAT with a numeric variable indicating the number of months, everything should work fine. So first, create a new numeric variable from the character value, then assign this numeric (not character) format to the numeric variable.

 

proc format; 
	value age_first_pres_days  
							0-3 = "Day 0 - Day 3"
							4-14= "Day 4 - Day 14"
							15-30= "Day 15 - Day 30"
							31-179 = "Month 2 - Month 6"
							180-365 = "Month 7-12"
							366-730= "Month 13-24"
							731-1095= "Month 25-36"
							1096-1460= "Month 37-48"
							1461-1825= "Month 49-60";
RUN;

 

The concept you need to understand here is that numeric variables sort numerically, while character strings sort alphabetically. Since you want a numerically ordered result from PROC FREQ, you must use a numeric variable. In addition, since month is a number, there are only rare situations where there is a benefit to turning it into a character variable — not to be confused with assigning it a format so that it looks like a text string. When you assign a format to a numeric variable so that it looks like a text string, the underlying numeric value is still used for sorting.

--
Paige Miller
kristiepauly
Obsidian | Level 7

Hi Paige, 

I tried your suggestion, renamed the variable from character to numeric and then ran the format but I'm still not getting the correct format.  Also, any idea why the period isn't showing as "missing" as I've stated in the format? 

Data want;
	Set import; 
	age_first_pres_days = (RENAME= (age_first_pres_days=age_first_pres_days_Num));
	age_first_pres_days= INPUT(age_first_pres_days_Num, 5.);
	Drop age_first_pres_days_Num;
	if _N_ < 2 then delete;
Run;
proc format;
	value age_first_pres_days
							.	= "Missing"
							0-3 = "Day 0 - Day 3"
							4-14= "Day 4 - Day 14"
							15-30= "Day 15 - Day 30"
							31-179 = "Month 2 - Month 6"
							180-365 = "Month 7-12"
							366-730= "Month 13-24"
							731-1095= "Month 25-36"
							1096-1460= "Month 37-48"
							1461-1825= "Month 49-60";
run;

RESULTS:
The FREQ Procedure

Age at presentation
age_first_pres_days	Frequency	Percent
           .	           19	         16.38
           0	           30	         25.86
           1	            4	          3.45
           10	            2	          1.72
           12	            2	          1.72
Tom
Super User Tom
Super User

We cannot tell from what you posted what code you used to produce the frequency table. 
Are you sure you used the format with the PROC FREQ step?

kristiepauly
Obsidian | Level 7

Great catch.  I used the character format in the proc freq.  I changed it to numeric but now I'm getting an error that i'm using a numeric format with character variable.  So now it appears the code to rename isn't working. I have not changed the renaming code listed above. 

Corrected proc freq:
PROC Freq DATA =genetics.table1demo;
	Tables age_first_pres_days/ missing nocum;
	format age_first_pres_days age_first_pres_days.;
RUN;

Error: 
71         format age_first_pres_days age_first_pres_days.;
 ERROR: You are trying to use the numeric format AGE_FIRS with the character variable age_first_pres_days in data set 
        GENETICS.TABLE1DEMO.

from character to numeric isn't working.

PaigeMiller
Diamond | Level 26

I tried your suggestion, renamed the variable from character to numeric

 

You can't rename a character variable and make it numeric (and so that was not my suggestion). I specifically said "create a new numeric variable from the character value". You have to create a new variable that is numeric. Something like this:

 

data new_var;
     set import;
     age_first_pres_days_num = input(age_first_pres_days,4.);
     format age_first_pres_days_num age_first_pres_days.;
run;

 

which assumes that the format named age_first_pres_days is a numeric format (no dollar sign before the format name) and not a character format, so you will have to re-write your character format code in PROC FORMAT and make it produce a numeric format.

--
Paige Miller
kristiepauly
Obsidian | Level 7

The new variable is created. Both variables show in proc contents.  I have tried the following with no success: Dropping both the variable names (separately), formatting for both variables in proc format and using both variable names in the proc freqs (in tables and format statements). I make sure everything matches as far as variable names in all the steps (Data, format, freqs). No matter what I put in the various fields I get the error the variable isn't found.

Data Genetics.Table1demo;
	Set import; 
	age_first_pres_days_num= INPUT(age_first_pres_days, 4.);
	if _N_ < 2 then delete;
Run;

Proc Contents showing as much: 
Alphabetic List of Variables and Attributes
#	Variable	Type	Len	Format	Informat	Label
5	Sex	Char	15	$15.	$15.	Sex
9	age_first_b6_days	Char	4	$4.	$4.	age_first_b6_days
13	age_first_bioch_months	Char	42	$42.	$42.	age_first_bioch_months
8	age_first_pres_days	Char	4	$4.	$4.	age_first_pres_days
64	age_first_pres_days_num  Num	8	 

Proc format:
proc format;
	value age_first_pres_days
							.	= "Missing"
							0-3 = "Day 0 - Day 3"
							4-14= "Day 4 - Day 14"
							15-30= "Day 15 - Day 30"
							31-179 = "Month 2 - Month 6"
							180-365 = "Month 7-12"
							366-730= "Month 13-24"
							731-1095= "Month 25-36"
							1096-1460= "Month 37-48"
							1461-1825= "Month 49-60";
run;	 
PROC Freq DATA =genetics.table1demo;
	Tables age_first_pres_day_num/ missing nocum;
	format age_first_pres_days age_first_pres_days.;
RUN; ***

 70         Tables age_first_pres_day_num/ missing nocum;
 ERROR: Variable AGE_FIRST_PRES_DAY_NUM not found.	

or
Tables age_first_pres_day/ missing nocum; ERROR: Variable AGE_FIRST_PRES_DAY_NUM not found.
PaigeMiller
Diamond | Level 26

Your PROC CONTENTS output does not show the data set name. Please provide the equivalent PROC CONTENTS output to that shown here: https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/proc/n1a5k5u51pvnlhn17j9v82nciokh.htm so I can see what the data set name is, and the list of variables and their attributes. (A screen capture is fine, or text that is copied and pasted into your reply)

 

Please show us the entire log for this piece of code. Do not show us selected parts of the log, we need to see every single line, every single character.

--
Paige Miller
kristiepauly
Obsidian | Level 7
proc contents data=Genetics.Table1demo;
run;


 1          OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 68         
 69         PROC Freq DATA =genetics.table1demo;
 70         Tables age_first_pres_day/ missing nocum;
 ERROR: Variable AGE_FIRST_PRES_DAY not found.
 71         format age_first_pres_days age_first_pres_days.;
 72         RUN;
 
 NOTE: The SAS System stopped processing this step because of errors.
 NOTE: PROCEDURE FREQ used (Total process time):
       real time           0.00 seconds
       user cpu time       0.00 seconds
       system cpu time     0.00 seconds
       memory              1356.56k
       OS Memory           29488.00k
       Timestamp           08/02/2022 08:12:15 PM
       Step Count                        1284  Switch Count  0
       Page Faults                       0
       Page Reclaims                     212
       Page Swaps                        0
       Voluntary Context Switches        5
       Involuntary Context Switches      0
       Block Input Operations            0
       Block Output Operations           0
       
 73         
 74         OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 84         

kristiepauly_0-1659470852570.png

 

 
Tom
Super User Tom
Super User

The only way to get this message:

 70         Tables age_first_pres_day_num/ missing nocum;
 ERROR: Variable AGE_FIRST_PRES_DAY_NUM not found.	

Is if you ran the PROC FREQ on a different dataset than the PROC CONTENTS.

 

Also you are applying the format to the wrong variable.

PROC Freq DATA =genetics.table1demo;
	Tables age_first_pres_day_num/ missing nocum;
	format age_first_pres_days_num age_first_pres_days.;
RUN; 

Also do not insert random * characters into your code. 

RUN; ***

The * character starts a COMMENT STATEMENT so everything after it until the next semicolon is a comment.

 

Reeza
Super User
*convert Age to a numeric variable and delete first row;
Data Genetics.Table1demo;
	Set import; 
	age_first_pres_days_num= INPUT(age_first_pres_days, 4.);
	if _N_ < 2 then delete;
Run;

*check conversion;
proc contents data=genetics.table1demo;
run;

*create format for age category;
proc format;
	value age_first_pres_days
							.	= "Missing"
							0-3 = "Day 0 - Day 3"
							4-14= "Day 4 - Day 14"
							15-30= "Day 15 - Day 30"
							31-179 = "Month 2 - Month 6"
							180-365 = "Month 7-12"
							366-730= "Month 13-24"
							731-1095= "Month 25-36"
							1096-1460= "Month 37-48"
							1461-1825= "Month 49-60";
run;	 

*summary table;
PROC Freq DATA =genetics.table1demo;
	Tables age_first_pres_days_num/ missing nocum;
	format age_first_pres_days_num age_first_pres_days.;
RUN; 

Please run the exact code above, assuming your import table exists, and post the FULL LOG.

kristiepauly
Obsidian | Level 7

Well now I'm completely confused. I copy and pasted your code underneath my code. They are EXACTLY the same; I verified letter by letter. For some reason, your code works and mine doesn't (error log below).

My code: 
Data Genetics.Table1demo;
	Set import; 
	age_first_pres_days_num= INPUT(age_first_pres_days, 4.);
	if _N_ < 2 then delete;
Run;
proc contents data=Genetics.Table1demo; run;
PROC Freq DATA =genetics.table1demo; Tables age_first_pres_day_num/ missing nocum; format age_first_pres_days_num age_first_pres_days.; RUN;
My log report: 1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK; 68 69 PROC Freq DATA =genetics.table1demo; 70 Tables age_first_pres_day_num/ missing nocum; ERROR: Variable AGE_FIRST_PRES_DAY_NUM not found. 71 format age_first_pres_days_num age_first_pres_days.; 72 RUN; NOTE: The SAS System stopped processing this step because of errors. NOTE: PROCEDURE FREQ used (Total process time): real time 0.00 seconds user cpu time 0.01 seconds system cpu time 0.00 seconds memory 1468.68k OS Memory 29488.00k Timestamp 08/02/2022 08:20:48 PM Step Count 1308 Switch Count 0 Page Faults 0 Page Reclaims 212 Page Swaps 0 Voluntary Context Switches 2 Involuntary Context Switches 0 Block Input Operations 0 Block Output Operations 0 73 74 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK; 84 Your code: Data Genetics.Table1demo; Set import; age_first_pres_days_num= INPUT(age_first_pres_days, 4.); if _N_ < 2 then delete; Run; proc contents data=genetics.table1demo; run; proc format; value age_first_pres_days . = "Missing" 0-3 = "Day 0 - Day 3" 4-14= "Day 4 - Day 14" 15-30= "Day 15 - Day 30" 31-179 = "Month 2 - Month 6" 180-365 = "Month 7-12" 366-730= "Month 13-24" 731-1095= "Month 25-36" 1096-1460= "Month 37-48" 1461-1825= "Month 49-60"; run; PROC Freq DATA =genetics.table1demo; Tables age_first_pres_days_num/ missing nocum; format age_first_pres_days_num age_first_pres_days.; RUN;

e works.  What would explain this? 

Reeza
Super User

You aren't pasting your full code and log so it's hard to say. One guess - you aren't running things in order step by step. 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 25 replies
  • 4753 views
  • 7 likes
  • 5 in conversation