I have a character field with length 3 in a data set. Some sample values are: 001, 234, 546,1, 25, 025.
I am using tabulate to identify valid and invalid values in that field. The problem is that this code is not differentiating between the value 001 and 1 or between 25 and 025.
proc format;
value $code (notsorted)
" " = "blank"
"000"-"999" = "3 digits";
run;
proc tabulate data=dataset1 missing;
format code $code.;
class code / preloadformat;
table code, n / printmiss;
run;
@tjsari wrote:
The variable "code" is character length 3
Length is a property, formats work on values.
Unfortunately when doing character comparisons '1' is greater than '000' because in the character by character comparison the 1 as the first character is greater than the 0 of 000.
You would have to list all of the values to apply the format two and not use a range: '000', '001', '002' ... '999'. Not impossible
data work.cntl; fmtname='Code'; type='C'; length label $ 10 start $ 3; start=''; label='blank';output; do i= 0 to 9; label='1 digit'; start= put(i,z1.); output; end; do i= 0 to 99; label='2 digits'; start= put(i,z2.); output; end; do i= 0 to 999; label='3 digits'; start= put(i,z3.); output; end; run; proc format library=work cntlin=work.cntl; run; data junk; length x $ 3; x=''; put x $code.; x='1'; put x $code.; x='01';put x $code.; x='101';put x $code.; x='001';put x $code.; run;
If you have ' 1' values, with an actual leading space then the above code will need something to create values with leading spaces.
Sounds like you have accidentally loaded the values into a numeric variable instead of a character variable.
Can you re-read the source data?
The variable "code" is character length 3
@tjsari wrote:
The variable "code" is character length 3
Length is a property, formats work on values.
Unfortunately when doing character comparisons '1' is greater than '000' because in the character by character comparison the 1 as the first character is greater than the 0 of 000.
You would have to list all of the values to apply the format two and not use a range: '000', '001', '002' ... '999'. Not impossible
data work.cntl; fmtname='Code'; type='C'; length label $ 10 start $ 3; start=''; label='blank';output; do i= 0 to 9; label='1 digit'; start= put(i,z1.); output; end; do i= 0 to 99; label='2 digits'; start= put(i,z2.); output; end; do i= 0 to 999; label='3 digits'; start= put(i,z3.); output; end; run; proc format library=work cntlin=work.cntl; run; data junk; length x $ 3; x=''; put x $code.; x='1'; put x $code.; x='01';put x $code.; x='101';put x $code.; x='001';put x $code.; run;
If you have ' 1' values, with an actual leading space then the above code will need something to create values with leading spaces.
Thank you!
Before you go too far down this path, you should be aware of a pitfall. As character strings, the range "000" - "999" includes many values that are not three digits. All of these values are included in that range:
5 4
3AD
68
1234 (perhaps not applicable, if your variable is defined as character length 3)
3.1
You may want to rethink the plan here.
Yes, that is the problem. Is there another way to identify the observations that have 3 digits and those who do not have 3 digits?
In a DATA step:
if length(var) = 3 and (100 <= input(var, ??3.) <= 999);
A WHERE statement becomes tricky since it does not permit the "??" to suppress messages about invalid data. It generates a warning when you remove the "??" but WHERE is legal in a PROC step if you can live with the warning.
This also works, thank you
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.