DATA Step, Macro, Functions and more

How to define a format for a character value differentiating by its length?

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 6
Accepted Solution

How to define a format for a character value differentiating by its length?

I have a character field with length 3 in a data set. Some sample values are: 001, 234, 546,1, 25, 025.

I am using tabulate to identify valid and invalid values in that field. The problem is that this code is not differentiating between the value 001 and 1 or between 25 and 025.

 

proc format; 

   value $code (notsorted)

           " " = "blank"

           "000"-"999" = "3 digits";

run;

 

proc tabulate data=dataset1 missing;

   format code $code.;

   class code / preloadformat;

   table code, n / printmiss;

run; 

 


Accepted Solutions
Solution
‎02-07-2018 03:46 PM
Super User
Posts: 13,000

Re: How to define a format for a character value differentiating by its length?


tjsari wrote:

The variable "code" is character length 3


Length is a property, formats work on values.

 

Unfortunately when doing character comparisons '1' is greater than '000' because in the character by character comparison the 1 as the first character is greater than the 0 of 000.

 

You would have to list all of the values to apply the format two and not use a range: '000', '001', '002' ... '999'. Not impossible

 

data work.cntl;
   fmtname='Code';
   type='C';
   length label $ 10 start $ 3;
   start=''; label='blank';output;
   do i= 0 to 9;
      label='1 digit';
      start= put(i,z1.);
      output;
   end;
   do i= 0 to 99;
      label='2 digits';
      start= put(i,z2.);
      output;
   end;
   do i= 0 to 999;
      label='3 digits';
      start= put(i,z3.);
      output;
   end;
run;

proc format library=work cntlin=work.cntl;
run;

data junk;
   length x $ 3;
   x=''; put x $code.;
   x='1'; put x $code.;
   x='01';put x $code.;
   x='101';put x $code.;
   x='001';put x $code.;
run;

If you have ' 1' values, with an actual leading space then the above code will need something to create values with leading spaces.

 

View solution in original post


All Replies
Super User
Super User
Posts: 7,844

Re: How to define a format for a character value differentiating by its length?

Sounds like you have accidentally loaded the values into a numeric variable instead of a character variable.

Can you re-read the source data?

Occasional Contributor
Posts: 6

Re: How to define a format for a character value differentiating by its length?

The variable "code" is character length 3

Solution
‎02-07-2018 03:46 PM
Super User
Posts: 13,000

Re: How to define a format for a character value differentiating by its length?


tjsari wrote:

The variable "code" is character length 3


Length is a property, formats work on values.

 

Unfortunately when doing character comparisons '1' is greater than '000' because in the character by character comparison the 1 as the first character is greater than the 0 of 000.

 

You would have to list all of the values to apply the format two and not use a range: '000', '001', '002' ... '999'. Not impossible

 

data work.cntl;
   fmtname='Code';
   type='C';
   length label $ 10 start $ 3;
   start=''; label='blank';output;
   do i= 0 to 9;
      label='1 digit';
      start= put(i,z1.);
      output;
   end;
   do i= 0 to 99;
      label='2 digits';
      start= put(i,z2.);
      output;
   end;
   do i= 0 to 999;
      label='3 digits';
      start= put(i,z3.);
      output;
   end;
run;

proc format library=work cntlin=work.cntl;
run;

data junk;
   length x $ 3;
   x=''; put x $code.;
   x='1'; put x $code.;
   x='01';put x $code.;
   x='101';put x $code.;
   x='001';put x $code.;
run;

If you have ' 1' values, with an actual leading space then the above code will need something to create values with leading spaces.

 

Occasional Contributor
Posts: 6

Re: How to define a format for a character value differentiating by its length?

Thank you!

Super User
Posts: 6,527

Re: How to define a format for a character value differentiating by its length?

[ Edited ]

Before you go too far down this path, you should be aware of a pitfall.  As character strings, the range "000" - "999" includes many values that are not three digits.  All of these values are included in that range:

 

5 4

3AD

68

1234 (perhaps not applicable, if your variable is defined as character length 3)

3.1

 

You may want to rethink the plan here.

 

Occasional Contributor
Posts: 6

Re: How to define a format for a character value differentiating by its length?

Posted in reply to Astounding

Yes, that is the problem. Is there another way to identify the observations that have 3 digits and those who do not have 3 digits?

Super User
Posts: 6,527

Re: How to define a format for a character value differentiating by its length?

In a DATA step:

 

if length(var) = 3 and (100 <= input(var, ??3.) <= 999);

 

A WHERE statement becomes tricky since it does not permit the "??" to suppress messages about invalid data.  It generates a warning when you remove the "??" but WHERE is legal in a PROC step if you can live with the warning.

Occasional Contributor
Posts: 6

Re: How to define a format for a character value differentiating by its length?

Posted in reply to Astounding

This also works, thank you

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 149 views
  • 2 likes
  • 4 in conversation