BookmarkSubscribeRSS Feed
_phintaC_
Calcite | Level 5

Hi,

I have searched far and wide and haven't been able to find an answer to the following: 

 

I am curious on whether SAS has the capability to report directly, what format or informat is best suited for a set of values within some categories. For instance:

 

Let's say we have a categorical variable TEST that contains a number of values, e.g. "TEST1" "TEST2" "TEST3". Furthermore, we have a character variable, RESULT, that contains a number of associated values, that can either be character or numeric (float or integer), e.g. 

 

TEST   | RESULT 

TEST1 | UNKNOWN

TEST2 | 10.411

TEST3 | 1

TEST1 | 10.1 

TEST2 | 10.334

TEST3 | 11 

 

For the TESTs I would like to find the best format matching the values in RESULT, is there a way to report that, so that e.g. 

best format for

TEST1 = character with length 7 

TEST2 = float with length 6.3

TEST3 = integer 

?

 

4 REPLIES 4
Reeza
Super User

Only thing that jumps to mind:

 

  1. Transpose so each Test was in it's own column
  2. Write the data to a text file
  3. Use PROC IMPORT to import the data and let it guess at the types. However, it depends on your rules. If you have a column with 99% of numbers and 1% of characters it will get read as characters (depending on how many rows it uses to guess)

I guess my question is why?

 

 

 


@_phintaC_ wrote:

Hi,

I have searched far and wide and haven't been able to find an answer to the following: 

 

I am curious on whether SAS has the capability to report directly, what format or informat is best suited for a set of values within some categories. For instance:

 

Let's say we have a categorical variable TEST that contains a number of values, e.g. "TEST1" "TEST2" "TEST3". Furthermore, we have a character variable, RESULT, that contains a number of associated values, that can either be character or numeric (float or integer), e.g. 

 

TEST   | RESULT 

TEST1 | UNKNOWN

TEST2 | 10.411

TEST3 | 1

TEST1 | 10.1 

TEST2 | 10.334

TEST3 | 11 

 

For the TESTs I would like to find the best format matching the values in RESULT, is there a way to report that, so that e.g. 

best format for

TEST1 = character with length 7 

TEST2 = float with length 6.3

TEST3 = integer 

?

 


 

ballardw
Super User

Character variables basically have one format $. So no such thing a "integer" "float" or similar.

 

The bit that you have some values that appear to want to display in a more numeric method means that perhaps you have the wrong variable type. You appear to want a numeric value for most of the values that has the option of displaying text "Unknown". That would be a custom format where instead of a value of "Unknown" you have a missing value, possibly a special missing (.U perhaps) and a BEST for the remaining.

 

A custom informat can read text into numeric values such as:

proc format;
invalue result (upcase)
'UNKNOWN'=.U
;

value result
.U='Unknown'
other=[best8.]
;
run;
data example;
  input result :result.;
datalines;
UNKNOWN
10.411
1
10.1 
10.334
11 
;

proc print data=example;
  format result result.;
run;

The special missing values .A to .Z and ._ can represent different reasons that data is missing and a format can provide a description as desired. I have included an example of an Informat that reads your values into numeric and a matching format. The Other=[best8.] means any value other than the .U will display in the best manner to fit within 8 characters.

When no value is explicitly provided in the Invalue statement it is assumed to be the default for the variable type, so treats the numeric values as if read with a 12. informat.

 


@_phintaC_ wrote:

Hi,

I have searched far and wide and haven't been able to find an answer to the following: 

 

I am curious on whether SAS has the capability to report directly, what format or informat is best suited for a set of values within some categories. For instance:

 

Let's say we have a categorical variable TEST that contains a number of values, e.g. "TEST1" "TEST2" "TEST3". Furthermore, we have a character variable, RESULT, that contains a number of associated values, that can either be character or numeric (float or integer), e.g. 

 

TEST   | RESULT 

TEST1 | UNKNOWN

TEST2 | 10.411

TEST3 | 1

TEST1 | 10.1 

TEST2 | 10.334

TEST3 | 11 

 

For the TESTs I would like to find the best format matching the values in RESULT, is there a way to report that, so that e.g. 

best format for

TEST1 = character with length 7 

TEST2 = float with length 6.3

TEST3 = integer 

?

 


 

 

Tom
Super User Tom
Super User

That is essentially the exercise I went through in creating this macro:

https://github.com/sasutils/macros/blob/master/csv2ds.sas

 

So using the idea of converting your tall file into a wide file.

data have;
  infile cards dlm='|';
  id = ceil(_n_/3);
  input name :$32. value :$200.;
cards;
TEST1 | UNKNOWN
TEST2 | 10.411
TEST3 | 1
TEST1 | 10.1 
TEST2 | 10.334
TEST3 | 11 
;

proc transpose data=have out=wide(drop=_name_);
  by id;
  id name;
  var value;
run;

filename csv temp;
data _null_ ;
  file csv dsd ;
  set have;
  where id=1;
  put name @;
run;

data _null_;
  set wide (drop=id);
  file csv dsd mod ;
  put (_all_) (+0);
run;

%csv2ds(csv,out=test,replace=1);

proc print data=_types_;
run;
                                                    m   m
                                        i           i   a                                          d                       a
                                        n           n   x                      n   n       i       a                   e   n
    v                              l    f   f       l   l                      o   u       n       t       y   m   d   8   y
    a               x              e    o   o   l   e   e                      n   m   c   t       e       y   m   d   6   d
    r     n         t        t     n    r   r   a   n   n                      m   e   o   e   d   t   t   m   d   m   0   t
O   n     a         y        y     g    m   m   b   g   g     m         m      i   r   m   g   a   i   i   m   d   m   1   d
b   u     m         p        p     t    a   a   e   t   t     i         a      s   i   m   e   t   m   m   d   y   y   d   t
s   m     e         e        e     h    t   t   l   h   h     n         x      s   c   a   r   e   e   e   d   y   y   z   m

1   1   TEST1   character   char   $7               4   7   10.1     UNKNOWN   2   1   1   0   0   0   0   0   0   0   0   0
2   2   TEST2   numeric     num    8                6   6   10.334   10.411    2   2   2   0   0   0   0   0   0   0   0   2
3   3   TEST3   integer     num    8                1   2   1        11        2   2   2   2   0   0   0   0   0   0   0   0

Code generated to read the CSV file.

data test;
  infile CSV dlm=',' dsd truncover firstobs=2 ;
  length TEST1 $7 TEST2 8 TEST3 8 ;
  input TEST1 -- TEST3 ;
run;
s_lassen
Meteorite | Level 14

Given data like this

data have;                        
  infile cards dlm='|';           
  input name :$32. value :$200.;  
cards;                            
TEST1 | UNKNOWN                   
TEST2 | 10.411                    
TEST3 | 1                         
TEST1 | 10.1                      
TEST2 | 10.334                    
TEST3 | 11                        
;run;

You can check the values you want like this:

 

First scan for length and decimals:

data test;                                   
  set have;                                  
  prxid=prxparse('/^\d+\.?(\d*)$/');         
  if prxmatch(prxid,trim(value)) then do;    
    length=length(trim(value));              
    decimals=lengthn(prxposn(prxid,1,value)); 
    end;                                     
  keep name length decimals;                 
run;

The PRX expression looks for a string consisting of one or more digits, a possible period, and then zero or more digits after that (and nothing else). I used the TRIM function to guard against leading blanks.

 

Now you just have to find the maximum values:

proc summary data=test nway;       
  class name;                      
  var length decimals;             
  output out=want (drop=_:) max=;  
run;

 Edit note: I changed the second LENGTH function to LENGTHN, so that zero decimals will not be reported as one.

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 778 views
  • 2 likes
  • 5 in conversation