BookmarkSubscribeRSS Feed
_phintaC_
Calcite | Level 5

Hi,

I have searched far and wide and haven't been able to find an answer to the following: 

 

I am curious on whether SAS has the capability to report directly, what format or informat is best suited for a set of values within some categories. For instance:

 

Let's say we have a categorical variable TEST that contains a number of values, e.g. "TEST1" "TEST2" "TEST3". Furthermore, we have a character variable, RESULT, that contains a number of associated values, that can either be character or numeric (float or integer), e.g. 

 

TEST   | RESULT 

TEST1 | UNKNOWN

TEST2 | 10.411

TEST3 | 1

TEST1 | 10.1 

TEST2 | 10.334

TEST3 | 11 

 

For the TESTs I would like to find the best format matching the values in RESULT, is there a way to report that, so that e.g. 

best format for

TEST1 = character with length 7 

TEST2 = float with length 6.3

TEST3 = integer 

?

 

4 REPLIES 4
Reeza
Super User

Only thing that jumps to mind:

 

  1. Transpose so each Test was in it's own column
  2. Write the data to a text file
  3. Use PROC IMPORT to import the data and let it guess at the types. However, it depends on your rules. If you have a column with 99% of numbers and 1% of characters it will get read as characters (depending on how many rows it uses to guess)

I guess my question is why?

 

 

 


@_phintaC_ wrote:

Hi,

I have searched far and wide and haven't been able to find an answer to the following: 

 

I am curious on whether SAS has the capability to report directly, what format or informat is best suited for a set of values within some categories. For instance:

 

Let's say we have a categorical variable TEST that contains a number of values, e.g. "TEST1" "TEST2" "TEST3". Furthermore, we have a character variable, RESULT, that contains a number of associated values, that can either be character or numeric (float or integer), e.g. 

 

TEST   | RESULT 

TEST1 | UNKNOWN

TEST2 | 10.411

TEST3 | 1

TEST1 | 10.1 

TEST2 | 10.334

TEST3 | 11 

 

For the TESTs I would like to find the best format matching the values in RESULT, is there a way to report that, so that e.g. 

best format for

TEST1 = character with length 7 

TEST2 = float with length 6.3

TEST3 = integer 

?

 


 

ballardw
Super User

Character variables basically have one format $. So no such thing a "integer" "float" or similar.

 

The bit that you have some values that appear to want to display in a more numeric method means that perhaps you have the wrong variable type. You appear to want a numeric value for most of the values that has the option of displaying text "Unknown". That would be a custom format where instead of a value of "Unknown" you have a missing value, possibly a special missing (.U perhaps) and a BEST for the remaining.

 

A custom informat can read text into numeric values such as:

proc format;
invalue result (upcase)
'UNKNOWN'=.U
;

value result
.U='Unknown'
other=[best8.]
;
run;
data example;
  input result :result.;
datalines;
UNKNOWN
10.411
1
10.1 
10.334
11 
;

proc print data=example;
  format result result.;
run;

The special missing values .A to .Z and ._ can represent different reasons that data is missing and a format can provide a description as desired. I have included an example of an Informat that reads your values into numeric and a matching format. The Other=[best8.] means any value other than the .U will display in the best manner to fit within 8 characters.

When no value is explicitly provided in the Invalue statement it is assumed to be the default for the variable type, so treats the numeric values as if read with a 12. informat.

 


@_phintaC_ wrote:

Hi,

I have searched far and wide and haven't been able to find an answer to the following: 

 

I am curious on whether SAS has the capability to report directly, what format or informat is best suited for a set of values within some categories. For instance:

 

Let's say we have a categorical variable TEST that contains a number of values, e.g. "TEST1" "TEST2" "TEST3". Furthermore, we have a character variable, RESULT, that contains a number of associated values, that can either be character or numeric (float or integer), e.g. 

 

TEST   | RESULT 

TEST1 | UNKNOWN

TEST2 | 10.411

TEST3 | 1

TEST1 | 10.1 

TEST2 | 10.334

TEST3 | 11 

 

For the TESTs I would like to find the best format matching the values in RESULT, is there a way to report that, so that e.g. 

best format for

TEST1 = character with length 7 

TEST2 = float with length 6.3

TEST3 = integer 

?

 


 

 

Tom
Super User Tom
Super User

That is essentially the exercise I went through in creating this macro:

https://github.com/sasutils/macros/blob/master/csv2ds.sas

 

So using the idea of converting your tall file into a wide file.

data have;
  infile cards dlm='|';
  id = ceil(_n_/3);
  input name :$32. value :$200.;
cards;
TEST1 | UNKNOWN
TEST2 | 10.411
TEST3 | 1
TEST1 | 10.1 
TEST2 | 10.334
TEST3 | 11 
;

proc transpose data=have out=wide(drop=_name_);
  by id;
  id name;
  var value;
run;

filename csv temp;
data _null_ ;
  file csv dsd ;
  set have;
  where id=1;
  put name @;
run;

data _null_;
  set wide (drop=id);
  file csv dsd mod ;
  put (_all_) (+0);
run;

%csv2ds(csv,out=test,replace=1);

proc print data=_types_;
run;
                                                    m   m
                                        i           i   a                                          d                       a
                                        n           n   x                      n   n       i       a                   e   n
    v                              l    f   f       l   l                      o   u       n       t       y   m   d   8   y
    a               x              e    o   o   l   e   e                      n   m   c   t       e       y   m   d   6   d
    r     n         t        t     n    r   r   a   n   n                      m   e   o   e   d   t   t   m   d   m   0   t
O   n     a         y        y     g    m   m   b   g   g     m         m      i   r   m   g   a   i   i   m   d   m   1   d
b   u     m         p        p     t    a   a   e   t   t     i         a      s   i   m   e   t   m   m   d   y   y   d   t
s   m     e         e        e     h    t   t   l   h   h     n         x      s   c   a   r   e   e   e   d   y   y   z   m

1   1   TEST1   character   char   $7               4   7   10.1     UNKNOWN   2   1   1   0   0   0   0   0   0   0   0   0
2   2   TEST2   numeric     num    8                6   6   10.334   10.411    2   2   2   0   0   0   0   0   0   0   0   2
3   3   TEST3   integer     num    8                1   2   1        11        2   2   2   2   0   0   0   0   0   0   0   0

Code generated to read the CSV file.

data test;
  infile CSV dlm=',' dsd truncover firstobs=2 ;
  length TEST1 $7 TEST2 8 TEST3 8 ;
  input TEST1 -- TEST3 ;
run;
s_lassen
Meteorite | Level 14

Given data like this

data have;                        
  infile cards dlm='|';           
  input name :$32. value :$200.;  
cards;                            
TEST1 | UNKNOWN                   
TEST2 | 10.411                    
TEST3 | 1                         
TEST1 | 10.1                      
TEST2 | 10.334                    
TEST3 | 11                        
;run;

You can check the values you want like this:

 

First scan for length and decimals:

data test;                                   
  set have;                                  
  prxid=prxparse('/^\d+\.?(\d*)$/');         
  if prxmatch(prxid,trim(value)) then do;    
    length=length(trim(value));              
    decimals=lengthn(prxposn(prxid,1,value)); 
    end;                                     
  keep name length decimals;                 
run;

The PRX expression looks for a string consisting of one or more digits, a possible period, and then zero or more digits after that (and nothing else). I used the TRIM function to guard against leading blanks.

 

Now you just have to find the maximum values:

proc summary data=test nway;       
  class name;                      
  var length decimals;             
  output out=want (drop=_:) max=;  
run;

 Edit note: I changed the second LENGTH function to LENGTHN, so that zero decimals will not be reported as one.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 656 views
  • 2 likes
  • 5 in conversation