G_I_Jeff
Obsidian | Level 7

I'm creating a DATA step to read a file dump. All the variables are straightforward except one that contains space information. Examples of the data are:

121.440K

50956.360K

1439.280M

 

The field is 11 bytes long, with a size identifier as the last character (K=kilobytes, M=megabytes, etc.). I need to convert these values to a common unit (a numeric field representing bytes) and remove the size identifier character. Here is my attempt so far (SPACE is the variable I'm asking about):

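DATA FDREPORT;
 INFILE INFILE;
 INPUT @1  VOLSER   $6.
       @8  DSNAME   $44.
       @53 ARCYEAR  4.
       @58 ARCJUL   3.
       @62 SPACE    $11.
       @74 CATALOG  $3.
       @78 EXPYEAR  4.
       @83 EXPJUL   3.
       @87 RUNYEAR  4.
       @92 RUNJUL   3.;
 IF _N_ = 1 THEN DELETE;            /* skip the header record               */
 IF INDEXC(SPACE,'K') THEN DO;
    SPACE=TRANSLATE(SPACE,' ','K'); /* replace the unit letter with a blank */
    NUMBER=INPUT(SPACE,11.)*1024;   /* 11. reads the whole field; 7.3 reads */
                                    /* only 7 characters, so 50956.360      */
                                    /* would come in as 50956.3             */
 END;
RUN;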

 This "seems" to be working from what I've run so far. So my questions are:

  1. Is there a better way of doing this? I realize I will need multiple IF statements for the different size characters (K, M, G, T, P). Also, when I run PROC CONTENTS on the PDB, it lists the NUMBER field as type NUM with a length of 8. Is that expected?
  2. The last two lines of the input are summary lines that I need to skip. Is there an easy way to do that?

 

ballardw
Super User

No suggestions on how to "better" read that somewhat odd numeric format.

 

By default, SAS numeric variables have length 8 (8 bytes), so a length of 8 in PROC CONTENTS is expected. That length limits the number of significant digits that can be stored. Check whether the storage range matters for your application; it may be an issue for the decimal portions of values.

This is somewhat operating-system dependent. The table below shows the largest integers that can be stored exactly in SAS numeric variables of each length.
Significant Digits and Largest Integer by Length for SAS Variables under Windows

Length in Bytes   Largest Integer Represented Exactly   Exponential Notation   Significant Digits Retained
3                 8,192                                 2^13                   3
4                 2,097,152                             2^21                   6
5                 536,870,912                           2^29                   8
6                 137,438,953,472                       2^37                   11
7                 35,184,372,088,832                    2^45                   13
8                 9,007,199,254,740,992                 2^53                   15
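Because the table above is for Windows and this file comes from a mainframe, it may help to ask the SAS session you actually run on for its limits. A minimal sketch using the CONSTANT function's 'EXACTINT' option; the data set name EXACTINT is arbitrary, and the values printed will differ on hosts that do not use IEEE floating point (such as z/OS):

```sas
/* Largest integer stored exactly for each numeric length (3 to 8     */
/* bytes) on the host where this step runs.                          */
data exactint;
  do bytes = 3 to 8;
    largest = constant('exactint', bytes);
    output;
  end;
  format largest comma32.;
run;

proc print data=exactint noobs;
run;
```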
 

 

Is there anything in the data that identifies the "summary" lines in the file? For instance, if the summary line starts with the word "Total", you can check the input buffer with something like:

 

input @;
If _infile_ =: 'Total' then input; /* this would be the summary line*/
else do;
   input <your existing input statement>
   <other code executed for "valid" data>
   output; /*this output means only data from valid records gets output to the data set*/
end;

The INPUT @ statement holds the current record in the input buffer. _INFILE_ is an automatic variable that SAS creates to hold the entire current input line, so you can examine it before deciding whether to read the rest of the line.
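A runnable sketch of the pattern above, with in-line sample data. The assumption that summary lines begin with "Total" is hypothetical (the real summary lines may look different), and only SPACE is read to keep the example short:

```sas
/* ASSUMPTION: summary lines start with 'Total' (hypothetical).      */
data sizes;
  infile datalines;
  input @;                            /* hold the line; _INFILE_ has its text */
  if _infile_ =: 'Total' then input;  /* summary line: release it, no output  */
  else do;
    input space $11.;                 /* re-read the held line as data        */
    output;                           /* only data lines reach SIZES          */
  end;
datalines;
121.440K
50956.360K
1439.280M
Total 3 data sets
Total space 1.5G
;
```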

 

You may want to provide an actual example of what the last 3 or 4 rows of the input file look like. Copy them using a plain text editor and paste the copied text into a code box opened on the forum using the </> icon to preserve the formatting of the text.
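If the summary lines have no recognizable prefix, another option (not from this thread) is to count the records first and then tell INFILE where to start and stop with FIRSTOBS= and OBS=. A sketch under these assumptions: the fileref RAW and its demo contents are hypothetical stand-ins for the real INFILE fileref, and the macro variable name NLINES is arbitrary:

```sas
/* Hypothetical demo file: 1 header, 2 data lines, 2 summary lines.  */
filename raw temp;
data _null_;
  file raw;
  put 'HEADER';
  put '121.440K';
  put '1439.280M';
  put 'SUMMARY LINE 1';
  put 'SUMMARY LINE 2';
run;

/* Pass 1: count the records into macro variable NLINES.             */
data _null_;
  infile raw end=eof;
  input;
  if eof then call symputx('nlines', _n_);
run;

/* Pass 2: FIRSTOBS=2 skips the header; OBS= stops before the last   */
/* two records.                                                       */
data sizes2;
  infile raw firstobs=2 obs=%eval(&nlines - 2);
  input space $11.;
run;
```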

SteveDenham
Jade | Level 19

It would take some time for me to get this right, but could you do something like:

Read in the values as character;

Use REVERSE to get the alphabetic character at the front

Use SUBSTR to get the alphabetic character in one variable (var_1)

Use SUBSTR with no length argument to get the numeric characters in another variable (var_2)

Use REVERSE on var_2 to restore the original ordering

Use INPUT (var_2, best.) to get the mantissa into numeric format (var_3)

Define var_4 based on var_1 (K=1024, etc.)

Multiply var_3 by var_4 to get var_5

Add them all up.

 

Don't know if this would also solve the issue of what is in the last two lines, but you could probably write some easy trapping code involving var_1 and var_2 to eliminate those records.
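The steps above could be sketched roughly as follows. Two small deviations are assumptions of this sketch, not part of the suggestion: COMMA32. is used instead of BEST. so values with a thousands separator (such as 50,956.360K) also read, and an unrecognized unit letter leaves the value unscaled:

```sas
data steve;
  input space $11.;
  length rev $11 var_1 $1 var_2 $11;
  rev   = reverse(strip(space));           /* '121.440K' -> 'K044.121' */
  var_1 = substr(rev, 1, 1);               /* unit letter              */
  var_2 = reverse(strip(substr(rev, 2)));  /* digits, original order   */
  var_3 = input(var_2, comma32.);          /* numeric mantissa         */
  select (upcase(var_1));
    when ('K') var_4 = 1024;
    when ('M') var_4 = 1024**2;
    when ('G') var_4 = 1024**3;
    otherwise  var_4 = 1;                  /* assumption: leave as is  */
  end;
  var_5 = var_3 * var_4;                   /* size in bytes            */
  total + var_5;                           /* running sum of all sizes */
datalines;
121.440K
50956.360K
1439.280M
;
```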

 

SteveDenham

 

 

Tom
Super User

Convert the beginning part to a number and then multiply by the right power. Are the numbers using 1024 or 1000 as the units?

data have;
  input space $11.;
  size=inputn(space,cats('comma',length(space)-1,'.'));
  select (char(space,length(space)));
    when ('K') size=size*1024;
    when ('M') size=size*1024**2;
    when ('G') size=size*1024**3;
    otherwise ;
  end;
cards;
121.440K
50,956.360K
1439.280M
123.45G
1245B
;
proc print data=have;
run;

Obs    space                     size

 1     121.440K             124354.56
 2     50,956.360K        52179312.64
 3     1439.280M        1509194465.28
 4     123.45G        132553428172.80
 5     1245B                  1245.00
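One possible extension of the code above (not part of the original reply) covers every unit the original poster listed (K, M, G, T, P) plus B, in either case, by looking up the power of 1024 with INDEX instead of one WHEN per unit. It assumes the last character is always a unit letter; the macro variable BASE and the data set HAVE2 are hypothetical names, and BASE can be set to 1000 if the units turn out to be decimal:

```sas
%let base = 1024;   /* hypothetical: use 1000 for decimal units */

data have2;
  input space $11.;
  length unit $1;
  unit  = upcase(char(space, length(space)));         /* last character      */
  power = index('BKMGTP', unit) - 1;                   /* B=0, K=1, ..., P=5 */
  size  = inputn(space, cats('comma', length(space) - 1, '.'));
  if power >= 0 then size = size * &base**power;
  else size = .;                                       /* unrecognized unit   */
datalines;
121.440K
1439.280M
123.45g
2.5T
1245B
;
```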
SteveDenham
Jade | Level 19
This is a much better method than my flipping things back and forth to extract the alphabetic part of the input string.

SteveDenham
G_I_Jeff
Obsidian | Level 7

Tom, these are mainframe storage numbers, so I'm pretty sure they will be multiplied by 1024 because of binary storage. Thank you for your input; I believe this will help me along nicely... if I can interpret it! 😁

G_I_Jeff
Obsidian | Level 7

Tom,

 

How would you suggest I combine a variable holding YYYY with another variable holding JJJ (Julian day) into one recognized SAS date field?

 

Var1 4.,  /*YYYY*/

Var2 3.  /*JJJ*/
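A minimal sketch of one way to do this, using the ARCYEAR/ARCJUL names from the INPUT statement earlier in the thread (the DATES data set and sample values are illustrative). DATEJUL accepts a Julian date in yyyyddd form, so the year is multiplied by 1000 and the day added; the MDY line is an equivalent cross-check:

```sas
data dates;
  input arcyear 4. +1 arcjul 3.;
  arcdate  = datejul(arcyear*1000 + arcjul);   /* yyyyddd -> SAS date */
  arcdate2 = mdy(1, 1, arcyear) + arcjul - 1;  /* Jan 1 + (day - 1)   */
  format arcdate arcdate2 date9.;
datalines;
2024 060
2023 060
2024 366
;
```

Day 060 falls on 29FEB2024 in a leap year but on 01MAR2023 otherwise, which is why converting through a real date function is safer than hand arithmetic on months.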

 

 
