New SAS User

afiesnugraha · Posted 02-26-2020 11:54 PM

Hi, I am just wondering about informat. From SAS documentation, it stated that informat statement declared how a raw data should be read. In my case, i specified the format and informat of some variable to 21.10 and 21.2. Then i export the table to excel file and also copy the table to an SQL server table. In excel file, the value still retain all of the decimal. While in SQL table, the values are right. When i sum the values between the SQL table and SAS table, there is gap between the values. So is the informat didnt change the raw value to 2 or 10 decimals?

Kurt_Bremser · Posted 02-27-2020 01:24 AM

What do you mean by "the values are right"?

Keep in mind that both Excel and SAS store numbers in 8 bytes floating point, leading to a maximum precision of 15-16 decimal digits. What you see will also be determined by the display formats applied, both in SAS and Excel. Please supply an example of your input data to illustrate your issue. Use the {i} button to post text that must not be reformatted by the forum software.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

afiesnugraha · Posted 02-27-2020 02:20 AM

Thank you for your reply @Kurt_Bremser ,

The input has varying decimals,

Amount
5691329.442412
3213422.9
141215788276.73

As this value is needed for report, we specified the format and informat to 21.2.

After we state the format and informat, the display in SAS will be

Amount
5691329.44
3213422.90
141215788276.73

But if i export the table to excel using EG, the value will be the same as input

Amount
5691329.442412
3213422.9
141215788276.73

We also create a Microsoft SQL table for history or backup table using Data Integration. And if i export the Backup table to excel using EG, the value in excel will be

Amount
5691329.44
3213422.90
141215788276.73

On top of that, i sum the amount of source SAS table and backup SQL table, there is small gap about 0.2 between them.

Kurt_Bremser · Posted 02-27-2020 02:30 AM

For varying numbers of decimals, use the BEST format:

data have;
input Amount;
format amount best22.;
datalines;
5691329.442412
3213422.9
141215788276.73
;

proc print data=have noobs;
run;

result:

         Amount

 5691329.442412
      3213422.9
141215788276.73

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

afiesnugraha · Posted 02-27-2020 02:49 AM

I would love to use BEST32 as format, but the user want the format to be 21.2 in the report. And creating this backup table is the last step of current flow. So report creation is done before this step.

Patrick · Posted 02-27-2020 03:36 AM

@afiesnugraha wrote:

I would love to use BEST32 as format, but the user want the format to be 21.2 in the report. And creating this backup table is the last step of current flow. So report creation is done before this step.

@afiesnugraha

You're touching on multiple topics here. Let my try to give it a go.

Formats & Informats

Informats are only of relevance when reading textual data (like from a .txt or .csv). They don't get into play when reading from anything else like a database table or an Excel file.

For informats in the form W.D (or bestW.d) specifying a value for d is almost always wrong. If you specify the d component - like in 21.2 - then what you're actually doing is to instruct SAS to interpret the last d digits of a source string as the decimals in case the source string doesn't contain a decimal delimiter.

Below sample code illustrates what I'm talking about for variables myvar1 and myvar_1.

data demo;
  infile datalines truncover;
  input 
    @1 source_string :$10.
    @1 myvar1 :21.2
    @1 myvar2 :21.
    @1 myvar3 :best32.;
  myvar_1=myvar1;
  myvar_2=myvar2;
  myvar_3=myvar3;
  format myvar1 - myvar3 21.2;
  format myvar_1 - myvar_3 best32.;
  datalines;
1
1.
1.1
1.01
1.001
1.0001
;

proc print;
run;

FORMATS instruct SAS how to PRINT values but they don't change the internal value. So whatever format that's applied to a variable when writing the data to another target (SAS table, Excel, database table) the internal value doesn't get changed and will be written with full precision independent on what you see/what gets printed using a format.

...and now there will also be similar things to SAS formats on the target side (like Excel or a SQL Server table) so also here don't expect that what you see is always the actual internal value. ...but it's the internal value which gets used for calculations and comparisons.

Difference of 0.2

That's a nasty one and likely due to precision and differences in how floating point data gets stored in different systems. If I'm right then the only thing you can do about this if you really need the exact same numbers and you know for example that you only need 2 significant decimals, is to round or truncate the numbers on the target table side.

Here some reading as a starting point. If you Google for SAS numeric precision then you'll find more content. I remember a white paper which also explains stuff in detail when interfacing with a database but can't find this article right now.

afiesnugraha · Posted 02-27-2020 04:33 AM

Hi @Patrick thanks for the reply, your answer enlighten me about how informat used.

I want to ask more question if possible, We also write table to text file, but we are using data step

data _null_;

set LIB.Table;
attrib Date length=8
  format = date9.;
attrib ID length=$30
  format = $30;
attrib amount length=8
  format = 21.2

file "......../textfile.txt";

put
Date
ID
amount
;
run;

(1)If informat didnt change the internal value, the value printed got rounded in this step?(i always thought the moment i changed the informat, the internal value will changed, your explanation really saved me).

(2)And if i use "send to" excel file option in EG what statement it used? and will it print the internal value or formatted value?.

(3)When i copy the table to SQL table, will the internal value or formatted value that got printed to SQL table?

Thank you.

Patrick · Posted 02-27-2020 05:59 AM

When writing a value to an external text file SAS will use the format applied to the variable to write the value as a string to the text file.

Also be aware that numerical formats are rounding the values. So if you've got an internal value of 5.115 with a format of 21.2 applied the value "printed/written" will be 5.12

Here the code to demonstrate this behaviour.

data demo;
  infile datalines truncover;
  input 
    @1 source_string :$10.
    @1 myvar1 :21.2
    @1 myvar2 :21.
    @1 myvar3 :best32.;
  myvar_1=myvar1;
  myvar_2=myvar2;
  myvar_3=myvar3;
  format myvar1 - myvar3 21.2;
  format myvar_1 - myvar_3 best32.;
  datalines;
1
1.
1.1
1.01
1.001
5.115
;

data _null_;
  set demo;
  file print;
  put  myvar1 - myvar3 myvar_1 - myvar_3;
run;

Because this is now a text file there isn't any information anymore about the internal value. It's just a string and all information about the internal value is "lost".

When writing to an Excel SAS will also create Excel formats. SAS will write the internal value to Excel plus a Excel format which matches the format in SAS as close as possible.

So here when exporting ds demo you get an Excel which looks like:

If you look at the Excel format of the highlighted cell you'll find:

So here you don't see the internal value but the formatted value in the Excel cell.

If you change the formatting to General you'll see the internal value:

If you want to write an internal value of 5.12 to Excel then you need to change the internal values on the SAS side prior to writing the data to Excel (using the round() function). ...or you write the data to a .csv (text file) where you "loose" the internal value and then open the .csv with Excel.

...and just as a side note:

You mentioned "Data Integration". Assuming that's DIS an Informat only every really matters for External File Metadata used with the File Reader Transformation.

...and what I did just now:

1. Save the excel using the SAS EG 8.2 share/send to Microsoft Excel wizard.

2. Open the XLSX with 7-zip and look into sheet1.xml

What you see here is the precision issue and that the internal values are not 1.1 or 5.115. That's why you're getting the differences when summing up values in SAS, Excel or SQL Server.

      <c r="E6" s="3">
        <v>1.0009999999999999</v>
      </c>
      <c r="F6" s="3">
        <v>1.0009999999999999</v>
      </c>
      <c r="G6" s="3">
        <v>1.0009999999999999</v>
      </c>
    </row>
    <row r="7" spans="1:7" x14ac:dyDescent="0.25">
      <c r="A7" s="1" t="s">
        <v>12</v>
      </c>
      <c r="B7" s="2">
        <v>5.1150000000000002</v>
      </c>
      <c r="C7" s="2">
        <v>5.1150000000000002</v>
      </c>
      <c r="D7" s="3">
        <v>5.1150000000000002</v>
      </c>
      <c r="E7" s="3">
        <v>5.1150000000000002</v>
      </c>
      <c r="F7" s="3">
        <v>5.1150000000000002</v>
      </c>
      <c r="G7" s="3">
        <v>5.1150000000000002</v>
      </c>

Patrick · Posted 02-27-2020 06:47 AM

And I've just now done another test.

It looks to me that the XLSX engine "fixes" the precision issue and that the values in the Excel differ depending on whether you use SAS code or the EG wizard. Interesting! ...and something I haven't been aware of.

Not 100% sure how things behave when loading into a database but it's certainly something you want to test for when implementing something production worthy.

libname out xlsx '~/test.xlsx';
data out.sheet1;
  infile datalines truncover;
  input 
    @1 source_string :$10.
    @1 myvar1 :best32.
    ;
    myvar2=myvar1;
    format myvar1 best32.;
    format myvar2 21.2;
  datalines;
1.1
5.123
;
libname out clear;

The generated Sheet1 xml looks like below with the "correct" internal values.

    <row r="2" spans="1:3" x14ac:dyDescent="0.25">
      <c r="A2" t="s">
        <v>2</v>
      </c>
      <c r="B2">
        <v>1.1</v>
      </c>
      <c r="C2">
        <v>1.1</v>
      </c>
    </row>
    <row r="3" spans="1:3" x14ac:dyDescent="0.25">
      <c r="A3" t="s">
        <v>4</v>
      </c>
      <c r="B3">
        <v>5.123</v>
      </c>
      <c r="C3">
        <v>5.123</v>
      </c>
    </row>

afiesnugraha · Posted 02-27-2020 11:06 AM

Thank you for the elaborate explanation @Patrick , from your explanation, it seems better to use SAS code to export table to excel file when you have format to be conformed.

Hi @Tom , The database is Microsoft SQL Server. The table didnt exist in the remote database. As you said, i just register the table using SAS to remote database. So when SAS created the table, it will used 21.2 as the format of SQL table. when the data got loaded to the table, the value will got truncated.

Tom · Posted 02-27-2020 11:13 AM

You can use the DBTYPE dataset option to force SAS to use a specific database type when creating the variable (field/column) in the remote database.
https://documentation.sas.com/?docsetId=acreldb&docsetTarget=p1dqaq7ub1cm1pn1ow3x9qkmkxe8.htm&docset...

Tom · Posted 02-27-2020 09:32 AM

Your data step code is a little confused. You cannot change the TYPE of a variable after the data step has already defined it, which in your code will happen when it looks at the input dataset LIB.TABLE. You also cannot change the LENGTH of character variables once they are defined. You can change the LENGTH of numeric variables, but that just changes the length used to store the data in the output dataset(s), which your step is not generating. Note there is no need to change the format attached to a variable if the goal is just to change the format used by the PUT statement, you can include the format in the PUT statement. Or you could just use the FORMAT statement to change the format attached for this data step, instead of using separate ATTRIB statement for each variable.

I do believe the Enterprise Guide point and click tools for generating Excel files is a different process than normal SAS ways to generate those files. The EG tools will typically generate some SAS code to create a text file, download the text file, and then run some Windows code on the PC where EG is running to generate the Excel file. To use the normal SAS process first run the SAS code and then use the Copy Files task to move the generated file, if needed. There are multiple ways in SAS to generate an XLSX file. To me the easiest is the XLSX libname engine. So using your LIB.TABLE example:

filename out xlsx '....directory.../filename.xlsx';
data out.sheetname;
  set lib.table;
run;

How are you copying the file to your "SQL" table? What database is your "SQL"? Is it Microsoft SQL Server? Oracle? Teradata? Snowflake? Hadoop? Something else?

Does the table already exist in the remote database? Or are you having SAS create the table? SAS only has two data types, fixed length character strings and floating point numbers, and your remote database typically has many other types. So if you let SAS automatically define the table it use the formats you have attached to the variables to make an educated guess at how to define the variables in the remote database. If you are loading into an existing table it will try to conform the data to match the target variables. So for example with a DATE variable SAS will translate the raw number of days that SAS has in a floating point number into the value your remote database uses for that same date. Many remote databases only have DATETIME type that they use for DATE and DATETIME values and SAS will typically know how to convert for you.

Kurt_Bremser · Posted 02-27-2020 04:09 AM

Use proc datasets to change the format before you run the export step.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

New SAS User

usage of informat

Re: usage of informat

Re: usage of informat

Re: usage of informat

Re: usage of informat

Re: usage of informat

Re: usage of informat

Re: usage of informat

Re: usage of informat

Re: usage of informat

Re: usage of informat

Re: usage of informat

Re: usage of informat

Follow Us

What is...

New SAS User

Our biggest data and AI event of the year.

Follow Us

What is...