BookmarkSubscribeRSS Feed
ksj
Fluorite | Level 6 ksj
Fluorite | Level 6

Hi All, 

 

I have this requirement to read data from a SAS dataset and write to an external file on Linux. 

The non-trivial part of this is - if a field has leading or trailing blanks, it should be retained in the output text file exactly with the same length. There are automated test cases checking inputs (with leading and trailing blanks) against output. 

 

The default behaviour of datastep put statement is to remove leading and trailing blanks. I tried with $varying and $char formats. They retain the leading blanks but are expanding the trailing blanks to fill the defined length of the field in dataset.

 

Eg: If the dataset field length is 12, and the value has 4 alphanum characters and 3 trailing blanks, I am expecting 7 characters to be written to the file instead of 12 characters (with 4 alphanum + 3 blanks + 5 unwanted blanks). 

 

Here is the sample code - can you pl help what is wrong with the below? 

 

data t;
format a $12. b $12. c $12.;
a = '   lead';
b = 'trai   ';
c = '   both   ';
run;

 

filename outf "/work/test.DAT" encoding="utf-8" termstr=crlf recfm=v;
data _null_;
file outf delimiter='|';
set t;
/*default put statement - not preserving leading and trailing blanks*/
put a b c;
/*1: varying format preserves leading blanks but not trailing blanks*/

la1 = lengthn(a);
lb1 = lengthn(b);
lc1 = lengthn(c);

put a $varying. la1 '|' @;
put b $varying. lb1 '|' @;
put c $varying. lc1 '|' ;

/*2: count trailing balnks - preserves leading blanks but adds too many trailing blanks*/

la2 = notspace(reverse(a))-1;
lb2 = notspace(reverse(b))-1;
lc2 = notspace(reverse(c))-1;

la = lengthn(a)+notspace(reverse(a))-1;
lb = lengthn(b)+notspace(reverse(b))-1;
lc = lengthn(c)+notspace(reverse(c))-1;

put a $varying. la '|' @;
put b $varying. lb '|' @;
put c $varying. lc '|' ;

/*3: same result as 2 */

put a $char. '|' @;
put b $char. '|' @;
put c $char. '|' ;

run;

 

12 REPLIES 12
ballardw
Super User

If the data set was already created it has likely "lost" any information about trailing blanks with the way that SAS reads character values.

How did you read the data that maintained leading and trailing blanks?

 

This code:

la = lengthn(a)+notspace(reverse(a))-1;

is basically returning the defined length of the variable because the reverse pads the result to the defined length of the base variable.

 

consider

data work.junk;
   length v $ 15;
   v='a';
   rv= reverse(v);
run;

RV has 14 leading spaces.

 

Perhaps instead of putting variables you build a longer string variable and put that?

This creates a variable that preserves the lengths when concatenated.

data t;
format a $12. b $12. c $12.;
a = quote('   lead');
b = quote('trai   ');
c = quote('   both   ');
run;
data junk;

set t;
line= catx('|',a,b,c,b);
line= compress(line,'"');

run;

Put will still suppress the leading spaces of the first variable. I added the b variable in there twice to show where the | goes with a following value.

 

 

Would ASCII null characters be a problem? If you replace spaces with the null character, ASCII 255, the Length function counts those

 

 

 

Generally when I have seen this sort of requirement the data is not actually delimited when written to a file but uses fixed columns.

ksj
Fluorite | Level 6 ksj
Fluorite | Level 6

@ballardw, Thank you for the reply..

As you said, it looks like the trailing blanks information is already "lost" when SAS reads character values. 

 

I tried with ASCII null & quote/unquote methods. All gave the same result with padded trailing blanks. 

 

a1=translate(a,'FF'x,' ');
b1=translate(b,'FF'x,' ');
c1=translate(c,'FF'x,' ');

put a1 $char. '|' @;
put b1 $char. '|' @;
put c1 $char. '|' ;



z1=catx('|','*',a,b,c,'*');

a2=quote(a);
b2=quote(b);
c2=quote(c);
z2=compress(catx('|','*',a2,b2,c2,'*'),'"');

put z2=;

result: 

 

 

ÿÿÿleadÿÿÿÿÿ|traiÿÿÿÿÿÿÿÿ|ÿÿÿbothÿÿÿÿÿ|
z2=*|   lead     |trai        |   both     |*

This type of loss of fidelity is a major concern for us at the moment. My task is to check this on a 20 GB dataset with around 200 fields. Should this be reported to SAS tech support ? Or are there any other ways to prevent this from happening ? 

PS: I forgot to mention - I am using SAS9.4 

 

ChrisNZ
Tourmaline | Level 20

Eg: If the dataset field length is 12, and the value has 4 alphanum characters and 3 trailing blanks, I am expecting 7 characters to be written to the file instead of 12 characters (with 4 alphanum + 3 blanks + 5 unwanted blanks). 

 

If the field has a length of 12, you'll write out either 4 or 12 characters. What's the difference between the 3 blanks and the 5 unwanted blanks? They are just blanks.

Unless they have a different code (hex 20 is the default for space) or you have another field giving the length to use.

ksj
Fluorite | Level 6 ksj
Fluorite | Level 6

@ChrisN, the 3 trailing blanks were present in the original source (Oracle table) that feeds the SAS dataset. The expectation is that SAS would retain the 3 trailing blanks when stored in SAS dataset and after consumption, it writes only 3 trailing blanks to an external file. Values from the external file is used to reconcile with the original source values by a different (non-SAS) application. 

 

the 5 unwanted blanks are not present in the original source but added by SAS. 

 

ChrisNZ
Tourmaline | Level 20

1. The expectation is that SAS would retain the 3 trailing blanks

SAS only has fixed-length character variables. That's the only character type there is and this cannot be changed.

The only way to remove spaces is to shorten the variable.

 

2. Values from the external file is used to reconcile

Trailing blanks *usually* don't prevent reconciliation.

ksj
Fluorite | Level 6 ksj
Fluorite | Level 6

@ChrisNZ, Thank you, I am now beginning to appreciate the limitations of SAS datastep. As mentioned earlier, reconciliation happens in a non-SAS application that has different definitions of accuracy. If this is how SAS behaves, I will have to negotiate the terms of reconciliation with the non-SAS application. 

SASKiwi
PROC Star

@ksj - I think it would help if you explained what your complete process does. You got SAS querying Oracle writing out a text file, then a reconciliation process - from reading the text file? Reconciling what with what?

Tom
Super User Tom
Super User

@ksj wrote:

@ChrisN, the 3 trailing blanks were present in the original source (Oracle table) that feeds the SAS dataset. The expectation is that SAS would retain the 3 trailing blanks when stored in SAS dataset and after consumption, it writes only 3 trailing blanks to an external file. Values from the external file is used to reconcile with the original source values by a different (non-SAS) application. 

 

the 5 unwanted blanks are not present in the original source but added by SAS. 

 


If you need to know the length of the value in the Oracle table then ask Oracle to tell you.

proc sql;
connect .... ;
create table want as select * from connection to oracle
(select var1,length(var1) as var1_length,....
  from oracle.table
);

You can then use that value with the $VARYING. format.

put ... var1 $varying200. var1_length  ... ;

Once you get it working track down whoever inserted trailing blanks into a variable length field in Oracle and give them 40 lashes with a wet noodle.

Tom
Super User Tom
Super User

Is the source a text file or the Oracle table?

If the source is a text file that show an example of the text file to see how to read it and calculate the source lengths.  Note that you will need to store that information somehow.

 

You can either create a separate variable with the length (or perhaps number of trailing blanks). 

length var1 $200;
var1='fred  ';
var1_length=6;
...
put '|' var1 $varying200. var1_length '|';

Or add one to the length of the variable and add an extra non-blank character to mark the end of every value.

length var1 $201;
var1='fred  '||'|';
...
var1_length=length(var1)-1;
put '|' var1 $varying200. var1_length '|';
data_null__
Jade | Level 19
184  data _null_;
185     set t;
186     put 'NOTE: LIST put leading and trailing spaces "lost"' / a b c;
187     put 'NOTE: FORMATTED put using $w. ' / (a b c) ($12.);
188     put 'NOTE: FORMATTED put using $char. ' / (a b c) ($char12.);
189     run;

NOTE: LIST put leading and trailing spaces "lost"
lead trai both
NOTE: FORMATTED put using $w.
   lead     trai           both
NOTE: FORMATTED put using $char.
   lead     trai           both
NOTE: There were 1 observations read from the data set WORK.T.

 

 

323  data abc;
324     length type $8;
325     length a b c $12;
326     retain a b c '123456789012';
327     output;
328     type='LIST';     input @1 a b c @;                output;
329     type='FMT WD';   input @1 (a b c) ($12.) @;       output;
330     type='FMT CHAR'; input @1 (a b c) ($char12.) @;   output;
331     l=12;
332     type='FMT VARY'; input @1 (a b c) ($varying. l);  output;
333     list;
334     stop;
335     cards;

RULE:      ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
336           lead     trai           both

Obs    type        a               b               c                l

 1                 123456789012    123456789012    123456789012     .
 2     LIST        lead            trai            both             .
 3     FMT WD      lead            trai            both             .
 4     FMT CHAR       lead         trai               both          .
 5     FMT VARY       lead         trai               both         12

 

 

Ksharp
Super User
Not quite follow your question.


data t;
format a $12. b $12. c $12.;
a = '   lead';
b = 'trai   ';
c = '   both   ';
run;

data _null_;
set t;
list=catt(a,'|',b,'|',c);
put list char100.;
run;

Ksharp
Super User

Or $varying.

 

data t;
format a $12. b $12. c $12.;
a = '   lead';
b = 'trai   ';
c = '   both   ';
run;

data _null_;
set t;
list=catt(a,'|',b,'|',c);
len=length(list);
put list $varying100. len;
run;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 8752 views
  • 2 likes
  • 7 in conversation