Hi All,
I have this requirement to read data from a SAS dataset and write to an external file on Linux.
The non-trivial part of this is - if a field has leading or trailing blanks, it should be retained in the output text file exactly with the same length. There are automated test cases checking inputs (with leading and trailing blanks) against output.
The default behaviour of datastep put statement is to remove leading and trailing blanks. I tried with $varying and $char formats. They retain the leading blanks but are expanding the trailing blanks to fill the defined length of the field in dataset.
Eg: If the dataset field length is 12, and the value has 4 alphanum characters and 3 trailing blanks, I am expecting 7 characters to be written to the file instead of 12 characters (with 4 alphanum + 3 blanks + 5 unwanted blanks).
Here is the sample code - can you pl help what is wrong with the below?
data t;
format a $12. b $12. c $12.;
a = ' lead';
b = 'trai ';
c = ' both ';
run;
filename outf "/work/test.DAT" encoding="utf-8" termstr=crlf recfm=v;
data _null_;
file outf delimiter='|';
set t;
/*default put statement - not preserving leading and trailing blanks*/
put a b c;
/*1: varying format preserves leading blanks but not trailing blanks*/
la1 = lengthn(a);
lb1 = lengthn(b);
lc1 = lengthn(c);
put a $varying. la1 '|' @;
put b $varying. lb1 '|' @;
put c $varying. lc1 '|' ;
/*2: count trailing balnks - preserves leading blanks but adds too many trailing blanks*/
la2 = notspace(reverse(a))-1;
lb2 = notspace(reverse(b))-1;
lc2 = notspace(reverse(c))-1;
la = lengthn(a)+notspace(reverse(a))-1;
lb = lengthn(b)+notspace(reverse(b))-1;
lc = lengthn(c)+notspace(reverse(c))-1;
put a $varying. la '|' @;
put b $varying. lb '|' @;
put c $varying. lc '|' ;
/*3: same result as 2 */
put a $char. '|' @;
put b $char. '|' @;
put c $char. '|' ;
run;
If the data set was already created it has likely "lost" any information about trailing blanks with the way that SAS reads character values.
How did you read the data that maintained leading and trailing blanks?
This code:
la = lengthn(a)+notspace(reverse(a))-1;
is basically returning the defined length of the variable because the reverse pads the result to the defined length of the base variable.
consider
data work.junk; length v $ 15; v='a'; rv= reverse(v); run;
RV has 14 leading spaces.
Perhaps instead of putting variables you build a longer string variable and put that?
This creates a variable that preserves the lengths when concatenated.
data t; format a $12. b $12. c $12.; a = quote(' lead'); b = quote('trai '); c = quote(' both '); run; data junk; set t; line= catx('|',a,b,c,b); line= compress(line,'"'); run;
Put will still suppress the leading spaces of the first variable. I added the b variable in there twice to show where the | goes with a following value.
Would ASCII null characters be a problem? If you replace spaces with the null character, ASCII 255, the Length function counts those
Generally when I have seen this sort of requirement the data is not actually delimited when written to a file but uses fixed columns.
@ballardw, Thank you for the reply..
As you said, it looks like the trailing blanks information is already "lost" when SAS reads character values.
I tried with ASCII null & quote/unquote methods. All gave the same result with padded trailing blanks.
a1=translate(a,'FF'x,' ');
b1=translate(b,'FF'x,' ');
c1=translate(c,'FF'x,' ');
put a1 $char. '|' @;
put b1 $char. '|' @;
put c1 $char. '|' ;
z1=catx('|','*',a,b,c,'*');
a2=quote(a);
b2=quote(b);
c2=quote(c);
z2=compress(catx('|','*',a2,b2,c2,'*'),'"');
put z2=;
result:
ÿÿÿleadÿÿÿÿÿ|traiÿÿÿÿÿÿÿÿ|ÿÿÿbothÿÿÿÿÿ| z2=*| lead |trai | both |*
This type of loss of fidelity is a major concern for us at the moment. My task is to check this on a 20 GB dataset with around 200 fields. Should this be reported to SAS tech support ? Or are there any other ways to prevent this from happening ?
PS: I forgot to mention - I am using SAS9.4
Eg: If the dataset field length is 12, and the value has 4 alphanum characters and 3 trailing blanks, I am expecting 7 characters to be written to the file instead of 12 characters (with 4 alphanum + 3 blanks + 5 unwanted blanks).
If the field has a length of 12, you'll write out either 4 or 12 characters. What's the difference between the 3 blanks and the 5 unwanted blanks? They are just blanks.
Unless they have a different code (hex 20 is the default for space) or you have another field giving the length to use.
@ChrisN, the 3 trailing blanks were present in the original source (Oracle table) that feeds the SAS dataset. The expectation is that SAS would retain the 3 trailing blanks when stored in SAS dataset and after consumption, it writes only 3 trailing blanks to an external file. Values from the external file is used to reconcile with the original source values by a different (non-SAS) application.
the 5 unwanted blanks are not present in the original source but added by SAS.
1. The expectation is that SAS would retain the 3 trailing blanks
SAS only has fixed-length character variables. That's the only character type there is and this cannot be changed.
The only way to remove spaces is to shorten the variable.
2. Values from the external file is used to reconcile
Trailing blanks *usually* don't prevent reconciliation.
@ChrisNZ, Thank you, I am now beginning to appreciate the limitations of SAS datastep. As mentioned earlier, reconciliation happens in a non-SAS application that has different definitions of accuracy. If this is how SAS behaves, I will have to negotiate the terms of reconciliation with the non-SAS application.
@ksj - I think it would help if you explained what your complete process does. You got SAS querying Oracle writing out a text file, then a reconciliation process - from reading the text file? Reconciling what with what?
@ksj wrote:
@ChrisN, the 3 trailing blanks were present in the original source (Oracle table) that feeds the SAS dataset. The expectation is that SAS would retain the 3 trailing blanks when stored in SAS dataset and after consumption, it writes only 3 trailing blanks to an external file. Values from the external file is used to reconcile with the original source values by a different (non-SAS) application.
the 5 unwanted blanks are not present in the original source but added by SAS.
If you need to know the length of the value in the Oracle table then ask Oracle to tell you.
proc sql;
connect .... ;
create table want as select * from connection to oracle
(select var1,length(var1) as var1_length,....
from oracle.table
);
You can then use that value with the $VARYING. format.
put ... var1 $varying200. var1_length ... ;
Once you get it working track down whoever inserted trailing blanks into a variable length field in Oracle and give them 40 lashes with a wet noodle.
Is the source a text file or the Oracle table?
If the source is a text file that show an example of the text file to see how to read it and calculate the source lengths. Note that you will need to store that information somehow.
You can either create a separate variable with the length (or perhaps number of trailing blanks).
length var1 $200;
var1='fred ';
var1_length=6;
...
put '|' var1 $varying200. var1_length '|';
Or add one to the length of the variable and add an extra non-blank character to mark the end of every value.
length var1 $201;
var1='fred '||'|';
...
var1_length=length(var1)-1;
put '|' var1 $varying200. var1_length '|';
184 data _null_;
185 set t;
186 put 'NOTE: LIST put leading and trailing spaces "lost"' / a b c;
187 put 'NOTE: FORMATTED put using $w. ' / (a b c) ($12.);
188 put 'NOTE: FORMATTED put using $char. ' / (a b c) ($char12.);
189 run;
NOTE: LIST put leading and trailing spaces "lost"
lead trai both
NOTE: FORMATTED put using $w.
lead trai both
NOTE: FORMATTED put using $char.
lead trai both
NOTE: There were 1 observations read from the data set WORK.T.
323 data abc;
324 length type $8;
325 length a b c $12;
326 retain a b c '123456789012';
327 output;
328 type='LIST'; input @1 a b c @; output;
329 type='FMT WD'; input @1 (a b c) ($12.) @; output;
330 type='FMT CHAR'; input @1 (a b c) ($char12.) @; output;
331 l=12;
332 type='FMT VARY'; input @1 (a b c) ($varying. l); output;
333 list;
334 stop;
335 cards;
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
336 lead trai both
Obs type a b c l
1 123456789012 123456789012 123456789012 .
2 LIST lead trai both .
3 FMT WD lead trai both .
4 FMT CHAR lead trai both .
5 FMT VARY lead trai both 12
Not quite follow your question. data t; format a $12. b $12. c $12.; a = ' lead'; b = 'trai '; c = ' both '; run; data _null_; set t; list=catt(a,'|',b,'|',c); put list char100.; run;
Or $varying.
data t;
format a $12. b $12. c $12.;
a = ' lead';
b = 'trai ';
c = ' both ';
run;
data _null_;
set t;
list=catt(a,'|',b,'|',c);
len=length(list);
put list $varying100. len;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.