Hello,
I am trying to run the code below:
data new1;
length x $50;
x='Sorslev Stole As New';
z=COMPRESS(x,'000102030405060708090A0B0C0D0E0F10111213141516171819201A1B1C1D1E1F'X);
run;
The output looks like this:
x=Sorslev Stole As New
z=SorslevStoleAsNew
My question is: why are the spaces removed in the output variable Z?
Why does COMPRESS remove the spaces when I have not specified them?
Thanks!
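Please try the code below.
I believe you are trying to remove the numbers from the string. If so, use the 'ka' modifiers ('k' keeps the listed characters instead of removing them, 'a' adds all alphabetic characters to the list) and include ' ' in the character list so that spaces are kept as well.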
data new1;
length x $50;
x='Sorslev Stole As New';
z=COMPRESS(x,' ','ka');
run;
I know I can use that, but my question is: what does the hex constant in this COMPRESS call do?
COMPRESS(x,'000102030405060708090A0B0C0D0E0F10111213141516171819201A1B1C1D1E1F'X)
Hi @shubham1,
That expression ('000102...1F'X) is a character constant written in hexadecimal notation. It contains 66 hexadecimal digits (0-9, A-F) and therefore represents 33 (= 66/2) characters: the characters with decimal ASCII codes 0, 1, 2, ..., 24, 25, 32, 26, 27, 28, 29, 30, 31. Most of these are non-printable characters that are not available on the keyboard, which is why hexadecimal notation makes sense here. ASCII code 32 (hex 20) is the space character, so it is among the characters deleted by the COMPRESS function, which explains the result in variable Z.
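To see this for yourself, you could loop over the constant and write out the decimal code of each byte. This is a minimal sketch; the data set name work.hexcodes is just an illustrative choice. The byte values are the same on every host; reading code 32 as a space assumes an ASCII session such as Windows.

```sas
/* Decode the hex constant byte by byte (work.hexcodes is an illustrative name) */
data work.hexcodes;
  length hexlist $33;
  hexlist = '000102030405060708090A0B0C0D0E0F10111213141516171819201A1B1C1D1E1F'x;
  do pos = 1 to length(hexlist);           /* 33 bytes; the last one, '1F'x, is not a blank */
    code = rank(substr(hexlist, pos, 1));  /* decimal value of the byte at this position */
    output;
  end;
  keep pos code;
run;

/* Rows 1-26 show codes 0-25, row 27 shows 32 (the ASCII space), rows 28-33 show 26-31 */
proc print data=work.hexcodes noobs;
run;
```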
Edit: Here is another example using the hexadecimal representations '41'x and '42'x of 'A' and 'B' (and again '20'x for the space ' '):
data _null_;
c='PALM BEACH';
d=compress(c,'4142'x); /* equivalent: d=compress(c,'AB'); */
put d=;
e=compress(c,'414220'x); /* equiv.: e=compress(c,'AB '); */
put e=;
run;
Result:
d=PLM ECH
e=PLMECH
You are correct. But I have a question.
I am doing a migration project. When this code runs on the mainframe SAS host, it does not remove the spaces.
But when I run the same code on Windows, it does remove the spaces, as you explained.
What is the reason for this?
The mainframe uses EBCDIC encoding, not ASCII (which is used under Windows). According to the table in the linked Wikipedia article the EBCDIC code of the space character is (hex) 40 as opposed to 20 in ASCII. So, on the mainframe you need to compress '40'x to remove blanks, not '20'x.
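One way to check which encoding a session uses is RANK(' '), which returns the code of the blank. The sketch below (data set and variable names are illustrative) also contrasts the hard-coded '20'x with a literal ' ', which SAS encodes correctly on either kind of host. The expected values in the comments assume an ASCII session.

```sas
data work.blankcheck;
  length x z_hex z_literal $50;
  x = 'Sorslev Stole As New';
  blank_code = rank(' ');           /* 32 in an ASCII session, 64 in an EBCDIC session */
  z_hex      = compress(x, '20'x);  /* removes blanks only where '20'x is the blank, i.e. ASCII */
  z_literal  = compress(x, ' ');    /* a literal blank works whatever the session encoding is */
  put blank_code= / z_hex= / z_literal=;
run;
```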
Right, except that I believe SAS stores character data in ASCII in the dataset even on the mainframe. (It has been about 25 years since I used SAS on an IBM mainframe.)
Interesting, thanks for chiming in. I hoped the encoding difference would explain the OP's observation.
I have never used SAS on mainframes. My only encounters with EBCDIC data were data exchanges with customers involving awkward tape cartridges, about 20 years ago.
Because you included the code for the space, '20'x, in your list. It also breaks the ascending order of the list, since you have it between '19'x and '1A'x instead of after '1F'x.
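If the intent was to strip only the control characters '00'x through '1F'x, a sketch of the corrected constant simply drops the '20'x, leaving 32 bytes (64 hex digits). On an ASCII host Z should then keep its spaces; work.fixed is just an illustrative name.

```sas
data work.fixed;
  length x z $50;
  x = 'Sorslev Stole As New';
  /* '00'x through '1F'x in ascending order, with no '20'x (space) */
  z = compress(x, '000102030405060708090A0B0C0D0E0F101112131415161718191A1B1C1D1E1F'x);
  put x= / z=;
run;
```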
If you just want a list of characters in order, use the COLLATE() function:
z=COMPRESS(x,collate(0,31));
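COMPRESS also has a 'c' modifier that adds control characters to the list, so the character list can be left empty. Here is a minimal sketch comparing it with COLLATE(0,31), assuming an ASCII session; the tab ('09'x) embedded in X and the data set name work.ctrl are only there for illustration.

```sas
data work.ctrl;
  length x z_collate z_modifier $50;
  x = 'Sorslev' || '09'x || ' Stole As New';   /* a tab hidden after 'Sorslev' */
  z_collate  = compress(x, collate(0, 31));    /* remove the first 32 characters of the collating sequence */
  z_modifier = compress(x, , 'c');             /* 'c' adds control characters to the (empty) list */
  put z_collate= / z_modifier=;
run;
```

Both calls should leave the spaces alone and remove only the tab.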