Calcite | Level 5

## Compress function

Hello

I am trying to run the code below:

data new1;

length x $50.;

x='Sorslev Stole As New';

z=COMPRESS(x,'000102030405060708090A0B0C0D0E0F10111213141516171819201A1B1C1D1E1F'X);

run;

When I look at the output, it looks like this:

x=Sorslev Stole As New

z=SorslevStoleAsNew

My question is: why are the spaces removed in the output variable Z?

Why does COMPRESS remove the spaces when we have not listed them?

Thanks!

Amethyst | Level 16

## Re: Compress function

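I believe you are trying to remove the digits from the string. If so, use the 'ka' modifiers: 'k' keeps the listed characters instead of removing them, and 'a' adds the alphabetic characters to the list. Passing ' ' as the second argument keeps the spaces as well.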

```
data new1;
length x $50.;
x='Sorslev Stole As New';
z=COMPRESS(x,' ','ka');
run;
```
Thanks,
Jag
Calcite | Level 5

## Re: Compress function

I know I can use this.

But my question is: what does the hexadecimal constant in the line below do in the COMPRESS function?

COMPRESS(x,'000102030405060708090A0B0C0D0E0F10111213141516171819201A1B1C1D1E1F'X)

## Re: Compress function

Hi @shubham1,

That expression ('000102...1F'X) is a character constant expressed in hexadecimal notation. It contains 66 hexadecimal digits (0-9,A-F), hence represents 33 (=66/2) characters: the characters with (decimal) ASCII codes 0, 1, 2, ..., 24, 25, 32, 26, 27, 28, 29, 30, 31. Most of these are non-printable characters, not available on the keyboard, which is why using hexadecimal notation makes sense. ASCII code 32 (hex 20) is the space character, so it is among the characters to be deleted by the COMPRESS function, which explains the result in variable Z.
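To check this on your own system, a minimal sketch along the following lines may help. The variable names `list`, `n_chars` and `pos_blank` are only illustrative. It counts the characters in the constant and looks for '20'x in it:

```sas
data _null_;
  /* the same hexadecimal constant as in the original COMPRESS call */
  list = '000102030405060708090A0B0C0D0E0F10111213141516171819201A1B1C1D1E1F'x;
  n_chars   = lengthc(list);       /* storage length of the constant in bytes */
  pos_blank = index(list, '20'x);  /* 0 would mean '20'x is not in the list   */
  put n_chars= pos_blank=;
run;
```

This should report 33 characters, with '20'x at position 27, right after '19'x. Whether '20'x is actually the space character depends on the session encoding; it is in ASCII.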

Edit: Here is another example using the hexadecimal representations '41'x and '42'x of 'A' and 'B' (and again '20'x for the space ' '):

```
data _null_;
c='PALM BEACH';
d=compress(c,'4142'x);     /* equivalent: d=compress(c,'AB');  */
put d=;
e=compress(c,'414220'x);   /* equivalent: e=compress(c,'AB '); */
put e=;
run;
```

Result:

```
d=PLM ECH
e=PLMECH
```
Calcite | Level 5

## Re: Compress function

You are correct, but I have a follow-up question.

I am working on a migration project. When this code runs under SAS on the mainframe host, it does not remove the spaces.

But when I run the same code on Windows, it removes the spaces, as you explained.

What is the reason for this difference?

## Re: Compress function

The mainframe uses EBCDIC encoding, not ASCII (which is used under Windows). According to the table in the linked Wikipedia article the EBCDIC code of the space character is (hex) 40 as opposed to 20 in ASCII. So, on the mainframe you need to compress '40'x to remove blanks, not '20'x.
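For a migration, it may be safer not to hard-code character codes at all. The sketch below rests on an assumption about the intent of the original code, namely that it was only meant to strip control characters. It uses RANK to show the code of the blank in the current session and the 'c' modifier of COMPRESS, which adds control characters to the list without any hex constant. The variable name `blank_code` is only illustrative.

```sas
data _null_;
  /* code of the blank in this session: expected 32 (hex 20) under ASCII,
     64 (hex 40) under EBCDIC */
  blank_code = rank(' ');
  put blank_code= blank_code= hex2.;

  length x $50;
  x = 'Sorslev Stole As New';
  /* the 'c' modifier adds control characters to the list, so no
     platform-specific codes are needed; the blanks are not touched */
  z = compress(x, , 'c');
  put z=;
run;
```

Because the blank is not a control character, Z should keep its spaces on both platforms.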

Super User

## Re: Compress function

@FreelanceReinh wrote:

The mainframe uses EBCDIC encoding, not ASCII (which is used under Windows). According to the table in the linked Wikipedia article the EBCDIC code of the space character is (hex) 40 as opposed to 20 in ASCII. So, on the mainframe you need to compress '40'x to remove blanks, not '20'x.

Right, except that I believe SAS stores character data in ASCII in the dataset even on the mainframe. (It has been about 25 years since I used SAS on an IBM mainframe.)

## Re: Compress function

@Tom wrote:

Right, except that I believe SAS stores character data in ASCII in the dataset even on the mainframe. (It has been about 25 years since I used SAS on an IBM mainframe.)

Interesting, thanks for chiming in. I hoped the encoding difference would explain the OP's observation.

I have never used SAS on mainframes. My only encounters with EBCDIC data were data exchanges with customers involving awkward tape cartridges, about 20 years ago.

Super User

## Re: Compress function

Because you included the code for the space, '20'x, in your list. It is easy to miss because it is out of order: it sits between '19'x and '1A'x instead of after '1F'x.

If you just want to get a list of characters in order use the COLLATE() function.

```
z=COMPRESS(x,collate(0,31));
```
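As a complete step, using the sample value from the original post, this could look like the sketch below. COLLATE(0,31) returns the characters at code positions 0 through 31 of the session's collating sequence, so the blank is not in the list under either ASCII or EBCDIC.

```sas
data _null_;
  length x $50;
  x = 'Sorslev Stole As New';
  /* remove only the characters at code positions 0-31 */
  z = compress(x, collate(0,31));
  put z=;
run;
```

Since the sample value contains no such characters, Z should come out identical to X, spaces included.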