SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Removing Multi-Line Breaks (ALT+ENTER) in Variables from Imported .xlsx File

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 8
Accepted Solution

Removing Multi-Line Breaks (ALT+ENTER) in Variables from Imported .xlsx File

Howdy folks! Long time lurker; first time poster - let me know if I gave you enough info below on this issue.

 

I've read 41 Excel files into SAS (some xls, some xlsx), reformatted them, concatenated them into one SAS datafile, and am now trying to recode some of the variables. Alas, it seems that some variables were read in as multi-line data (that is, someone used Alt+Enter in Excel when entering data). So, "YES (RECEIPT)" and "YES (RECEIPT)" look exactly the same, but when I run this code:

 

 

DATA MERGED.MERGETEST;
SET MERGED.MERGECLN;
 IF Q_HASSSACARD_HFH = 'YES (RECEIPT)' THEN RE_Q_HASSSACARD_HFH = 'YES';
RUN;

It only reformats some of the "YES (RECEIPT)" values, and leaves two of them untouched. I went back to the original Excel files and confirmed that these 2 leftover values were in fact "YES (ALT+ENTER) (RECEIPT)" values.

 

I've tried:

 

STRIPping the variable

DATA MERGED.MERGECLN;
SET MERGED.MERGECLN;
Q_HASSSACARD_HFH = STRIP(Q_HASSSACARD_HFH);
RUN;

and


COMPRESSing the variable

DATA MERGED.MERGECLN;
SET MERGED.MERGECLN;
Q_HASSSACARD_HFH = COMPRESS(Q_HASSSACARD_HFH);
RUN;

 

...to no avail. I still have those leftover "YES (RECEIPT)" variable values. What am I missing here?

 

Thank you so much for taking a look.


Accepted Solutions
Solution
‎09-07-2017 03:27 PM
Super User
Posts: 19,080

Re: Removing Multi-Line Breaks (ALT+ENTER) in Variables from Imported .xlsx File

[ Edited ]

Try this, note the extra comma since we're not using the second parameter for the COMPRESS function, but we do need to space it so SAS knows which options we are using. 

 

Q_HASSSACARD_HFH = COMPRESS(Q_HASSSACARD_HFH, , 's');

Example 3 in the docs Smiley Wink

http://support.sas.com/documentation/cdl/en/lefunctionsref/69762/HTML/default/viewer.htm#n0fcshr0ir3...

View solution in original post


All Replies
Super User
Posts: 19,080

Re: Removing Multi-Line Breaks (ALT+ENTER) in Variables from Imported .xlsx File

Look at the COMPRESS function again, but with the modifer to remove all blank character types. I think it's either i or s, but check the documentation.

Occasional Contributor
Posts: 8

Re: Removing Multi-Line Breaks (ALT+ENTER) in Variables from Imported .xlsx File

Hi Reeza, thanks so much for your input! Here's what I see in the COMPRESS documentation:

 

a or A

adds alphabetic characters to the list of characters.

c or C

adds control characters to the list of characters.

d or D

adds digits to the list of characters.

f or F

adds the underscore character and English letters to the list of characters.

g or G

adds graphic characters to the list of characters.

h or H

adds a horizontal tab to the list of characters.

i or I

ignores the case of the characters to be kept or removed.

k or K

keeps the characters in the list instead of removing them.

l or L

adds lowercase letters to the list of characters.

n or N

adds digits, the underscore character, and English letters to the list of characters.

o or O

processes the second and third arguments once rather than every time the COMPRESS function is called. Using the O modifier in the DATA step (excluding WHERE clauses), or in the SQL procedure, can make COMPRESS run much faster when you call it in a loop where the second and third arguments do not change.

p or P

adds punctuation marks to the list of characters.

s or S

adds space characters (blank, horizontal tab, vertical tab, carriage return, line feed, and form feed) to the list of characters.

t or T

trims trailing blanks from the first and second arguments.

u or U

adds uppercase letters to the list of characters.

w or W

adds printable characters to the list of characters.

x or X

adds hexadecimal characters to the list of characters.

 

So would I do something like this?

 

DATA MERGED.MERGECLN;
SET MERGED.MERGECLN;
Q_HASSSACARD_HFH = COMPRESS(Q_HASSSACARD_HFH,H);
RUN;

Or is there more to it? I'm a total newbie to the COMPRESS function.

Solution
‎09-07-2017 03:27 PM
Super User
Posts: 19,080

Re: Removing Multi-Line Breaks (ALT+ENTER) in Variables from Imported .xlsx File

[ Edited ]

Try this, note the extra comma since we're not using the second parameter for the COMPRESS function, but we do need to space it so SAS knows which options we are using. 

 

Q_HASSSACARD_HFH = COMPRESS(Q_HASSSACARD_HFH, , 's');

Example 3 in the docs Smiley Wink

http://support.sas.com/documentation/cdl/en/lefunctionsref/69762/HTML/default/viewer.htm#n0fcshr0ir3...

Occasional Contributor
Posts: 8

Re: Removing Multi-Line Breaks (ALT+ENTER) in Variables from Imported .xlsx File

You're wonderful! Thank you! I'll slow down and read more carefully next time.

Super User
Super User
Posts: 6,843

Re: Removing Multi-Line Breaks (ALT+ENTER) in Variables from Imported .xlsx File

Unless your dataset already had a variable named H that call should have generated these notes in the SAS log.

NOTE: Numeric values have been converted to character values at the places given by:
NOTE: Variable H is uninitialized.

Because you told COMPRESS() to use the value of H as the list of characters to remove.

The modifiers are the third parameter and if you want to use a constant value then you need to use a string literal and not a variable name.

COMPRESS(Q_HASSSACARD_HFH,,'S');

 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 229 views
  • 0 likes
  • 3 in conversation