BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
azertyuiop
Quartz | Level 8

Hello,

 

I own a Excel file with several columns to import in SAS .

One column contains a concatenation with several under part with this caracters " ; " follow by " alt 0010" like barrier. To separate severals data in the under bloc I use a " | " :


 blabla1|statut1|numeropersonne1;blabla2|statut2|numeropersonne2;blabla3|statut3|numeropersonne3

 
In Excel , a user adds " alt 0010 "  after " ; " to optimise the vision of data in the screen.
 
When the excel file is import in SAS I use a macro to cut the under part. If there is 1 " ; " I must see 2 lines , 2 " ; " I must see 3 lines ets ...
There is here a problem :
 
After this symbol " ; " there is " alt 0010 " in the starting of next line


After several checking :

 

* with rank function , rank(field,1) dive 13 for the ASCII caracters type 13.

* with the using of substr function in position 4 in the field start really here

* with a count function in the field SAS say that the "alt 0010" is place on 2 caracters ...

 

Can you help me to found a correct function to delete the "alt 0010 " in SAS ?

 

Thanks for your help.

1 ACCEPTED SOLUTION

Accepted Solutions
azertyuiop
Quartz | Level 8

Hello,

 

After severals days to search the point of bug . This problem is fixed .

 

I explain :

 

1) The excel files are generated by an other departement of the company. To generate the data and Excel file there is this in the this code before a prox export in excel :

 

/* code sas */

proc transpose data=work.origine out=mytable(drop=_name_);
by firstkey ;
var secondkey ;
run;

data work.mytable;

set work.mytable;

length concat $7500. ;
format concat $7500. ;
informat concat $7500. ;

concat=catx('', of col:);

*drop col:;

run;

For the second key it's impossible to know the number of appearance. Due to this sitution when the proc transpose is run the number of columns is random. When can obtain 10 columns like it's possible to obtain 50 columns.

 

This line :

 

concat=catx('', of col:);

generates a "blank" or a "voide", and under Excel and SAS this it translate by a particular space.

 

To fix this bug it's necessary to write a true space follow by a function tranwrd :

 

concat=catx(' ', of col:);
concat=tranwrd(concat,'; ',';');

 

View solution in original post

10 REPLIES 10
Reeza
Super User

COMPRESS()  - and look at the modifiers. There's one for blanks or invisible characters that may do what you need. 

 


@azertyuiop wrote:

Hello,

 

I own a Excel file with several columns to import in SAS .

One column contains a concatenation with several under part with this caracters " ; " follow by " alt 0010" like barrier. To separate severals data in the under bloc I use a " | " :


 blabla1|statut1|numeropersonne1;blabla2|statut2|numeropersonne2;blabla3|statut3|numeropersonne3

 
In Excel , a user adds " alt 0010 "  after " ; " to optimise the vision of data in the screen.
 
When the excel file is import in SAS I use a macro to cut the under part. If there is 1 " ; " I must see 2 lines , 2 " ; " I must see 3 lines ets ...
There is here a problem :
 
After this symbol " ; " there is " alt 0010 " in the starting of next line


After several checking :

 

* with rank function , rank(field,1) dive 13 for the ASCII caracters type 13.

* with the using of substr function in position 4 in the field start really here

* with a count function in the field SAS say that the "alt 0010" is place on 2 caracters ...

 

Can you help me to found a correct function to delete the "alt 0010 " in SAS ?

 

Thanks for your help.


 

RW9
Diamond | Level 26 RW9
Diamond | Level 26

"In Excel , a user adds " alt 0010 "  after " ; " to optimise the vision of data in the screen."

Sounds like that is where your problem lies.  Why do you have multiple components of data points delimited by a pipe, delimited by a semicolon.  Sounds to me like your going out of your way to make it as difficult as possible.  Excel has more than one cell, use them for separate elements.  More to the point however, why are you using Excel for this in the first place?  What is the benefit you gain from doing things this way.  What is this data your trying to get, and why can it not be in a standard format?  These are all questions I would be asking before I even bothered opening the file.  If you end up having to use Excel, then you need to put some rules on it, otherwise they could just post a picture into the cell, or a hyperlink somewhere else etc.

azertyuiop
Quartz | Level 8

Hello ,

 

You can see the format like on this picture :

 

Sans titre.png

There are 5 departements in my company which work in a even structure of excel file.

 

Each departement check severals informations in physical paper file . For each file the departement own a list of information into the field " comment_statut_idclient " to control the paper file.

 

To reduce the quantity of line each department have decided to compact and group all information about one paper file in the same cell. In one cell it's a CSV format which is group.

 

When I receive all Excel fils in SAS I must place each information about " comment_statut_idclient " , in one line with a copy of the rest of all data in the line.

 

As I 'm not the leader of the process to decide the structure of data in Excel , I must adapt my process in Sas to cut the cell " statut " in several cells when it's necessary.

 

To "help me" in the structure of data , I must locate the symbol ";" but I must to locate " alt 0010" which is just after the symbol ";" . If I don't find a solution to delete " alt 0010" , I can't tranform my data.

Reeza
Super User

In other wording these are known as line breaks. If you search on here there are multiple examples on how to remove them.

And do you have a CSV or EXCEL file. 
If you have a CSV file you can remove them before you import the data. 

 

azertyuiop
Quartz | Level 8
Hello ,

I my case I can use only an Excel file.
Tom
Super User Tom
Super User

Normally Excel will embed linefeeds into cells to simulate line breaks within a cell.  It sounds like from your description of how these are used you can just remove them. That is easy to do with the COMPRESS() function.

myvar = compress(myvar,'0A'x);

So you could just just use an ARRAY and a DO loop to do that for every character varaible.

data want;
  set have ;
  array _c _character_;
  do over _c ; _c=compress(_c,'0A'x); end;
run;

It is possible, but not likely, that they have also embedded carriage return characters, which is hex code '0D'x.

 

How are you reading the Excel file into your SAS dataset?  If you are using PROC IMPORT or the XLSX libname engine then the embedded linefeeds will not cause any trouble in the import.  If you are saving the file as a CSV file and then trying to read it with SAS then you can get it work easily IF they do NOT have embedded carriage returns AND you save the CSV using CR+LF as the end of line markers.  If you do that then SAS can read the CSV file without you having to make any modifications to it.

data want ;
  infile 'mydata.csv'  dsd truncover firstobs=2 termstr=crlf ;
  ....
run;

If there are embedded CR+LF character in one or more cells then you will need to pre-process the CSV file and remove them or replace them with something else.  There are examples of how to do that in solutions to other questions on this site.

azertyuiop
Quartz | Level 8
Hello ,

I have trying your solution with "compresse , and with array do , for Excel with '0D'x and '0A'x , it's the same result. When I see my data , there is again a "empty" caractere after the ";" .

For CSV case I keep a note for a futur case if I have this case in the futur. In my case I'm limit with a Excel format , not a csv format.
AlanC
Barite | Level 11

Ok, I 'hope' you are using Excel 2010 or later. If so, the Excel file is, in fact, zipped XML files. One option is to simply unzip the data and fix it in XML. Easy to do with a text editor , command line, SAS, etc. Then rezip it back up.

 

https://github.com/savian-net
azertyuiop
Quartz | Level 8

Hello,

 

After severals days to search the point of bug . This problem is fixed .

 

I explain :

 

1) The excel files are generated by an other departement of the company. To generate the data and Excel file there is this in the this code before a prox export in excel :

 

/* code sas */

proc transpose data=work.origine out=mytable(drop=_name_);
by firstkey ;
var secondkey ;
run;

data work.mytable;

set work.mytable;

length concat $7500. ;
format concat $7500. ;
informat concat $7500. ;

concat=catx('', of col:);

*drop col:;

run;

For the second key it's impossible to know the number of appearance. Due to this sitution when the proc transpose is run the number of columns is random. When can obtain 10 columns like it's possible to obtain 50 columns.

 

This line :

 

concat=catx('', of col:);

generates a "blank" or a "voide", and under Excel and SAS this it translate by a particular space.

 

To fix this bug it's necessary to write a true space follow by a function tranwrd :

 

concat=catx(' ', of col:);
concat=tranwrd(concat,'; ',';');

 

RW9
Diamond | Level 26 RW9
Diamond | Level 26

The solution to the "problem" is to fix the "problem.  Nothing about what you have posted in any way a good process, in fact other than sending this file by email, it would appear to use every possible method to make this a process which will break 9/10 runs.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 2585 views
  • 4 likes
  • 5 in conversation