SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Parsing CSV data with Alt-Enter separators within one cell

Reply
New Contributor
Posts: 3

Parsing CSV data with Alt-Enter separators within one cell

Beginner here...

I have a CSV file that requires 2 rounds of parsing, and I cannot figure out how to do it:

1) parse out rows that are all in one cell, currently separated by an Alt-Enter, into their own rows (and cells);

2) parse out comma-separated-values in these new rows into their own columns.

Please refer to the attached JPGs of file as a guide to illustrate what I am trying to accomplish. There are 2 worksheets: “Incoming data” shows what I receive (in CSV format), and “Goal” shows where I need to end up (preferably in Excel format):


1. In “Incoming data” worksheet, column E (“Comments”) contains data that is on multiple lines separated by Alt-Enter, but within the same cell. Data in each row within the cell is separated by commas, always in the following order:

  1. GA123456 = invoice number (this alphanumeric value always starts with a GA, QR, QC, or AC. The number of numeric digits after the 2 letters is always 6, and they are always numbers)
  2. 7 = dispute reason (number can be anywhere between 1 and 25)
  3. $71.40 = disputed amount (sometimes with dollar sign, sometimes without. Value can be anywhere from $0.01 to $200000. Value will never be separated by a comma for $thousands)

2. Column E needs to be parsed out so that each row within cell E2 is in its own cell (and therefore row), and the data in columns A-D and F-I are duplicated to the new rows.

3. Column E then needs to be further parsed out so that each comma separated value is in its own cell, under its new column heading. See worksheet “Goal” in attached file to illustrate my final desired goal.

Whether step 2 comes before 3 or vice versa, I don’t care. As long as the data ends up as per “Goal”, I am happy.

Additional comments on the CSV data file:

1. The CSV file will be ultimately be used in MS Excel 2010 for further processing.

2. The CSV file will be up to 5000 rows long, but the number of columns will stay the same.

3. In the ‘Incoming Data’ Comments column, there could be up to 15 separate rows within one cell, each row separated by an Alt-Enter. This example only shows three.

4. The “Disputed Amount” will undoubtedly be sometimes entered with a $ sign, and sometimes without. I need the macro to work either way.

5. The parsed data as per the “Goal” worksheet should be in Excel format, if possible.

Many thanks to any and all who choose to assist!

Modsavant


Goal.jpgIncoming data.jpg
Respected Advisor
Posts: 3,777

Re: Parsing CSV data with Alt-Enter separators within one cell

what is your OS and SAS version.

Post example of CSV file as attachment.

New Contributor
Posts: 3

Re: Parsing CSV data with Alt-Enter separators within one cell

Oh... I assumed when I entered that info into my SAS Profile, it would be automatically visible to others.

OS: Windows 7 Enterprise, 32 bit

SAS: EG 6.1 M1

The system didn't let me upload CSV or Excel, so I posted some screenshots of my data in my original post.

Also, I no longer see an option to attach files at this point.

Super User
Posts: 17,749

Re: Parsing CSV data with Alt-Enter separators within one cell

Use the advanced editor (upper right hand light text of Reply) to attach sample data.

CSV doesn't have "cells", so a sample is required to fully understand what you're talking about.

Super User
Posts: 9,662

Re: Parsing CSV data with Alt-Enter separators within one cell

That is because your data contains some line carriage character( i.e.  '0D0A'x ).

And better use HEX. check what exactly these special character are .

data have;
id=1; comment=catx('0D0A'x,'saas','sdfsd','xcxd','vdfew');output;
id=2; comment=catx('0D0A'x,'wwe','ghf','bvd','azx');output;
run;
data want;
 set have;
 do i=1 to countw(comment,'0D0A'x);
  com=scan(comment,i,'0D0A'x);output;
 end;
 drop i comment;
run;

Xia Keshan

Message was edited by: xia keshan

Super User
Posts: 6,928

Re: Parsing CSV data with Alt-Enter separators within one cell

Just made a quick check (Win7, Excel 2010).

A spreadsheet with Alt-Enter linebreaks in the cells will, when saved as .csv, have hex 0A as linebreaks in the cells and hex 0D0A as record separator.

Unfortunately, both (0A and 0D0A) are interpreted by SAS to be record separators, so you may need to do some cleaning before reading into SAS, or try to read the file raw:

use recfm=f

read into a buffer variable until you detect a 0D0A sequence

split into fields using the delimiter of the csv

then split the multiple-line cell using 0A as delimiter

output for each item in the multiple-line cell

delete everything from the buffer up to and including the 0D0A

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super User
Super User
Posts: 7,392

Re: Parsing CSV data with Alt-Enter separators within one cell

Not sure if it helps, but I had some code from Support for dealing with 0D0A characters: https://communities.sas.com/thread/60374

Super User
Posts: 6,928

Re: Parsing CSV data with Alt-Enter separators within one cell

Forget my previous posting.

Use

infile "....." termstr=CRLF dlm=';' dsd;

and read the fields. Then split the multiple line cell along the 0A's with the scan function.

Caution: Will not work if file is transferred to say, UNIX, as ASCII and the CRLF's are replaced with LF's.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
New Contributor
Posts: 3

Re: Parsing CSV data with Alt-Enter separators within one cell

I appreciate everyone's input, but it is clear to me that I need more help here.

I don't even know HOW to implement your suggestions. I was expecting answers that sounded something like "Open the Query Builder and choose such and such an option..."

When I read:

Use

infile "....." termstr=CRLF dlm=';' dsd;

...it tells me that I am in over my head. How would I even start "using" this type of code?

I use SAS for its easy visual method of building queries. I don't code anything. I may just try to figure out my problem directly in Excel.

Super User
Posts: 17,749

Re: Parsing CSV data with Alt-Enter separators within one cell

If you post sample data someone can help with code. A picture of the data isn't helpful, to provide code someone would have to generate fake data, which in this case is a bit of work, so instead you get ideas of the solution.

My solution would be a standard import and then parse it out with the scan function because it looks like your data follows a specific structure and you know you have a max amount (15) so it would relatively straight forward.

If you choose to do it in Excel/manually look at the Data>Text to Columns feature to separate out the fields.

Good Luck either way Smiley Happy

Super User
Super User
Posts: 7,392

Re: Parsing CSV data with Alt-Enter separators within one cell

Ah, sorry.  Just assumed you were a SAS programmer, didn't read the Data Management discussion group.  If that is the case, I would recommend passing it to your programmers to sort out.  Whilst you could do things in Excel, it definately wouldn't be my recommendation.  For instance, how would you validate it?  not sure of your position, but if that data is used in any kind of important step and there is problems...  Your programmers (I would hope) have processes for this.

PROC Star
Posts: 7,356

Re: Parsing CSV data with Alt-Enter separators within one cell

From your comments, I take it that you have both the Excel xlsx file and a copy saved as a CSV file.

If your site licenses the product that allows you to import Excel files you could just open a program window, paste the code below, and run it. The only two things you would have to change is where I used datafile="c:\art\test1.xlsx" to be the location and name of your excel workbook and, where I indicated sheet="Sheet1"; to indicate the actual name of the worksheet you have:

proc import datafile="c:\art\test1.xlsx" out=thedata replace dbms=excel;

  sheet="Sheet1";

  usedate=yes;

run;

data want (drop=comments i);

  set thedata;

  i=1;

  do while (scan(comments,i,'0A'x) ne "");

    Invoice_Number=scan(scan(comments,i,'0A'x),1,',');

    Dispute_Reason=input(scan(scan(comments,i,'0A'x),2,','),8.);

    Disputed_Amount=input(scan(scan(comments,i,'0A'x),3,','),comma12.);

    output;

    i+1;

  end;

run;

Ask a Question
Discussion stats
  • 11 replies
  • 2246 views
  • 1 like
  • 7 in conversation