BookmarkSubscribeRSS Feed
Modsavant
Calcite | Level 5

Beginner here...

I have a CSV file that requires 2 rounds of parsing, and I cannot figure out how to do it:

1) parse out rows that are all in one cell, currently separated by an Alt-Enter, into their own rows (and cells);

2) parse out comma-separated-values in these new rows into their own columns.

Please refer to the attached JPGs of file as a guide to illustrate what I am trying to accomplish. There are 2 worksheets: “Incoming data” shows what I receive (in CSV format), and “Goal” shows where I need to end up (preferably in Excel format):


1. In “Incoming data” worksheet, column E (“Comments”) contains data that is on multiple lines separated by Alt-Enter, but within the same cell. Data in each row within the cell is separated by commas, always in the following order:

  1. GA123456 = invoice number (this alphanumeric value always starts with a GA, QR, QC, or AC. The number of numeric digits after the 2 letters is always 6, and they are always numbers)
  2. 7 = dispute reason (number can be anywhere between 1 and 25)
  3. $71.40 = disputed amount (sometimes with dollar sign, sometimes without. Value can be anywhere from $0.01 to $200000. Value will never be separated by a comma for $thousands)

2. Column E needs to be parsed out so that each row within cell E2 is in its own cell (and therefore row), and the data in columns A-D and F-I are duplicated to the new rows.

3. Column E then needs to be further parsed out so that each comma separated value is in its own cell, under its new column heading. See worksheet “Goal” in attached file to illustrate my final desired goal.

Whether step 2 comes before 3 or vice versa, I don’t care. As long as the data ends up as per “Goal”, I am happy.

Additional comments on the CSV data file:

1. The CSV file will be ultimately be used in MS Excel 2010 for further processing.

2. The CSV file will be up to 5000 rows long, but the number of columns will stay the same.

3. In the ‘Incoming Data’ Comments column, there could be up to 15 separate rows within one cell, each row separated by an Alt-Enter. This example only shows three.

4. The “Disputed Amount” will undoubtedly be sometimes entered with a $ sign, and sometimes without. I need the macro to work either way.

5. The parsed data as per the “Goal” worksheet should be in Excel format, if possible.

Many thanks to any and all who choose to assist!

Modsavant


Goal.jpgIncoming data.jpg
14 REPLIES 14
data_null__
Jade | Level 19

what is your OS and SAS version.

Post example of CSV file as attachment.

Modsavant
Calcite | Level 5

Oh... I assumed when I entered that info into my SAS Profile, it would be automatically visible to others.

OS: Windows 7 Enterprise, 32 bit

SAS: EG 6.1 M1

The system didn't let me upload CSV or Excel, so I posted some screenshots of my data in my original post.

Also, I no longer see an option to attach files at this point.

Reeza
Super User

Use the advanced editor (upper right hand light text of Reply) to attach sample data.

CSV doesn't have "cells", so a sample is required to fully understand what you're talking about.

Ksharp
Super User

That is because your data contains some line carriage character( i.e.  '0D0A'x ).

And better use HEX. check what exactly these special character are .

data have;
id=1; comment=catx('0D0A'x,'saas','sdfsd','xcxd','vdfew');output;
id=2; comment=catx('0D0A'x,'wwe','ghf','bvd','azx');output;
run;
data want;
 set have;
 do i=1 to countw(comment,'0D0A'x);
  com=scan(comment,i,'0D0A'x);output;
 end;
 drop i comment;
run;

Xia Keshan

Message was edited by: xia keshan

Suzy_Cat
Pyrite | Level 9

@Ksharp Thank you for your input here. It really helped and resolved my issue with data export to CSV.

 

Here is my code which works perfectly to my problem:

data have;

id=1; comment=catx('0D0A'x,'sa as','sdf   sd','xcxd','v dfew');output;

id=2; comment=catx('0D0A'x,'w we','ghf','bv d','azx');output;

run;

 

data want;

set have;

* Count number of lines in a cell;

Cnt=countw(comment,'0D0A'x);

* Remove line breaks;

comment_new=compress(comment,'0D0A'X);

* Remove extra space and connect text in different lines by semicolon;

comment_new1=compbl(translate(comment,';','0D0A'X));

run;

 

 

Capture.PNG

Tom
Super User Tom
Super User

You don't appear to be using TRANSLATE() properly.  Did you intend to convert the carriage returns to semi-colons and the linefeeds to spaces?  That is what your code is doing.  If so it would be better to clearly spell that out in the code by explicitly including the space.

translate(comment,'; ','0D0A'X)

If not then perhaps you wanted the TRANWRD() or TRANSTRN() function instead.

tranwrd(comment,'0D0A'X,';')

 

Suzy_Cat
Pyrite | Level 9
Thank you @Tom, the code works perfectly with what I want to achieve.
Your suggestion help with alternative ways to get the same result.
Appreciated for your time:)
Kurt_Bremser
Super User

Just made a quick check (Win7, Excel 2010).

A spreadsheet with Alt-Enter linebreaks in the cells will, when saved as .csv, have hex 0A as linebreaks in the cells and hex 0D0A as record separator.

Unfortunately, both (0A and 0D0A) are interpreted by SAS to be record separators, so you may need to do some cleaning before reading into SAS, or try to read the file raw:

use recfm=f

read into a buffer variable until you detect a 0D0A sequence

split into fields using the delimiter of the csv

then split the multiple-line cell using 0A as delimiter

output for each item in the multiple-line cell

delete everything from the buffer up to and including the 0D0A

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Not sure if it helps, but I had some code from Support for dealing with 0D0A characters: https://communities.sas.com/thread/60374

Kurt_Bremser
Super User

Forget my previous posting.

Use

infile "....." termstr=CRLF dlm=';' dsd;

and read the fields. Then split the multiple line cell along the 0A's with the scan function.

Caution: Will not work if file is transferred to say, UNIX, as ASCII and the CRLF's are replaced with LF's.

Modsavant
Calcite | Level 5

I appreciate everyone's input, but it is clear to me that I need more help here.

I don't even know HOW to implement your suggestions. I was expecting answers that sounded something like "Open the Query Builder and choose such and such an option..."

When I read:

Use

infile "....." termstr=CRLF dlm=';' dsd;

...it tells me that I am in over my head. How would I even start "using" this type of code?

I use SAS for its easy visual method of building queries. I don't code anything. I may just try to figure out my problem directly in Excel.

Reeza
Super User

If you post sample data someone can help with code. A picture of the data isn't helpful, to provide code someone would have to generate fake data, which in this case is a bit of work, so instead you get ideas of the solution.

My solution would be a standard import and then parse it out with the scan function because it looks like your data follows a specific structure and you know you have a max amount (15) so it would relatively straight forward.

If you choose to do it in Excel/manually look at the Data>Text to Columns feature to separate out the fields.

Good Luck either way Smiley Happy

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Ah, sorry.  Just assumed you were a SAS programmer, didn't read the Data Management discussion group.  If that is the case, I would recommend passing it to your programmers to sort out.  Whilst you could do things in Excel, it definately wouldn't be my recommendation.  For instance, how would you validate it?  not sure of your position, but if that data is used in any kind of important step and there is problems...  Your programmers (I would hope) have processes for this.

art297
Opal | Level 21

From your comments, I take it that you have both the Excel xlsx file and a copy saved as a CSV file.

If your site licenses the product that allows you to import Excel files you could just open a program window, paste the code below, and run it. The only two things you would have to change is where I used datafile="c:\art\test1.xlsx" to be the location and name of your excel workbook and, where I indicated sheet="Sheet1"; to indicate the actual name of the worksheet you have:

proc import datafile="c:\art\test1.xlsx" out=thedata replace dbms=excel;

  sheet="Sheet1";

  usedate=yes;

run;

data want (drop=comments i);

  set thedata;

  i=1;

  do while (scan(comments,i,'0A'x) ne "");

    Invoice_Number=scan(scan(comments,i,'0A'x),1,',');

    Dispute_Reason=input(scan(scan(comments,i,'0A'x),2,','),8.);

    Disputed_Amount=input(scan(scan(comments,i,'0A'x),3,','),comma12.);

    output;

    i+1;

  end;

run;

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 14 replies
  • 10206 views
  • 3 likes
  • 9 in conversation