Solved: Importing text file with comma delimiters <challenge>

nicholas_leung_gmail_com · Posted 10-23-2013 10:37 PM

Hi guys,

1) I am using SAS EG5.1

2) My data looks something like this:

11,1374,C5,37,24 to

,286330.962000,10. >=$23,000 <$51,000

,6553.3000,11.

The problem is that I want to capture the ">=$23,000 <$51,000" as one variable, rather than giving me ">=$23" "000 <$51" "000" in three different columns.

Further compounded when some of the sections are like >=$5<$15 which is the problem.

I thought of concatenating them, but it doesnt fix the problem of data being out of alignment.

Any thoughts guys?

Thanks!

PGStats · Posted 10-23-2013 10:56 PM

Maybe you could simply replace all occurrences of ",000" with "000" with a text editor before reading the file. Would that do the trick? - PG

PG

View solution in original post

JerryLeBreton · Posted 10-23-2013 10:50 PM

A CSV file which has commas embedded in values will typically have those values within double quotes.

Can the CSV file be created so that it includes the double quotes?

nicholas_leung_gmail_com · Posted 10-23-2013 10:54 PM

Thanks Jerry, I don't have access to the raw data unfortunately - I'm working with extracts and they're extracting in text files than CSV.

Not sure if they can extract the data as CSV -> the entire dataset is allgedly 50GB.

PGStats · Posted 10-23-2013 10:56 PM

Maybe you could simply replace all occurrences of ",000" with "000" with a text editor before reading the file. Would that do the trick? - PG

PG

nicholas_leung_gmail_com · Posted 10-23-2013 11:30 PM

Good idea - do you have any suggestions for what to use?

Point and click interfaces are helpful

I tried using LTF to no success (couldn't replace text).

No clue how to work VIM out.

Downloading SlickEdit as I type.

Peter_L · Posted 10-24-2013 07:17 AM

I would use sed for this. It is the unix stream editor, also available for Windows. A command line like this will do what you want:

sed -e 's/,000/000/g' infile.csv >outfile.csv

My version is 37 kB. It will work efficiently with very large files.

An alternative would be to read the whole line in with SAS and use the perl regular expression functions to parse the line. The regular expression syntax is similar to that of sed.

Another possibility would be awk (or perl) to process the file into something SAS can read easily. I find these tools much easier to use on delimited text files than SAS.

PGStats · Posted 10-24-2013 12:53 PM

You could also do everything in SAS, as suggested by Peter.L , in a fairly robust fashion :

data test;

length v1-v9 $24;

infile datalines dlm=',';

input @;

_infile_ = prxchange("s/(\$\d+),(\d{3})/$1$2/o", -1, _infile_);

input v1-v9;

datalines;

11,1374,C5,37,24 to,286330.962000,10. >=$23,000 <$51,000 ,6553.3000,11

;

PG

art297 · Posted 10-24-2013 12:45 AM

I think that your problem isn't just not solved but, even worse, that you may not know what you are trying to import.

What do those values represent? The 3rd value in your example is a character field, the 5th one looks like an incomplete range (i.e., 24 to .), and the 7th one (which you said you want to be">=$23,000 <$51,000" doesn't explain why it is preceded with 10. and not separated with a comma.

Do you know what variables all of those values represent and which ones might be ranges and how the end user wants each dealt with?

Importing text file with comma delimiters <challenge>

Re: Importing text file with comma delimiters <challenge>

Re: Importing text file with comma delimiters <challenge>

Re: Importing text file with comma delimiters <challenge>

Re: Importing text file with comma delimiters <challenge>

Re: Importing text file with comma delimiters <challenge>

Re: Importing text file with comma delimiters <challenge>

Re: Importing text file with comma delimiters <challenge>

Re: Importing text file with comma delimiters <challenge>

SAS Innovate 2025: Call for Content