SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Importing text file with comma delimiters <challenge>

Accepted Solution Solved
Reply
New Contributor
Posts: 4
Accepted Solution

Importing text file with comma delimiters <challenge>

Hi guys,

1) I am using SAS EG5.1

2) My data looks something like this:

11,1374,C5,37,24 to,286330.962000,10. >=$23,000 <$51,000  ,6553.3000,11.

The problem is that I want to capture the ">=$23,000 <$51,000" as one variable, rather than giving me ">=$23" "000 <$51" "000" in three different columns.

Further compounded when some of the sections are like >=$5<$15 which is the problem.

I thought of concatenating them, but it doesnt fix the problem of data being out of alignment.

Any thoughts guys?

Thanks!


Accepted Solutions
Solution
‎10-23-2013 10:56 PM
Respected Advisor
Posts: 4,930

Re: Importing text file with comma delimiters <challenge>

Posted in reply to nicholas_leung_gmail_com

Maybe you could simply replace all occurrences of  ",000" with "000" with a text editor before reading the file. Would that do the trick? - PG

PG

View solution in original post


All Replies
Frequent Contributor
Posts: 85

Re: Importing text file with comma delimiters <challenge>

Posted in reply to nicholas_leung_gmail_com

A CSV file which has commas embedded in values will typically have those values within double quotes.

Can the CSV file be created so that it includes the double quotes?

New Contributor
Posts: 4

Re: Importing text file with comma delimiters <challenge>

Posted in reply to JerryLeBreton

Thanks Jerry, I don't have access to the raw data unfortunately - I'm working with extracts and they're extracting in text files than CSV.

Not sure if they can extract the data as CSV -> the entire dataset is allgedly 50GB.

Solution
‎10-23-2013 10:56 PM
Respected Advisor
Posts: 4,930

Re: Importing text file with comma delimiters <challenge>

Posted in reply to nicholas_leung_gmail_com

Maybe you could simply replace all occurrences of  ",000" with "000" with a text editor before reading the file. Would that do the trick? - PG

PG
New Contributor
Posts: 4

Re: Importing text file with comma delimiters <challenge>

Good idea - do you have any suggestions for what to use?

Point and click interfaces are helpful

I tried using LTF to no success (couldn't replace text).

No clue how to work VIM out.


Downloading SlickEdit as I type.

Contributor
Posts: 33

Re: Importing text file with comma delimiters <challenge>

Posted in reply to nicholas_leung_gmail_com

I would use sed for this. It is the unix stream editor, also available for Windows. A command line like this will do what you want:

sed -e 's/,000/000/g' infile.csv >outfile.csv

My version is 37 kB. It will work efficiently with very large files.

An alternative would be to read the whole line in with SAS and use the perl regular expression functions to parse the line. The regular expression syntax is similar to that of sed.

Another possibility would be awk (or perl) to process the file into something SAS can read easily. I find these tools much easier to use on delimited text files than SAS.

Respected Advisor
Posts: 4,930

Re: Importing text file with comma delimiters <challenge>

You could also do everything in SAS, as suggested by Peter.L , in a fairly robust fashion :

data test;

length v1-v9 $24;

infile datalines dlm=',';

input @;

_infile_ = prxchange("s/(\$\d+),(\d{3})/$1$2/o", -1, _infile_);

input v1-v9;

datalines;

11,1374,C5,37,24 to,286330.962000,10. >=$23,000 <$51,000  ,6553.3000,11

;

PG

PG
PROC Star
Posts: 7,487

Re: Importing text file with comma delimiters <challenge>

Posted in reply to nicholas_leung_gmail_com

I think that your problem isn't just not solved but, even worse, that you may not know what you are trying to import.

What do those values represent?  The 3rd value in your example is a character field, the 5th one looks like an incomplete range (i.e., 24 to .), and the 7th one (which you said you want to be">=$23,000 <$51,000" doesn't explain why it is preceded with 10. and not separated with a comma.

Do you know what variables all of those values represent and which ones might be ranges and how the end user wants each dealt with?

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 492 views
  • 1 like
  • 5 in conversation