Hi guys,
1) I am using SAS EG5.1
2) My data looks something like this:
11,1374,C5,37,24 to | ,286330.962000,10. >=$23,000 <$51,000 | ,6553.3000,11. |
The problem is that I want to capture the ">=$23,000 <$51,000" as one variable, rather than giving me ">=$23" "000 <$51" "000" in three different columns.
Further compounded when some of the sections are like >=$5<$15 which is the problem.
I thought of concatenating them, but it doesnt fix the problem of data being out of alignment.
Any thoughts guys?
Thanks!
Maybe you could simply replace all occurrences of ",000" with "000" with a text editor before reading the file. Would that do the trick? - PG
A CSV file which has commas embedded in values will typically have those values within double quotes.
Can the CSV file be created so that it includes the double quotes?
Thanks Jerry, I don't have access to the raw data unfortunately - I'm working with extracts and they're extracting in text files than CSV.
Not sure if they can extract the data as CSV -> the entire dataset is allgedly 50GB.
Maybe you could simply replace all occurrences of ",000" with "000" with a text editor before reading the file. Would that do the trick? - PG
Good idea - do you have any suggestions for what to use?
Point and click interfaces are helpful
I tried using LTF to no success (couldn't replace text).
No clue how to work VIM out.
Downloading SlickEdit as I type.
I would use sed for this. It is the unix stream editor, also available for Windows. A command line like this will do what you want:
sed -e 's/,000/000/g' infile.csv >outfile.csv
My version is 37 kB. It will work efficiently with very large files.
An alternative would be to read the whole line in with SAS and use the perl regular expression functions to parse the line. The regular expression syntax is similar to that of sed.
Another possibility would be awk (or perl) to process the file into something SAS can read easily. I find these tools much easier to use on delimited text files than SAS.
You could also do everything in SAS, as suggested by Peter.L , in a fairly robust fashion :
data test;
length v1-v9 $24;
infile datalines dlm=',';
input @;
_infile_ = prxchange("s/(\$\d+),(\d{3})/$1$2/o", -1, _infile_);
input v1-v9;
datalines;
11,1374,C5,37,24 to,286330.962000,10. >=$23,000 <$51,000 ,6553.3000,11
;
PG
I think that your problem isn't just not solved but, even worse, that you may not know what you are trying to import.
What do those values represent? The 3rd value in your example is a character field, the 5th one looks like an incomplete range (i.e., 24 to .), and the 7th one (which you said you want to be">=$23,000 <$51,000" doesn't explain why it is preceded with 10. and not separated with a comma.
Do you know what variables all of those values represent and which ones might be ranges and how the end user wants each dealt with?
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.