01-30-2018 01:44 PM - edited 01-30-2018 01:48 PM
I have a simple proc import for CSV:
proc import out=&output_table.
            datafile="&inputdir./&input_file."
            dbms=csv replace;
  getnames=yes;
run;
My file's delimiter is a comma, but the amounts inside the file also contain commas; those amount values are wrapped in double quotes.
I receive a file like this from a provider and run my import, and it works well.
Then I open the file and manually change the value of an amount, without removing the amount's commas or the double quotes, and the import no longer works.
Likewise, if I open the file and delete some rows that are out of scope (by selecting the Excel rows and clicking "Delete"), the import no longer works.
In both cases, the whole file is imported into a single column. That column's name is the first column name of the CSV file, and every row is written exactly as it appears in the CSV file, commas and all, in that one column.
Do you know what could explain that?
How can I manually modify some values without changing the file's characteristics (because that seems to be the cause)?
01-30-2018 01:53 PM
You mentioned "Excel rows." If you're opening the file in Excel and then re-saving it, that could be your problem.
Try making the changes to the original file using a text editor like Notepad or WordPad.
Art, CEO, AnalystFinder.com
01-30-2018 04:09 PM
The symptom you describe is what happens when your file was created with a different delimiter than the one you told PROC IMPORT to use. Since it does not see any delimiters, it treats the whole line as one column.
Why are you using PROC IMPORT to read a delimited file?
You can just write a DATA step to read it yourself, frequently with less typing than the PROC IMPORT code requires.
That goes double if you are getting many similar files.
You can also run PROC IMPORT once, recall the code that it generates from the log, and modify that code. Although PROC IMPORT writes ugly code, it will at least take a guess at what variable names to use and what type of data is in each column.
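As a rough sketch of what such a DATA step might look like (the variable names, lengths, and informats here are assumptions, since we don't know your actual columns; the DSD option is what handles the quoted, comma-containing amount values):

```sas
/* Sketch of a DATA step replacement for PROC IMPORT.
   The variables (id, name, amount) and their lengths are
   assumptions -- replace them with your real CSV columns. */
data &output_table.;
  infile "&inputdir./&input_file." dsd dlm=',' firstobs=2 truncover;
  length id $10 name $40;
  /* COMMAX. reads a comma as the decimal separator (e.g. "2208,03");
     use COMMA. instead if your commas are thousands separators. */
  input id $ name $ amount :commax12.;
run;
```

Once written, this step reads every file the same way, so a batch of similar files all produce identically structured data sets.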
01-30-2018 04:11 PM
You would have to tell us explicitly what values you changed from and to. But unless you take extra steps that almost no one does: if your original value was "2208,03" and you changed it to anything that might be interpreted as a number, such as 220,803, then Excel changed the cell to numeric. And very likely, on File > Save As > CSV (you did do this, correct?), the value came out as ,220803, with no quotes, or even worse as ,220,803, with no quotes and now split across a different field.
Either edit with a text editor, OR (sometimes this works, sometimes not) set the entire column in Excel to text before making any edits.
OR use program code in a DATA step:
if value = '2208,03' <possibly with another qualifier, such as: and id = 'abc'> then value = '220,803';
Also, if you have multiple files with the same structure, you would be much better off writing a DATA step, because PROC IMPORT guesses variable lengths and types every single time, so your data sets may not be compatible with one another.
Note that PROC IMPORT with CSV writes a DATA step to the log that you can copy, paste into the editor, modify, and save. Then you change the INFILE and the output data set to get the desired result.
You should also use GUESSINGROWS= with a moderately large value, or the keyword MAX, for better consistency of variable lengths when importing CSV or other delimited files.
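For example, here is the PROC IMPORT call from the question with a GUESSINGROWS= statement added (same macro variables as the original; adjust to taste):

```sas
/* Same import as in the question, but scanning all rows before
   PROC IMPORT assigns variable types and lengths. */
proc import out=&output_table.
            datafile="&inputdir./&input_file."
            dbms=csv replace;
  getnames=yes;
  guessingrows=max;  /* or a large number, e.g. guessingrows=10000 */
run;
```

With GUESSINGROWS=MAX, a long value far down the file can no longer be truncated just because the first few rows happened to be short.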