coladuck
Fluorite | Level 6

Hi, I am trying to import data from an open data source using the following code:

filename csvFile url "https://data.medicare.gov/resource/4pq5-n9py.csv";

proc import datafile=csvFile out=test2 replace dbms=csv; run;

The program runs, but the data are not imported correctly. Strange rows show up and throw everything else off.


Anything I missed? Thank you! 

Accepted Solution
Patrick
Opal | Level 21

@coladuck 

Here is some code that removes the LF characters inside the quoted data before running Proc Import.

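/* Read the remote file as a byte stream, one byte per INPUT */
filename incsv url "https://data.medicare.gov/resource/4pq5-n9py.csv" recfm=n lrecl=1;
/* Temporary file that receives the cleaned copy, also written as a byte stream */
filename tmpcsv temp recfm=n;

data _null_;
  length _isString 3;
  /* _isString is 1 while the current byte sits inside a quoted string */
  retain _isString 0;
  file tmpcsv;
  infile incsv;
  /* Load the next byte into _infile_ */
  input ;
  /* Every double quote toggles the inside-a-string flag */
  if _infile_='"' then _isString= (_isString=0);
  /* An LF inside a quoted string is data, so replace it with a blank */
  if _isString=1 and _infile_='0A'x then _infile_=' ';
  /* Write the byte and hold the output line */
  put _infile_ @@;
run;

/* Point a regular, line-oriented fileref at the cleaned file for Proc Import */
filename outcsv "%sysfunc(pathname(tmpcsv))" lrecl=1000;
proc import datafile=outcsv out=test2 replace dbms=csv; 
run;

filename tmpcsv clear;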


7 Replies
ChrisNZ
Tourmaline | Level 20

This is probably an Excel export. The lines include embedded LF characters.

 

You can add the IGNOREDOSEOF option to the INFILE statement.
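
If the file really is a Windows-style export, its rows typically end in CRLF while line breaks inside a cell are a bare LF. Under that assumption (not confirmed for this file), another option worth trying is TERMSTR=CRLF, so that only CRLF counts as end of record. The sketch below assumes the file was first downloaded to a local path; the path is hypothetical. If the rows actually end in a bare LF, this will not help and the file needs pre-processing instead.

```sas
/* Hypothetical local copy of the downloaded CSV; adjust the path.    */
/* TERMSTR=CRLF treats only CR+LF as end of record, so a bare LF      */
/* inside a quoted value stays part of that record.                   */
filename localcsv "C:\temp\4pq5-n9py.csv" termstr=crlf;

proc import datafile=localcsv out=test2 replace dbms=csv;
run;

filename localcsv clear;
```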

ChrisNZ
Tourmaline | Level 20

Sadly SAS will not get smarter about quoted LF characters in CSV files.

Patrick
Opal | Level 21

The challenge is that LF is both the end-of-line indicator and part of the data. The only thing that distinguishes the two is that the LF characters belonging to the data sit within double quotes.

Neither the SAS Import procedure nor the data step is great at dealing with this.

 

If this is a one-off task, then first download the file via a browser to your local machine and then use the Enterprise Guide (EG) Import Wizard.

 

If it's a repeated task, then you need to pre-process the file and remove the LF characters within quotes. If you search the forum a bit, you'll find several existing discussions and solutions for this.

Reeza
Super User

Agreed that this sucks! This seems consistent with how scripting languages still work, but R and Python both have this figured out. I spent way too much time recently debugging this issue because the command-line row count didn't match the imported count, as there was an extra LF in some columns. Ironically, it didn't matter: R had read the file correctly, so it wasn't the source of the issue, but it was still a pain in the a**.
coladuck
Fluorite | Level 6

Thank you everyone for the great input! Patrick's code works beautifully, so I accepted it as the solution. Now I need to spend some time understanding the code!

Patrick
Opal | Level 21

@coladuck wrote:

Thank you everyone for the great input! Patrick's code works beautifully, so I accepted it as the solution. Now I need to spend some time understanding the code!


Even after removing the LF between the quotes, it looks to me as if Proc Import still doesn't read the data as one would wish. One reason is that all values are enclosed in quotes, so Proc Import treats them all as character.

 

You're likely better off if you write a SAS data step. Take EG import wizard generated code (or use Proc Import generated code from the SAS log) and then amend the code so you really get what you need.
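
A minimal sketch of what such a hand-written data step could look like, in the style of the code Proc Import writes to the log. It reads from the fileref outcsv created by the accepted solution. The three column names, their types and their lengths are hypothetical placeholders, since the real layout has to come from the file header or the generated code.

```sas
/* Sketch only: provider_id, provider_name and score are hypothetical */
/* columns. Replace them with the real columns of the file.           */
data test2;
  /* DSD strips the surrounding quotes and honours commas in quotes;  */
  /* FIRSTOBS=2 skips the header row.                                 */
  infile outcsv delimiter=',' dsd truncover firstobs=2;
  informat provider_id   $10.;
  informat provider_name $100.;
  /* A numeric informat reads a quoted number as a number */
  informat score         best32.;
  format   score         best12.;
  input provider_id $ provider_name $ score;
run;
```

Because the types are declared explicitly, a column that holds numbers in quotes can be read as numeric instead of character.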

 

I've attached the source data with the LF removed. This is what you get under fileref outcsv from the code you've marked as solution.
