Hi, I am trying to import data from an open data source using the following code:
filename csvFile url "https://data.medicare.gov/resource/4pq5-n9py.csv";
proc import datafile=csvFile out=test2 replace dbms=csv;
run;
The program runs but data are not imported correctly. Strange rows are showing up, messing up everything.
Anything I missed? Thank you!
This is probably an Excel export. Some of the values contain an embedded LF (line feed) character.
You can add the IGNOREDOSEOF option to the INFILE statement.
Sadly SAS will not get smarter about quoted LF characters in CSV files.
The challenge is that LF is both the end-of-line indicator and part of the data. The only thing that distinguishes the two is that the LF characters which are part of the data sit within double quotes.
Neither Proc Import nor the data step is great at dealing with this.
If this is a one-off task, first download the file via a browser to your local machine and then use the EG Import Wizard.
If it's a repeated task, you need to pre-process the file and remove the LF characters within quotes. If you search the forum here a bit, you'll find several existing discussions and solutions for this.
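Here is some code that replaces the LF characters inside quoted strings with blanks before running Proc Import.
/* Read the source as a byte stream, one character per INPUT */
filename incsv url "https://data.medicare.gov/resource/4pq5-n9py.csv" recfm=n lrecl=1;
/* Temporary file that receives the cleaned byte stream */
filename tmpcsv temp recfm=n;
data _null_;
  length _isString 3;
  retain _isString 0;   /* 1 while inside a double-quoted string */
  file tmpcsv;
  infile incsv;
  input;
  /* Every double quote toggles the inside-string flag */
  if _infile_='"' then _isString=(_isString=0);
  /* An LF inside a quoted string is data, so replace it with a blank */
  if _isString=1 and _infile_='0A'x then _infile_=' ';
  put _infile_ @@;
run;
/* Point a regular, line-oriented fileref at the cleaned file */
filename outcsv "%sysfunc(pathname(tmpcsv))" lrecl=1000;
proc import datafile=outcsv out=test2 replace dbms=csv;
run;
filename tmpcsv clear;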
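To see what the quote-toggle logic in that step does, here is a minimal in-memory sketch of the same idea. The record in raw is made up for illustration and the variable names (raw, clean, inQuotes, nBefore, nAfter) are hypothetical; it is not the code that was run against the Medicare file.

```sas
data _null_;
  length raw clean $60 ch $1;
  /* Hypothetical record: the second field contains an LF inside the quotes,
     and the record itself ends with an LF. */
  raw = '"1","first line' || '0A'x || 'second line"' || '0A'x;
  clean = raw;
  inQuotes = 0;
  do i = 1 to length(raw);
    ch = char(raw, i);
    /* Each double quote flips the inside-string flag. */
    if ch = '"' then inQuotes = not inQuotes;
    /* An LF seen while inside a string is data: overwrite it with a blank. */
    else if inQuotes and ch = '0A'x then substr(clean, i, 1) = ' ';
  end;
  /* Count the LF characters before and after the clean-up. */
  nBefore = countc(raw, '0A'x);
  nAfter  = countc(clean, '0A'x);
  put nBefore= nAfter=;
run;
```

The log should show nBefore=2 nAfter=1: only the LF inside the quotes is replaced, while the LF that ends the record is kept. An escaped quote ("") inside a string flips the flag twice, so the state stays correct.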
Thank you everyone for the great input! Patrick's code works beautifully, so I accepted it as the solution. Now I need to spend some time understanding the code!
Even after removing the LF characters between the quotes, it looks to me like Proc Import still doesn't read the data as one would wish. One reason is that all values are enclosed in quotes, so Proc Import treats them all as character.
You're likely better off writing a SAS data step. Take the code generated by the EG Import Wizard (or the data step that Proc Import writes to the SAS log) and amend it so you get exactly what you need.
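As a rough sketch of what such a hand-written data step could look like: the column names, informats, and sample rows below are hypothetical placeholders, not the actual layout of the Medicare file. The point is only that you state each variable's type yourself instead of leaving it to Proc Import's guess.

```sas
data test2;
  /* DSD strips the surrounding quotes and treats consecutive commas as
     missing values; FIRSTOBS=2 skips the header row. */
  infile datalines dsd dlm=',' truncover firstobs=2;
  /* Hypothetical columns: replace with the real names and informats. */
  informat provider_id $10. measure_name $100. score best32. start_date mmddyy10.;
  format start_date date9.;
  input provider_id $ measure_name $ score start_date;
  datalines;
"provider_id","measure_name","score","start_date"
"010001","Hypothetical measure A","12.5","01/01/2015"
"010005","Hypothetical measure B","","07/01/2015"
;
run;
```

To read the cleaned file instead of the inline sample, replace datalines in the INFILE statement with the fileref outcsv and drop the DATALINES block.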
I've attached the source data with the LF removed. This is what you get under fileref outcsv from the code you've marked as solution.