sasprogramming
Quartz | Level 8

I am trying to load a fairly large dataset in which some rows contain line breaks. The data looks fine when I view the CSV, and it loads into R without any trouble.

How can I get the data loaded in properly?

 

Code:

proc import datafile="C:/Users/bbgeoa/Documents/My SAS Files/ALLjune20-may21.csv"
    out=tm.procimport
    dbms=csv
    replace;
    getnames=yes;
run;

9 REPLIES
ChrisNZ
Tourmaline | Level 20

The INPUT statement that PROC IMPORT generates does not match the data. Try adding the statement

guessingrows=max;

to help SAS work out the variable types and lengths.

sasprogramming
Quartz | Level 8

That doesn't fix the problem. I'm not sure why SAS struggles so much to read in data compared with R/Python.

Ksharp
Super User
Also make sure "ALLjune20-may21.csv" was created under Windows if your SAS is running under Windows.
Otherwise, you need to change the end-of-line character, like this:



filename x 'c:\temp\ALLjune20-may21.csv' termstr=lf;
proc import datafile=x ............

or

filename x 'c:\temp\ALLjune20-may21.csv' termstr=crlf;
proc import datafile=x ............

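To see which line terminator the file actually uses before choosing a TERMSTR= value, one option is to dump the start of the file to the SAS log. This is only a sketch: the 200-byte window is an arbitrary choice, and the path is the one used in the example above. When a record contains unprintable bytes, the LIST statement also prints them in hex, so a carriage return should show up as 0D and a line feed as 0A.

```sas
/* Sketch: read the first 200 bytes as one fixed-length record and list it. */
/* The window size (200) is an arbitrary assumption; adjust as needed.      */
data _null_;
    infile 'c:\temp\ALLjune20-may21.csv' recfm=f lrecl=200 obs=1;
    input;   /* load the record into the input buffer without creating variables */
    list;    /* write the buffer to the log; hex rows appear for unprintable bytes */
run;
```

If 0D and 0A appear together at the end of each row, TERMSTR=CRLF is the likely choice; a lone 0A points to TERMSTR=LF.
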
Patrick
Opal | Level 21

@sasprogramming 

This looks like real unmasked data. I STRONGLY recommend that you remove any PII from this post ASAP.

sasprogramming
Quartz | Level 8

It was sample data

Patrick
Opal | Level 21

@sasprogramming That's very comforting to know given that there was a real company name and account numbers in it.

 

Suggest you follow the advice already given:

- guessingrows=max;

  This makes SAS scan the whole source file first, so that the variable types and lengths it creates suit all of the data.

- TERMSTR=CRLF

  I normally use Notepad++ to investigate a text file and select view/show symbols/show all characters. This will tell you what you're dealing with.

- options validvarname=v7;

  This makes the SAS variable names that get created comply with SAS naming standards. Any non-compliant character, such as a blank, gets replaced with an underscore.
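
Put together, the three suggestions might look like the sketch below. It assumes the file really uses CRLF as its row terminator (switch to termstr=lf if it does not) and reuses the path and output dataset name from the earlier posts, so the tm libref must already be assigned.

```sas
/* Sketch only: combines the three suggestions from this thread. */
options validvarname=v7;   /* make imported variable names follow SAS naming rules */

/* Assumption: real rows end in CRLF; use termstr=lf if the file was created elsewhere. */
filename x 'c:\temp\ALLjune20-may21.csv' termstr=crlf;

proc import datafile=x
    out=tm.procimport      /* dataset name from the original post */
    dbms=csv
    replace;
    getnames=yes;
    guessingrows=max;      /* scan the whole file before deciding types and lengths */
run;
```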

Kurt_Bremser
Super User
You may be lucky if the unwanted line breaks in the text are just CR or LF while the "real" line breaks are CRLF (or the other way round); in this case, the TERMSTR= option will bail you out.
It is a long-standing issue with SAS that line breaks inside variables are not recognized as "invalid".
Even a well-supported Ballot Idea has been rejected by SAS, which many of the senior members here consider a severe insult.
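
If TERMSTR= alone is not enough, a common workaround (not something proposed in this thread) is to pre-process the file: read it as a byte stream and blank out any CR or LF that falls inside a quoted field. The sketch below assumes that every field containing a line break is wrapped in double quotes. The output path is hypothetical.

```sas
/* Sketch: replace CR/LF inside double-quoted fields with a blank.        */
/* Assumption: fields with embedded line breaks are enclosed in quotes.   */
data _null_;
    infile 'c:\temp\ALLjune20-may21.csv' recfm=n;        /* read as a stream of bytes */
    file 'c:\temp\ALLjune20-may21_clean.csv' recfm=n;    /* hypothetical output file  */
    retain in_quotes 0;
    input ch $char1.;
    if ch = '"' then in_quotes = not in_quotes;           /* toggle on every double quote */
    if in_quotes and ch in ('0D'x, '0A'x) then ch = ' ';  /* embedded line break -> blank */
    put ch $char1.;
run;
```

A doubled quote ("") inside a field toggles the flag twice, so it leaves the state unchanged. The cleaned file can then be passed to PROC IMPORT in the usual way.
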
Ksharp
Super User
Can you post some data from "ALLjune20-may21.csv"?
Then we can test it and find out where the problem is.
