I am trying to load a fairly large dataset in which some rows contain line breaks. The data displays fine when I open the CSV, and it loads into R very easily.
How can I get the data loaded in properly?
Code:
proc import datafile="C:/Users/bbgeoa/Documents/My SAS Files/ALLjune20-may21.csv"
    out=tm.procimport
    dbms=csv
    replace;
  getnames=yes;
run;
The INPUT statement that PROC IMPORT generates does not match the data. Try adding the statement
guessingrows=max;
to help SAS determine the variable types and lengths correctly.
That doesn't fix the problem. I'm not sure why SAS struggles so much to read this data when R and Python handle it.
This looks like real unmasked data. I STRONGLY recommend that you remove any PII from this post asap.
It was sample data
@sasprogramming That's very comforting to know given that there was a real company name and account numbers in it.
I suggest you follow the advice already given:
- guessingrows=max;
This makes SAS scan the whole source file first and then create variable types and lengths suitable for all of the data.
- TERMSTR=CRLF
This tells SAS to end a record only at a carriage return followed by a line feed, so a lone line feed inside a field should no longer split the row. It only helps if the file really terminates its records with CRLF. I normally use Notepad++ to investigate a text file and select View / Show Symbol / Show All Characters; this will tell you what you're dealing with.
- options validvarname=v7;
This makes the variable names SAS creates comply with SAS naming standards. Any non-compliant character, such as a blank, gets replaced with an underscore.
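Putting the three suggestions together, a minimal sketch could look like the code below. It assumes the records in the file end with CRLF and that the line breaks inside the fields are bare LF characters, which you would need to confirm first. The fileref name csvin is only an illustration; the path and the tm libref are taken from the original post.

```sas
/* Assumption: records end with CRLF and embedded breaks are bare LF. */
/* Check the file in a text editor before relying on this.            */

/* Make generated variable names follow standard SAS naming rules */
options validvarname=v7;

/* csvin is a hypothetical fileref name; TERMSTR=CRLF makes SAS   */
/* treat only CR+LF as the end of a record                        */
filename csvin "C:/Users/bbgeoa/Documents/My SAS Files/ALLjune20-may21.csv" termstr=crlf;

proc import datafile=csvin
    out=tm.procimport
    dbms=csv
    replace;
  getnames=yes;
  /* Scan every row before deciding variable types and lengths */
  guessingrows=max;
run;
```

If the embedded breaks turn out to be full CRLF pairs as well, TERMSTR=CRLF alone will not be enough and the file would need to be cleaned before import.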
PS I was talking about this Ballot entry: