I am trying to load a fairly large CSV file in which some rows contain line breaks inside a field. The data looks fine when viewed in the CSV and loads into R very easily.
How can I get the data to load properly in SAS?
Code:
proc import datafile="C:/Users/bbgeoa/Documents/My SAS Files/ALLjune20-may21.csv"
    out=tm.procimport
    dbms=csv
    replace;
    getnames=yes;
run;
The INPUT statement does not match the data. Try adding the statement
guessingrows=max;
to help SAS determine the variable types and lengths more accurately.
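A minimal sketch of the original import with that statement added. It assumes the TM libref is already assigned, as in the question, and that your SAS release accepts GUESSINGROWS=MAX (older releases need a numeric value instead).

```sas
/* Same import as in the question, plus GUESSINGROWS=MAX so PROC IMPORT */
/* scans every row before deciding each column's type and length.      */
/* Assumes the TM libref has already been assigned.                    */
proc import datafile="C:/Users/bbgeoa/Documents/My SAS Files/ALLjune20-may21.csv"
    out=tm.procimport
    dbms=csv
    replace;
    getnames=yes;
    guessingrows=max;
run;
```

This only fixes truncated or mistyped columns; it does not by itself stop a line break inside a field from being read as the start of a new row.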
That doesn't fix the problem. I'm not sure why SAS struggles so much to read in data compared to R/Python.
This looks like real, unmasked data. I STRONGLY recommend that you remove any PII from this post ASAP.
It was sample data.
@sasprogramming That's very comforting to know, given that it contained a real company name and account numbers.
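I suggest you follow the advice already given:
- guessingrows=max;
This makes SAS scan the whole source file first and then create variable types and lengths suitable for all of the data.
- TERMSTR=CRLF
This makes SAS treat only a carriage return plus line feed pair as the end of a record, so a line feed on its own inside a field no longer starts a new row. TERMSTR= is an option of the FILENAME or INFILE statement, not of PROC IMPORT. To see which line-break characters your file actually contains, I normally open it in Notepad++ and select View > Show Symbol > Show All Characters; this will tell you what you're dealing with.
- options validvarname=v7;
This makes the SAS variable names comply with SAS naming rules. Any non-compliant character, such as a blank, is replaced with an underscore.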
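The three suggestions above can be combined into one program, sketched below. It assumes the rows end in CRLF while the line breaks inside fields are bare LF characters; if the embedded breaks are also CRLF, TERMSTR=CRLF will not help. The fileref name CSVIN is a hypothetical choice, and the TM libref is assumed to be assigned already.

```sas
/* Keep variable names SAS-compliant (blanks etc. become underscores). */
options validvarname=v7;

/* TERMSTR=CRLF: only a CR+LF pair ends a record, so a bare LF inside */
/* a field is kept as data instead of starting a new row.             */
/* CSVIN is a hypothetical fileref name.                              */
filename csvin "C:/Users/bbgeoa/Documents/My SAS Files/ALLjune20-may21.csv"
    termstr=crlf;

/* Optional check: write the first 5 records to the log. LIST shows a  */
/* record in hex when it contains unprintable bytes, so an embedded LF */
/* appears as 0A.                                                      */
data _null_;
    infile csvin obs=5;
    input;
    list;
run;

/* Import through the fileref so the TERMSTR= setting is used. */
proc import datafile=csvin
    out=tm.procimport
    dbms=csv
    replace;
    getnames=yes;
    guessingrows=max;
run;
```

Pointing DATAFILE= at the fileref rather than the quoted path is what lets PROC IMPORT pick up TERMSTR=, since PROC IMPORT has no option of its own for the record terminator.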
PS: I was talking about this Ballot entry: