I am working in SAS Enterprise Guide 7.1 with csv files exported from a higher education learning management system (LMS). The files are encoded as UTF-8. When I import them using a data step, they appear to import fine, but only the first 350,000 lines of data are imported. I read in another thread that this was likely an encoding issue. I took the suggestion there to open the files in Notepad, then in the Save As dialog, change the encoding from UTF-8 to ANSII. That works. Once the files are encoded as ASCII, all the data lines import to SAS just fine. I need to be able to automate this process so I don't have to open each file and save them with the new encoding. Does anyone know of a way either in SAS Enterprise Guide 7.1, Base SAS, or using a bat file to automate this task? I am using a data step to import the data to SAS EG. I have also tried PROC IMPORT with an encoding statement that has not yielded the results I was looking for. Here is my data step code, which only imported the first 350,000 rows of data:
DATA WORK.Sessions;
INFILE "\\MCC-HOMES\IEDATA\IR_DATA\Brightspace\DATA\SessionHistory_ascii.csv"
DLM=','
MISSOVER
FIRSTOBS=2
DSD;
LENGTH
SessionId $ 7
OrgId $ 4
UserId $ 5
DateStarted 8
DateEnded 8
LastAccessed 8
TimedOut $ 5
HistoryId $ 7 ;
FORMAT
SessionId $CHAR7.
OrgId $CHAR4.
UserId $CHAR5.
DateStarted DATETIME21.2
DateEnded DATETIME21.2
LastAccessed DATETIME21.2
TimedOut $CHAR5.
HistoryId $CHAR7. ;
INPUT
SessionId : $CHAR7.
OrgId : $CHAR4.
UserId : $CHAR5.
DateStarted : ?? ANYDTDTM28.
DateEnded : ?? ANYDTDTM28.
LastAccessed : ?? ANYDTDTM28.
TimedOut : $CHAR5.
HistoryId : $CHAR7. ;
RUN;Below are three attempts I tried using PROC IMPORT. I didn't get any error messages but the also only imported the first 350k lines.
/* ----------------------------------------------- FIRST PROC IMPORT ATTEMPT AT PROC IMPORT THIS INPORTED ONLY THE FIRST 350,000 DATA ROWS ------------------------------------------------ */ proc import datafile = "\\MCC-HOMES\IEDATA\IR_DATA\Brightspace\DATA\SessionHistory.csv" out = WORK.SESSIONS dbms = csv replace; run; /* ---------------------------------------------- SECOND PROC IMPORT ATTEMPT AT PROC IMPORT THIS INPORTED ONLY THE FIRST 350,000 DATA ROWS ---------------------------------------------- */ filename temp "\\MCC-HOMES\IEDATA\IR_DATA\Brightspace\DATA\SessionHistory.csv" encoding="utf-8"; proc import datafile = temp out = utf8 dbms = csv replace; run; /* -------------------------------------------------------------------------------- THIRD PROC IMPORT ATTEMPT THIS INPORTED ONLY THE FIRST 350,000 DATA ROWS -------------------------------------------------------------------------------- */ filename temp "\\MCC-HOMES\IEDATA\IR_DATA\Brightspace\DATA\SessionHistory.csv" encoding="utf-8" lrecl=32767; proc import datafile=temp out=sessions dbms=csv replace; run;
Thanks in advance for your help!
Jeff
Here is a sample of the data:
data WORK.SESSIONS;
infile datalines dsd truncover;
input SessionId:BEST12. OrgId:BEST12. UserId:BEST12. DateStarted:B8601DZ35. DateEnded:B8601DZ35. LastAccessed:B8601DZ35. TimedOut:$5. HistoryId:BEST12.;
format SessionId BEST12. OrgId BEST12. UserId BEST12. DateStarted B8601DZ35. DateEnded B8601DZ35. LastAccessed B8601DZ35. HistoryId BEST12.;
datalines;
7470839 6606 28324 20200803T045055+0000 20200803T045118+0000 20200803T045118+0000 False 7602172
7468717 6606 . 20200803T014715+0000 20200803T044906+0000 20200803T014715+0000 True 7602171
7468085 6606 8586 20200803T005353+0000 20200803T044906+0000 20200803T014445+0000 True 7602170
7468727 6606 . 20200803T014746+0000 20200803T044906+0000 20200803T014746+0000 True 7602169
7468635 6606 27937 20200803T014043+0000 20200803T044906+0000 20200803T014043+0000 True 7602168
7468651 6606 . 20200803T014221+0000 20200803T044906+0000 20200803T014221+0000 True 7602167
7468630 6606 . 20200803T014023+0000 20200803T044906+0000 20200803T014023+0000 True 7602166
7468708 6606 7004 20200803T014653+0000 20200803T044906+0000 20200803T014706+0000 True 7602165
7468733 6606 . 20200803T014752+0000 20200803T044906+0000 20200803T014752+0000 True 7602164
7468673 6606 22851 20200803T014337+0000 20200803T044906+0000 20200803T014337+0000 True 7602163
;;;;Jeff
Check your SAS session ENCODING system option using PROC OPTIONS. If it is not UTF-8 then try starting a SAS session with ENCODING='UTF-8'. Then test reading your CSV file again.
Is it possible to create those files with ut8-bom instead of utf-8? With the bom sas automatically recognizes the encoding and will read the file, unless it contains char not in the ascii-table.
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.