- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I am working in SAS Enterprise Guide 7.1 with csv files exported from a higher education learning management system (LMS). The files are encoded as UTF-8. When I import them using a data step, they appear to import fine, but only the first 350,000 lines of data are imported. I read in another thread that this was likely an encoding issue. I took the suggestion there to open the files in Notepad, then in the Save As dialog, change the encoding from UTF-8 to ANSII. That works. Once the files are encoded as ASCII, all the data lines import to SAS just fine. I need to be able to automate this process so I don't have to open each file and save them with the new encoding. Does anyone know of a way either in SAS Enterprise Guide 7.1, Base SAS, or using a bat file to automate this task? I am using a data step to import the data to SAS EG. I have also tried PROC IMPORT with an encoding statement that has not yielded the results I was looking for. Here is my data step code, which only imported the first 350,000 rows of data:
DATA WORK.Sessions; INFILE "\\MCC-HOMES\IEDATA\IR_DATA\Brightspace\DATA\SessionHistory_ascii.csv" DLM=',' MISSOVER FIRSTOBS=2 DSD; LENGTH SessionId $ 7 OrgId $ 4 UserId $ 5 DateStarted 8 DateEnded 8 LastAccessed 8 TimedOut $ 5 HistoryId $ 7 ; FORMAT SessionId $CHAR7. OrgId $CHAR4. UserId $CHAR5. DateStarted DATETIME21.2 DateEnded DATETIME21.2 LastAccessed DATETIME21.2 TimedOut $CHAR5. HistoryId $CHAR7. ; INPUT SessionId : $CHAR7. OrgId : $CHAR4. UserId : $CHAR5. DateStarted : ?? ANYDTDTM28. DateEnded : ?? ANYDTDTM28. LastAccessed : ?? ANYDTDTM28. TimedOut : $CHAR5. HistoryId : $CHAR7. ; RUN;
Below are three attempts I tried using PROC IMPORT. I didn't get any error messages but the also only imported the first 350k lines.
/* ----------------------------------------------- FIRST PROC IMPORT ATTEMPT AT PROC IMPORT THIS INPORTED ONLY THE FIRST 350,000 DATA ROWS ------------------------------------------------ */ proc import datafile = "\\MCC-HOMES\IEDATA\IR_DATA\Brightspace\DATA\SessionHistory.csv" out = WORK.SESSIONS dbms = csv replace; run; /* ---------------------------------------------- SECOND PROC IMPORT ATTEMPT AT PROC IMPORT THIS INPORTED ONLY THE FIRST 350,000 DATA ROWS ---------------------------------------------- */ filename temp "\\MCC-HOMES\IEDATA\IR_DATA\Brightspace\DATA\SessionHistory.csv" encoding="utf-8"; proc import datafile = temp out = utf8 dbms = csv replace; run; /* -------------------------------------------------------------------------------- THIRD PROC IMPORT ATTEMPT THIS INPORTED ONLY THE FIRST 350,000 DATA ROWS -------------------------------------------------------------------------------- */ filename temp "\\MCC-HOMES\IEDATA\IR_DATA\Brightspace\DATA\SessionHistory.csv" encoding="utf-8" lrecl=32767; proc import datafile=temp out=sessions dbms=csv replace; run;
Thanks in advance for your help!
Jeff
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Here is a sample of the data:
data WORK.SESSIONS;
infile datalines dsd truncover;
input SessionId:BEST12. OrgId:BEST12. UserId:BEST12. DateStarted:B8601DZ35. DateEnded:B8601DZ35. LastAccessed:B8601DZ35. TimedOut:$5. HistoryId:BEST12.;
format SessionId BEST12. OrgId BEST12. UserId BEST12. DateStarted B8601DZ35. DateEnded B8601DZ35. LastAccessed B8601DZ35. HistoryId BEST12.;
datalines;
7470839 6606 28324 20200803T045055+0000 20200803T045118+0000 20200803T045118+0000 False 7602172
7468717 6606 . 20200803T014715+0000 20200803T044906+0000 20200803T014715+0000 True 7602171
7468085 6606 8586 20200803T005353+0000 20200803T044906+0000 20200803T014445+0000 True 7602170
7468727 6606 . 20200803T014746+0000 20200803T044906+0000 20200803T014746+0000 True 7602169
7468635 6606 27937 20200803T014043+0000 20200803T044906+0000 20200803T014043+0000 True 7602168
7468651 6606 . 20200803T014221+0000 20200803T044906+0000 20200803T014221+0000 True 7602167
7468630 6606 . 20200803T014023+0000 20200803T044906+0000 20200803T014023+0000 True 7602166
7468708 6606 7004 20200803T014653+0000 20200803T044906+0000 20200803T014706+0000 True 7602165
7468733 6606 . 20200803T014752+0000 20200803T044906+0000 20200803T014752+0000 True 7602164
7468673 6606 22851 20200803T014337+0000 20200803T044906+0000 20200803T014337+0000 True 7602163
;;;;
Jeff
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Check your SAS session ENCODING system option using PROC OPTIONS. If it is not UTF-8 then try starting a SAS session with ENCODING='UTF-8'. Then test reading your CSV file again.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Is it possible to create those files with ut8-bom instead of utf-8? With the bom sas automatically recognizes the encoding and will read the file, unless it contains char not in the ascii-table.