BookmarkSubscribeRSS Feed
jmartin-moreno
Fluorite | Level 6

I am working in SAS Enterprise Guide 7.1 with csv files exported from a higher education learning management system (LMS). The files are encoded as UTF-8.  When I import them using a data step, they appear to import fine, but only the first 350,000 lines of data are imported.  I read in another thread that this was likely an encoding issue. I took the suggestion there to open the files in Notepad, then in the Save As dialog, change the encoding from UTF-8 to ANSII. That works. Once the files are encoded as ASCII, all the data lines import to SAS just fine.  I need to be able to automate this process so I don't have to open each file and save them with the new encoding. Does anyone know of a way either in SAS Enterprise Guide 7.1, Base SAS, or using a bat file to automate this task? I am using a data step to import the data to SAS EG. I have also tried PROC IMPORT with an encoding statement that has not yielded the results I was looking for. Here is my data step code, which only imported the first 350,000 rows of data:

DATA WORK.Sessions; 
    INFILE "\\MCC-HOMES\IEDATA\IR_DATA\Brightspace\DATA\SessionHistory_ascii.csv" 
        DLM=',' 
    	MISSOVER 
		FIRSTOBS=2 
    	DSD; 
    LENGTH
        SessionId        $ 7
        OrgId            $ 4
        UserId           $ 5
        DateStarted        8
        DateEnded          8
        LastAccessed       8
        TimedOut         $ 5
        HistoryId        $ 7 ;
    FORMAT
        SessionId        $CHAR7.
        OrgId            $CHAR4.
        UserId           $CHAR5.
        DateStarted      DATETIME21.2
        DateEnded        DATETIME21.2
        LastAccessed     DATETIME21.2
        TimedOut         $CHAR5.
        HistoryId        $CHAR7. ;
    INPUT
        SessionId        : $CHAR7.
        OrgId            : $CHAR4.
        UserId           : $CHAR5.
        DateStarted      : ?? ANYDTDTM28.
        DateEnded        : ?? ANYDTDTM28.
        LastAccessed     : ?? ANYDTDTM28.
        TimedOut         : $CHAR5.
        HistoryId        : $CHAR7. ;
RUN;

Below are three attempts I tried using PROC IMPORT.  I didn't get any error messages but the also only imported the first 350k lines.

/*	-----------------------------------------------
	FIRST PROC IMPORT ATTEMPT AT PROC IMPORT
	THIS INPORTED ONLY THE FIRST 350,000 DATA ROWS 
	------------------------------------------------ */
proc import datafile = "\\MCC-HOMES\IEDATA\IR_DATA\Brightspace\DATA\SessionHistory.csv"

 out  = WORK.SESSIONS
 dbms  =  csv
 replace;

run;

/*	----------------------------------------------
	SECOND PROC IMPORT ATTEMPT AT PROC IMPORT
	THIS INPORTED ONLY THE FIRST 350,000 DATA ROWS 
	---------------------------------------------- */
filename temp "\\MCC-HOMES\IEDATA\IR_DATA\Brightspace\DATA\SessionHistory.csv" encoding="utf-8";

proc import datafile = temp
 out  = utf8
 dbms  =  csv
 replace;
run;

/*	--------------------------------------------------------------------------------
	THIRD PROC IMPORT ATTEMPT
	THIS INPORTED ONLY THE FIRST 350,000 DATA ROWS 
	-------------------------------------------------------------------------------- */
filename temp "\\MCC-HOMES\IEDATA\IR_DATA\Brightspace\DATA\SessionHistory.csv" encoding="utf-8" lrecl=32767;

   proc import datafile=temp out=sessions dbms=csv replace;
   run;

Thanks in advance for your help!

 

Jeff

3 REPLIES 3
jmartin-moreno
Fluorite | Level 6

Here is a sample of the data:

data WORK.SESSIONS;
  infile datalines dsd truncover;
  input SessionId:BEST12. OrgId:BEST12. UserId:BEST12. DateStarted:B8601DZ35. DateEnded:B8601DZ35. LastAccessed:B8601DZ35. TimedOut:$5. HistoryId:BEST12.;
  format SessionId BEST12. OrgId BEST12. UserId BEST12. DateStarted B8601DZ35. DateEnded B8601DZ35. LastAccessed B8601DZ35. HistoryId BEST12.;
datalines;
7470839 6606 28324 20200803T045055+0000 20200803T045118+0000 20200803T045118+0000 False 7602172
7468717 6606 . 20200803T014715+0000 20200803T044906+0000 20200803T014715+0000 True 7602171
7468085 6606 8586 20200803T005353+0000 20200803T044906+0000 20200803T014445+0000 True 7602170
7468727 6606 . 20200803T014746+0000 20200803T044906+0000 20200803T014746+0000 True 7602169
7468635 6606 27937 20200803T014043+0000 20200803T044906+0000 20200803T014043+0000 True 7602168
7468651 6606 . 20200803T014221+0000 20200803T044906+0000 20200803T014221+0000 True 7602167
7468630 6606 . 20200803T014023+0000 20200803T044906+0000 20200803T014023+0000 True 7602166
7468708 6606 7004 20200803T014653+0000 20200803T044906+0000 20200803T014706+0000 True 7602165
7468733 6606 . 20200803T014752+0000 20200803T044906+0000 20200803T014752+0000 True 7602164
7468673 6606 22851 20200803T014337+0000 20200803T044906+0000 20200803T014337+0000 True 7602163
;;;;

Jeff

SASKiwi
PROC Star

Check your SAS session ENCODING system option using PROC OPTIONS. If it is not UTF-8 then try starting a SAS session with ENCODING='UTF-8'. Then test reading your CSV file again.

andreas_lds
Jade | Level 19

Is it possible to create those files with ut8-bom instead of utf-8? With the bom sas automatically recognizes the encoding and will read the file, unless it contains char not in the ascii-table.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 1739 views
  • 2 likes
  • 3 in conversation