SAS Programming

DATA Step, Macro, Functions and more
BookmarkSubscribeRSS Feed
jmartin-moreno
Fluorite | Level 6

I am working in SAS Enterprise Guide 7.1 with csv files exported from a higher education learning management system (LMS). The files are encoded as UTF-8.  When I import them using a data step, they appear to import fine, but only the first 350,000 lines of data are imported.  I read in another thread that this was likely an encoding issue. I took the suggestion there to open the files in Notepad, then in the Save As dialog, change the encoding from UTF-8 to ANSII. That works. Once the files are encoded as ASCII, all the data lines import to SAS just fine.  I need to be able to automate this process so I don't have to open each file and save them with the new encoding. Does anyone know of a way either in SAS Enterprise Guide 7.1, Base SAS, or using a bat file to automate this task? I am using a data step to import the data to SAS EG. I have also tried PROC IMPORT with an encoding statement that has not yielded the results I was looking for. Here is my data step code, which only imported the first 350,000 rows of data:

DATA WORK.Sessions; 
    INFILE "\\MCC-HOMES\IEDATA\IR_DATA\Brightspace\DATA\SessionHistory_ascii.csv" 
        DLM=',' 
    	MISSOVER 
		FIRSTOBS=2 
    	DSD; 
    LENGTH
        SessionId        $ 7
        OrgId            $ 4
        UserId           $ 5
        DateStarted        8
        DateEnded          8
        LastAccessed       8
        TimedOut         $ 5
        HistoryId        $ 7 ;
    FORMAT
        SessionId        $CHAR7.
        OrgId            $CHAR4.
        UserId           $CHAR5.
        DateStarted      DATETIME21.2
        DateEnded        DATETIME21.2
        LastAccessed     DATETIME21.2
        TimedOut         $CHAR5.
        HistoryId        $CHAR7. ;
    INPUT
        SessionId        : $CHAR7.
        OrgId            : $CHAR4.
        UserId           : $CHAR5.
        DateStarted      : ?? ANYDTDTM28.
        DateEnded        : ?? ANYDTDTM28.
        LastAccessed     : ?? ANYDTDTM28.
        TimedOut         : $CHAR5.
        HistoryId        : $CHAR7. ;
RUN;

Below are three attempts I tried using PROC IMPORT.  I didn't get any error messages but the also only imported the first 350k lines.

/*	-----------------------------------------------
	FIRST PROC IMPORT ATTEMPT AT PROC IMPORT
	THIS INPORTED ONLY THE FIRST 350,000 DATA ROWS 
	------------------------------------------------ */
proc import datafile = "\\MCC-HOMES\IEDATA\IR_DATA\Brightspace\DATA\SessionHistory.csv"

 out  = WORK.SESSIONS
 dbms  =  csv
 replace;

run;

/*	----------------------------------------------
	SECOND PROC IMPORT ATTEMPT AT PROC IMPORT
	THIS INPORTED ONLY THE FIRST 350,000 DATA ROWS 
	---------------------------------------------- */
filename temp "\\MCC-HOMES\IEDATA\IR_DATA\Brightspace\DATA\SessionHistory.csv" encoding="utf-8";

proc import datafile = temp
 out  = utf8
 dbms  =  csv
 replace;
run;

/*	--------------------------------------------------------------------------------
	THIRD PROC IMPORT ATTEMPT
	THIS INPORTED ONLY THE FIRST 350,000 DATA ROWS 
	-------------------------------------------------------------------------------- */
filename temp "\\MCC-HOMES\IEDATA\IR_DATA\Brightspace\DATA\SessionHistory.csv" encoding="utf-8" lrecl=32767;

   proc import datafile=temp out=sessions dbms=csv replace;
   run;

Thanks in advance for your help!

 

Jeff

3 REPLIES 3
jmartin-moreno
Fluorite | Level 6

Here is a sample of the data:

data WORK.SESSIONS;
  infile datalines dsd truncover;
  input SessionId:BEST12. OrgId:BEST12. UserId:BEST12. DateStarted:B8601DZ35. DateEnded:B8601DZ35. LastAccessed:B8601DZ35. TimedOut:$5. HistoryId:BEST12.;
  format SessionId BEST12. OrgId BEST12. UserId BEST12. DateStarted B8601DZ35. DateEnded B8601DZ35. LastAccessed B8601DZ35. HistoryId BEST12.;
datalines;
7470839 6606 28324 20200803T045055+0000 20200803T045118+0000 20200803T045118+0000 False 7602172
7468717 6606 . 20200803T014715+0000 20200803T044906+0000 20200803T014715+0000 True 7602171
7468085 6606 8586 20200803T005353+0000 20200803T044906+0000 20200803T014445+0000 True 7602170
7468727 6606 . 20200803T014746+0000 20200803T044906+0000 20200803T014746+0000 True 7602169
7468635 6606 27937 20200803T014043+0000 20200803T044906+0000 20200803T014043+0000 True 7602168
7468651 6606 . 20200803T014221+0000 20200803T044906+0000 20200803T014221+0000 True 7602167
7468630 6606 . 20200803T014023+0000 20200803T044906+0000 20200803T014023+0000 True 7602166
7468708 6606 7004 20200803T014653+0000 20200803T044906+0000 20200803T014706+0000 True 7602165
7468733 6606 . 20200803T014752+0000 20200803T044906+0000 20200803T014752+0000 True 7602164
7468673 6606 22851 20200803T014337+0000 20200803T044906+0000 20200803T014337+0000 True 7602163
;;;;

Jeff

SASKiwi
PROC Star

Check your SAS session ENCODING system option using PROC OPTIONS. If it is not UTF-8 then try starting a SAS session with ENCODING='UTF-8'. Then test reading your CSV file again.

andreas_lds
Jade | Level 19

Is it possible to create those files with ut8-bom instead of utf-8? With the bom sas automatically recognizes the encoding and will read the file, unless it contains char not in the ascii-table.

sas-innovate-white.png

Special offer for SAS Communities members

Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.

 

View the full agenda.

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 2511 views
  • 2 likes
  • 3 in conversation