BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Jarvin99
Obsidian | Level 7

I have a csv file of 232,190 rows and 9 columns. When I import data into sas, sas only extracts the first 1,011 rows and 9 columns. How can I import all the rows available without losing any data from the csv file?

 

Here is the log result:

 

237 data WORK.announcement ;
238 %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
239 infile 'D:\Dropbox\Dataset\9ee779962de5b464.csv'
239! delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;
240 informat CompanyName $38. ;
241 informat DirectorName $19. ;
242 informat CommitteeName $12. ;
243 informat JobName $27. ;
244 informat Description $99. ;
245 informat AnnouncementDate best32. ;
246 informat CompanyID best32. ;
247 informat DirectorID best32. ;
248 informat EffectDate best32. ;
249 format CompanyName $38. ;
250 format DirectorName $19. ;
251 format CommitteeName $12. ;
252 format JobName $27. ;
253 format Description $99. ;
254 format AnnouncementDate best12. ;
255 format CompanyID best12. ;
256 format DirectorID best12. ;
257 format EffectDate best12. ;
258 input
259 CompanyName $
260 DirectorName $
261 CommitteeName $
262 JobName $
263 Description $
264 AnnouncementDate
265 CompanyID
266 DirectorID
267 EffectDate
268 ;
269 if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */
270 run;

NOTE: The infile 'D:\Dropbox\Dataset\9ee779962de5b464.csv' is:
Filename=D:\Dropbox\Dataset\9ee779962de5b464.csv,
RECFM=V,LRECL=32767,File Size (bytes)=39898957,
Last Modified=10,November,2020 13:48:22,
Create Time=08,February,2022 15:21:23

NOTE: 1011 records were read from the infile
'D:\Dropbox\Dataset\9ee779962de5b464.csv'.
The minimum record length was 98.
The maximum record length was 312.
NOTE: The data set WORK.ANNOUNCEMENT has 1011 observations and 9 variables.
NOTE: DATA statement used (Total process time):
real time 0.11 seconds
cpu time 0.07 seconds


1011 rows created in WORK.announcement from
D:\Dropbox\Dataset\9ee779962de5b464.csv.

 

NOTE: WORK.ANNOUNCEMENT data set was successfully created.
NOTE: The data set WORK.ANNOUNCEMENT has 1011 observations and 9 variables.

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

The usual issue that causes that is embedded "DOS" end of file character.  Use the IGNOREDOSEOF option on the INFILE statement.

 

IGNOREDOSEOF

is used in the context of I/O operations on variable record format files. When this option is specified, any occurrence of ^Z is interpreted as character data and not as an end-of-file marker.

 

Why use PROC IMPORT to GUESS how to read a file that only has NINE variables?

Just write your own data step and you will have full control over how the variables are named, defined, labeled and whether or not any formats need to be attached.  Are those last four variables really just plain numbers?  Why aren't the two DATE variables using a date type informat to create actual date values?  Why are the two ID variables being read as numbers instead of character strings?  You do not need to perform arithmetic with ID variables. What do the lines in the file actually have for those fields? 

data announcement ;
  infile 'D:\Dropbox\Dataset\9ee779962de5b464.csv' dsd ignoredoseof truncover firstobs=2;
  length
 CompanyName $50
 DirectorName $30
 CommitteeName $20
 JobName $50
 Description $200
 AnnouncementDate 8
 CompanyID 8
 DirectorID 8
 EffectDate 8
;
  input CompanyName -- EffectDate ;
run;

To see some example values from the file use a simple data step.

data _null_;
  infile 'D:\Dropbox\Dataset\9ee779962de5b464.csv' obs=5 ;
  input;
  list;
run;

View solution in original post

1 REPLY 1
Tom
Super User Tom
Super User

The usual issue that causes that is embedded "DOS" end of file character.  Use the IGNOREDOSEOF option on the INFILE statement.

 

IGNOREDOSEOF

is used in the context of I/O operations on variable record format files. When this option is specified, any occurrence of ^Z is interpreted as character data and not as an end-of-file marker.

 

Why use PROC IMPORT to GUESS how to read a file that only has NINE variables?

Just write your own data step and you will have full control over how the variables are named, defined, labeled and whether or not any formats need to be attached.  Are those last four variables really just plain numbers?  Why aren't the two DATE variables using a date type informat to create actual date values?  Why are the two ID variables being read as numbers instead of character strings?  You do not need to perform arithmetic with ID variables. What do the lines in the file actually have for those fields? 

data announcement ;
  infile 'D:\Dropbox\Dataset\9ee779962de5b464.csv' dsd ignoredoseof truncover firstobs=2;
  length
 CompanyName $50
 DirectorName $30
 CommitteeName $20
 JobName $50
 Description $200
 AnnouncementDate 8
 CompanyID 8
 DirectorID 8
 EffectDate 8
;
  input CompanyName -- EffectDate ;
run;

To see some example values from the file use a simple data step.

data _null_;
  infile 'D:\Dropbox\Dataset\9ee779962de5b464.csv' obs=5 ;
  input;
  list;
run;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 1 reply
  • 767 views
  • 0 likes
  • 2 in conversation