Hi,
We have a large, 6,884,020 dbf file containing mapping data that we want to read into a SAS. When we run the following code in Enterprise Guide 8.3, we only get a little over 4,000,000 rows read in.
proc import
datafile = '\\server\mapping\Export_OutputWo3setA.dbf'
out= WORK.MapSetA
REPLACE
dbms = dbf;
run;
Thinking there may have been some limitation we hit, we divided the one file into two smaller dbf files, one with 3,610,374 and the other with 3,273,646. After running the same code, we recieved 3,017085 and 2,848721 rows respectively. We are not getting any errors when reading in the data.
We pulled out a very small subset of 157,804 and all records were read into SAS. Our next step is to subset the data into these smaller dbf files and read them that way but thought we should check with the experts before going through the effort. There seems to be some data issues but we don't know what or how to check.
Has anyone seen this behavior with dbf files?
Just an update. We were able to download the file in .txt format, comma delimited from the old mapping software and successfully import all 6.8 million rows from one file into SAS. It must have been the outdated .dbf file which was causing the issue.
Thanks for all the help!
The origin of the dbf files is our ArcMap Mapping software. It is an older software, and the company no longer sells it since they have moved to Arc GIS Pro Mapping software, but they still support our software agreement.
The dbf does not hold memo fields.
I would suggest raising a ticket with SAS Support. That should allow you to send them a copy of the file to examine and see why the file appears truncated.
The only option I see in the PROC IMPORT that would change the number of observations read is the GETDEL= statement.
Do you get a different number of observations if you use GETDEL=YES?
DBF file format is pretty simple. You could look at the file yourself by reading it using RECFM=F and see what it has in it.
https://www.dbf2002.com/dbf-file-format.html
Watch out as that page did not learn to count correctly in kindergarden. So all of the position numbers are off by one since it started counting from zero instead of 1.
@dtamburri wrote:
Thinking there may have been some limitation we hit, we divided the one file into two smaller dbf files, one with 3,610,374 and the other with 3,273,646. After running the same code, we recieved 3,017085 and 2,848721 rows respectively. We are not getting any errors when reading in the data.
I am more likely to suspect the method used to subset the files. Why did you do that in the first place?
I would suggest trying to read your original complete DBF file with SAS. If there is truly a reason to subset the data then do it after reading.
It also would be a good idea to show the entire LOG from reading the file just in case you didn't understand something that might indicate what is going on.
@ballardw We subset the files in troubleshooting. Ideally we would like to read in the full file and subset it in SAS. We did open a track with SAS and they are requesting the file.
There are no indications of any issues in the log. Here is the log text:
1 ;*';*";*/;quit;run;
2 OPTIONS PAGENO=MIN;
3 %LET _CLIENTTASKLABEL='Program';
4 %LET _CLIENTPROCESSFLOWNAME='Standalone Not In Project';
5 %LET _CLIENTPROJECTPATH='';
6 %LET _CLIENTPROJECTPATHHOST='';
7 %LET _CLIENTPROJECTNAME='';
8 %LET _SASPROGRAMFILE='';
9 %LET _SASPROGRAMFILEHOST='';
10
11 ODS _ALL_ CLOSE;
12 OPTIONS DEV=SVG;
13 GOPTIONS XPIXELS=0 YPIXELS=0;
14 %macro HTML5AccessibleGraphSupported;
15 %if %_SAS_VERCOMP_FV(9,4,4, 0,0,0) >= 0 %then ACCESSIBLE_GRAPH;
16 %mend;
17 FILENAME EGHTML TEMP;
18 ODS HTML5(ID=EGHTML) FILE=EGHTML
19 OPTIONS(BITMAP_MODE='INLINE')
20 %HTML5AccessibleGraphSupported
21 ENCODING='utf-8'
22 STYLE=HtmlBlue
23 NOGTITLE
24 NOGFOOTNOTE
25 GPATH=&sasworklocation
26 ;
NOTE: Writing HTML5(EGHTML) Body file: EGHTML
27
28 /*All states latitude <= 38 */
29 proc import
30 datafile = '\\XXXXXXXXXX\mapping\Export_OutputWo3setA.dbf'
31 out= WORK.MapWO3SetA
32 REPLACE
33 dbms = dbf;
34 run;
NOTE: Imported 13 columns and 3017085 rows of data from the dBase table file.
NOTE: WORK.MAPWO3SETA data set was successfully created.
NOTE: PROCEDURE IMPORT used (Total process time):
real time 8:02.99
cpu time 1:06.31
35
36 /*All states latitude > 38 */
37 proc import
38 datafile = '\\XXXXXXXXXX\mapping\Export_OutputWo3setB.dbf'
39 out= WORK.MapWO3SetB
40 REPLACE
41 dbms = dbf;
42 run;
NOTE: Imported 13 columns and 2848721 rows of data from the dBase table file.
NOTE: WORK.MAPWO3SETB data set was successfully created.
NOTE: PROCEDURE IMPORT used (Total process time):
real time 7:52.14
cpu time 1:05.43
2 The SAS System 07:50 Monday, August 7, 2023
43
44
45
46 %LET _CLIENTTASKLABEL=;
47 %LET _CLIENTPROCESSFLOWNAME=;
48 %LET _CLIENTPROJECTPATH=;
49 %LET _CLIENTPROJECTPATHHOST=;
50 %LET _CLIENTPROJECTNAME=;
51 %LET _SASPROGRAMFILE=;
52 %LET _SASPROGRAMFILEHOST=;
53
54 ;*';*";*/;quit;run;
55 ODS _ALL_ CLOSE;
56
57
58 QUIT; RUN;
59
Thanks!
Possibly a long shot. DBF files can contain records that have been marked for deletion (in the data base) but have not actually been removed from the file.
If you use the GETDELETED = yes option then those marked records would not be removed by SAS. This is supposed to be the default behavior but...
That would look like:
proc import datafile = '\\XXXXXXXXXX\mapping\Export_OutputWo3setA.dbf' out= WORK.MapWO3SetA REPLACE dbms = dbf; getdeleted=yes; run;
If you run the code with Getdeleted=NO; and get the same result that should indicate there are none of these marked for deletion records.
Still question whether the approach to split the file was correct though.
Just an update. We were able to download the file in .txt format, comma delimited from the old mapping software and successfully import all 6.8 million rows from one file into SAS. It must have been the outdated .dbf file which was causing the issue.
Thanks for all the help!
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.