Proc Import text file doesn't import all my data

Blandine · Posted 01-21-2021 04:59 PM

Hello,

I am importing 27 text files into a single files in SAS but some of my data are missing in the final file I created. I am not sure how to solve it.

Here is my code

proc import datafile='C:\Users\bmrg2\Box Sync\Mizzou 2019\Research\SAS Code\Blandine Data\RBI2020\ILLINOIS\IL_Added_inspection_year\*.txt'
dbms=dlm out=IL.IL_NBI_Data replace;
delimiter =",";
getnames=yes;
guessingrows=1750000;
run;

The data have 16 variables (columns) and at least 28000 observations(row). How can I fix that or how do I make sure all the data are all imported?

SASKiwi · Posted 01-21-2021 05:10 PM

You need to investigate each import file to identify which ones are problematic.

Open each file in a text editor (not Excel) and verify the number of records. Read that file into SAS and verify you get the same number of data rows in SAS.

You may need to fine tune the SAS DATA step code that SAS generates with PROC IMPORT. You will see this in your SAS log. You just copy it back to your SAS editor and run it from there after making any required changes.

Reeza · Posted 01-21-2021 05:20 PM

Why do you think records are missing?

Blandine · Posted 01-21-2021 05:46 PM

I did some data processing afterwards and that how I found out something was wrong. After some investigations I found out the issue was with the proc import. Individual files are OK but when I imported them into a single file I have some rows missing

Reeza · Posted 01-21-2021 06:01 PM

What do you mean you imported them into a single file? Did you write a manual data step that then re-processed all the files?
Did you combine the inputs from multiple iterations of PROC IMPORT?

The code shown is not problematic, assuming your log is clean, so it's a logical issue most likely somewhere in your process. If your log isn't clean from the import then post that.

ballardw · Posted 01-21-2021 05:48 PM

A couple of minor comments:

If you attempt to read multiple files and each of the files has a header row, which seems likely from your use of "Getnames=Yes", then I suspect most if not all of your variables will be character due to encountering the header row of the second file.

What does the LOG look like? Does it indicate 27 files were read? Any warnings?

Proc Import will generate data step code that appears in the log to diagnose specific variable issues.

Blandine · Posted 01-21-2021 07:25 PM

The log file is clean, After investigating my each file (prior to merging them into a single file) I realized that in the first 3 files, one column variable (STRUCTURE_NUMBER_008) is num type and the rest (24 remaining files) are Char type.

In the single file previously created I only find the first 3 observation of my variable STRUCTURE_NUMBER_008 which I assumed SAS only considered the num type.

I tried to convert it in Char type but it changed the observation label. I want to keep the original label and format so it will match the remaining 24 files and I can do a quick check but I don't know how to do it.

I must say that in each files the observation are repeated (I am working with bridge data for 27 years) and I have the same variable name.

This is the original file with the variable that has issue (Sorry I can't upload screenshot it won't let me)

STRUCTURE_NUMBER_008

1000200603

1000300604

1000400605

....

after transformation (from num to Char) I get this

STRUCTURE_NUMBER_008

1.0002E9

1.0003E9

1.0004E9

......

the code I used to change from num typ to Char type is:

proc sql noprint;
select name into :varlist separated by ' '
from sashelp.vcolumn
where upcase(libname) eq 'IL' and upcase(memname) eq 'IL1992';
quit;

/* modify variable type and length */
data IL.IL1992 (drop=v1);
retain &varlist; *<-- preserve variable order ;
length STRUCTURE_NUMBER_008 $15 ; *<-- define new types/lengths ;
set IL.IL1992 (rename=(STRUCTURE_NUMBER_008=v1));
STRUCTURE_NUMBER_008 = put(v1,15.); *<-- redefine STRUCTURE_NUMBER_008 variable ;
run;

SASKiwi · Posted 01-21-2021 07:55 PM

You should use the DATA step code generated by PROC IMPORT to read in all your files. That way you can ensure all variables stay the same type when reading all your CSV files and avoid having to do type conversions at all.

Blandine · Posted 01-22-2021 09:11 AM

Can you please help me with the code then?

Tom · Posted 01-21-2021 08:04 PM

This is exactly why you should not use PROC IMPORT to read multiple files that should all have the same structure. PROC IMPORT has to GUESS how to define the variables. By asking it to process each file independently it might make different decisions on different subsets of the data.

Also if those numbers are supposed to be identifiers then leave them as character strings. There is limit to the number of digits you can store in a number.

If all of your CSV filenames follow a simple pattern you can even use a wildcard in the INFILE statement to read them all at once. So if you just want to read the first three variables from the files then code like this will read all of the files.

data want ;
   length filen $256 ;
   infile '/my directory/*.csv' dsd truncover filename=filen;
   input @;
   if filen ne lag(filen) then input ;
   input var1 var2 STRUCTURE_NUMBER_008 :$20. ;
   filename=filen ;
run;

Blandine · Posted 01-22-2021 09:13 AM

Hey Tom I am using Text file, can you please help with a code that will help me import all of them at once without using proc import? I need to read all the files

thanks

Tom · Posted 01-22-2021 09:23 AM

Post a few lines of one of the files you are trying to read. Use the insert code button (looks like < / > ) to get a pop-up to paste the lines. The files in the link I found for NBI data is NOT a csv file, so you cannot use PROC IMPORT. The data is in fixed locations and they have information about each variable and which columns it is stored on the lines in the text file.

Blandine · Posted 01-22-2021 09:31 AM

STATE_CODE_001,STRUCTURE_NUMBER_008,RECORD_TYPE_005A,ROUTE_PREFIX_005B,SERVICE_LEVEL_005C,ROUTE_NUMBER_005D,DIRECTION_005E,HIGHWAY_DISTRICT_002,COUNTY_CODE_003,PLACE_CODE_004,FEATURES_DESC_006A,CRITICAL_FACILITY_006B,FACILITY_CARRIED_007,LOCATION_009,MIN_VERT_CLR_010,KILOPOINT_011,BASE_HWY_NETWORK_012,LRS_INV_ROUTE_013A,SUBROUTE_NO_013B,LAT_016,LONG_017,DETOUR_KILOS_019,TOLL_020,MAINTENANCE_021,OWNER_022,FUNCTIONAL_CLASS_026,YEAR_BUILT_027,TRAFFIC_LANES_ON_028A,TRAFFIC_LANES_UND_028B,ADT_029,YEAR_ADT_030,DESIGN_LOAD_031,APPR_WIDTH_MT_032,MEDIAN_CODE_033,DEGREES_SKEW_034,STRUCTURE_FLARED_035,RAILINGS_036A,TRANSITIONS_036B,APPR_RAIL_036C,APPR_RAIL_END_036D,HISTORY_037,NAVIGATION_038,NAV_VERT_CLR_MT_039,NAV_HORR_CLR_MT_040,OPEN_CLOSED_POSTED_041,SERVICE_ON_042A,SERVICE_UND_042B,STRUCTURE_KIND_043A,STRUCTURE_TYPE_043B,APPR_KIND_044A,APPR_TYPE_044B,MAIN_UNIT_SPANS_045,APPR_SPANS_046,HORR_CLR_MT_047,MAX_SPAN_LEN_MT_048,STRUCTURE_LEN_MT_049,LEFT_CURB_MT_050A,RIGHT_CURB_MT_050B,ROADWAY_WIDTH_MT_051,DECK_WIDTH_MT_052,VERT_CLR_OVER_MT_053,VERT_CLR_UND_REF_054A,VERT_CLR_UND_054B,LAT_UND_REF_055A,LAT_UND_MT_055B,LEFT_LAT_UND_MT_056,DECK_COND_058,SUPERSTRUCTURE_COND_059,SUBSTRUCTURE_COND_060,CHANNEL_COND_061,CULVERT_COND_062,OPR_RATING_METH_063,OPERATING_RATING_064,INV_RATING_METH_065,INVENTORY_RATING_066,STRUCTURAL_EVAL_067,DECK_GEOMETRY_EVAL_068,UNDCLRENCE_EVAL_069,POSTING_EVAL_070,WATERWAY_EVAL_071,APPR_ROAD_EVAL_072,WORK_PROPOSED_075A,WORK_DONE_BY_075B,IMP_LEN_MT_076,DATE_OF_INSPECT_090,INSPECT_FREQ_MONTHS_091,FRACTURE_092A,UNDWATER_LOOK_SEE_092B,SPEC_INSPECT_092C,FRACTURE_LAST_DATE_093A,UNDWATER_LAST_DATE_093B,SPEC_LAST_DATE_093C,BRIDGE_IMP_COST_094,ROADWAY_IMP_COST_095,TOTAL_IMP_COST_096,YEAR_OF_IMP_097,OTHER_STATE_CODE_098A,OTHER_STATE_PCNT_098B,OTHR_STATE_STRUC_NO_099,STRAHNET_HIGHWAY_100,PARALLEL_STRUCTURE_101,TRAFFIC_DIRECTION_102,TEMP_STRUCTURE_103,HIGHWAY_SYSTEM_104,FEDERAL_LANDS_105,YEAR_RECONSTRUCTED_106,DECK_STRUCTURE_TYPE_107,SURFACE_TYPE_108A,MEMBRANE_TYPE_108B,DECK_PROTECTION_108C,PERCENT_ADT_TRUCK_109,NATIONAL_NETWORK_110,PIER_PROTECTION_111,BRIDGE_LEN_IND_112,SCOUR_CRITICAL_113,FUTURE_ADT_114,YEAR_OF_FUTURE_ADT_115,MIN_NAV_CLR_MT_116,FED_AGENCY,DATE_LAST_UPDATE,TYPE_LAST_UPDATE,DEDUCT_CODE,REMARKS,PROGRAM_CODE,PROJ_NO,PROJ_SUFFIX,NBI_TYPE_OF_IMP,DTL_TYPE_OF_IMP,SPECIAL_CODE,STEP_CODE,STATUS_WITH_10YR_RULE,SUFFICIENCY_ASTERC,SUFFICIENCY_RATING,STATUS_NO_10YR_RULE
17,000001000200603,1,3,1,00061,0,06,001,36295,'BIG NECK CREEK          ',,'ILL61&94          ','.5 MI N JCT 61 94        ',99.99,,,,,40084200,091040000,0,3,01,01,06,1924,2,0,1200,1991,0,9.8,0,0,0,1,1,1,1,4,0,0,0,A,1,5,1,01,0,00,1,0,9.2,7.6,7.9,0,0,8.8,9.8,99.99,N,0,N,0,0,6,6,6,5,N,,47.7,,27,6,5,N,5,8,8,,,0,791,24,N  ,Y36,N  ,,0791,,0,0,0,2000,,,,0,N,2,,,,1991,1,1,0,0,20,0,,Y,6,1680,2013,,N,,,Z ,,,,0,,,,,0,,84.3,Y
17,000001000300604,1,3,1,00061,0,06,001,36295,'BIG NECK CREEK          ',,'ILL61             ','.1 MI W JCT 61&94        ',99.99,,,,,40080600,091041800,0,3,01,01,06,1925,2,0,650,1991,0,9.1,0,30,0,0,1,0,1,4,0,0,0,A,1,5,1,01,0,00,1,0,9.2,7.9,8.5,0,0,8.8,9.9,99.99,N,0,N,0,0,6,6,5,6,N,,52.2,,28.8,5,5,N,5,7,6,,,0,691,24,N  ,N  ,N  ,,,,0,0,0,2000,,,,0,N,2,,,,0,1,6,0,0,23,0,,Y,6,1120,2013,,N,,,Z ,,,,0,,,,,0,,84.8,Y
17,000001000400605,1,3,1,00061,0,06,001,36295,'BIG NECK CR

Here are the file , there are at least 100 variables I am just keeping about 15 when I import them.

Tom · Posted 01-22-2021 09:51 AM

Since this is a delimited file you have to read every field (at least up to the last one you actually) want in order to know where to find the values for the variables.

Personally I would create a program that reads all of the variables. You can then use KEEP= or DROP= dataset options or KEEP or DROP statements to select the variables you want to use for your current analysis.

But if you want to do something quick and dirty you could just read the fields you want to skip over into a dummy variable that you then drop.

Example:

data subset ;
  infile 'myfile.txt' dsd truncover firstobs=2;
  length STATE_CODE_001 8  STRUCTURE_NUMBER_008 $20 HIGHWAY_DISTRICT_002 $20 ;
  length dummy $1 ;
  input STATE_CODE_001 STRUCTURE_NUMBER_008 5*dummy HIGHWAY_DISTRICT_002 ;
  drop dummy;
run;

/*
1 STATE_CODE_001
2 STRUCTURE_NUMBER_008
3 RECORD_TYPE_005A
4 ROUTE_PREFIX_005B
5 SERVICE_LEVEL_005C
6 ROUTE_NUMBER_005D
7 DIRECTION_005E
8 HIGHWAY_DISTRICT_002
*/

Blandine · Posted 01-22-2021 10:16 AM

I am not sure how this step will help me since I have already imported individual files.

will this step ensure that all the variable are in the same type or what?

Proc Import text file doesn't import all my data

Re: Proc Import text file doesn't import all my data

Re: Proc Import text file doesn't import all my data

Re: Proc Import text file doesn't import all my data

Re: Proc Import text file doesn't import all my data

Re: Proc Import text file doesn't import all my data

Re: Proc Import text file doesn't import all my data

Re: Proc Import text file doesn't import all my data

Re: Proc Import text file doesn't import all my data

Re: Proc Import text file doesn't import all my data

Re: Proc Import text file doesn't import all my data

Re: Proc Import text file doesn't import all my data

Re: Proc Import text file doesn't import all my data

Re: Proc Import text file doesn't import all my data

Re: Proc Import text file doesn't import all my data

Registration is open