Hello,
I am importing 27 text files into a single files in SAS but some of my data are missing in the final file I created. I am not sure how to solve it.
Here is my code
proc import datafile='C:\Users\bmrg2\Box Sync\Mizzou 2019\Research\SAS Code\Blandine Data\RBI2020\ILLINOIS\IL_Added_inspection_year\*.txt'
dbms=dlm out=IL.IL_NBI_Data replace;
delimiter =",";
getnames=yes;
guessingrows=1750000;
run;
The data have 16 variables (columns) and at least 28000 observations(row). How can I fix that or how do I make sure all the data are all imported?
You need to investigate each import file to identify which ones are problematic.
Open each file in a text editor (not Excel) and verify the number of records. Read that file into SAS and verify you get the same number of data rows in SAS.
You may need to fine tune the SAS DATA step code that SAS generates with PROC IMPORT. You will see this in your SAS log. You just copy it back to your SAS editor and run it from there after making any required changes.
I did some data processing afterwards and that how I found out something was wrong. After some investigations I found out the issue was with the proc import. Individual files are OK but when I imported them into a single file I have some rows missing
A couple of minor comments:
If you attempt to read multiple files and each of the files has a header row, which seems likely from your use of "Getnames=Yes", then I suspect most if not all of your variables will be character due to encountering the header row of the second file.
What does the LOG look like? Does it indicate 27 files were read? Any warnings?
Proc Import will generate data step code that appears in the log to diagnose specific variable issues.
The log file is clean, After investigating my each file (prior to merging them into a single file) I realized that in the first 3 files, one column variable (STRUCTURE_NUMBER_008) is num type and the rest (24 remaining files) are Char type.
In the single file previously created I only find the first 3 observation of my variable STRUCTURE_NUMBER_008 which I assumed SAS only considered the num type.
I tried to convert it in Char type but it changed the observation label. I want to keep the original label and format so it will match the remaining 24 files and I can do a quick check but I don't know how to do it.
I must say that in each files the observation are repeated (I am working with bridge data for 27 years) and I have the same variable name.
This is the original file with the variable that has issue (Sorry I can't upload screenshot it won't let me)
STRUCTURE_NUMBER_008
1000200603
1000300604
1000400605
....
after transformation (from num to Char) I get this
STRUCTURE_NUMBER_008
1.0002E9
1.0003E9
1.0004E9
......
the code I used to change from num typ to Char type is:
proc sql noprint;
select name into :varlist separated by ' '
from sashelp.vcolumn
where upcase(libname) eq 'IL' and upcase(memname) eq 'IL1992';
quit;
/* modify variable type and length */
data IL.IL1992 (drop=v1);
retain &varlist; *<-- preserve variable order ;
length STRUCTURE_NUMBER_008 $15 ; *<-- define new types/lengths ;
set IL.IL1992 (rename=(STRUCTURE_NUMBER_008=v1));
STRUCTURE_NUMBER_008 = put(v1,15.); *<-- redefine STRUCTURE_NUMBER_008 variable ;
run;
You should use the DATA step code generated by PROC IMPORT to read in all your files. That way you can ensure all variables stay the same type when reading all your CSV files and avoid having to do type conversions at all.
Can you please help me with the code then?
This is exactly why you should not use PROC IMPORT to read multiple files that should all have the same structure. PROC IMPORT has to GUESS how to define the variables. By asking it to process each file independently it might make different decisions on different subsets of the data.
Also if those numbers are supposed to be identifiers then leave them as character strings. There is limit to the number of digits you can store in a number.
If all of your CSV filenames follow a simple pattern you can even use a wildcard in the INFILE statement to read them all at once. So if you just want to read the first three variables from the files then code like this will read all of the files.
data want ;
length filen $256 ;
infile '/my directory/*.csv' dsd truncover filename=filen;
input @;
if filen ne lag(filen) then input ;
input var1 var2 STRUCTURE_NUMBER_008 :$20. ;
filename=filen ;
run;
Hey Tom I am using Text file, can you please help with a code that will help me import all of them at once without using proc import? I need to read all the files
thanks
Post a few lines of one of the files you are trying to read. Use the insert code button (looks like < / > ) to get a pop-up to paste the lines. The files in the link I found for NBI data is NOT a csv file, so you cannot use PROC IMPORT. The data is in fixed locations and they have information about each variable and which columns it is stored on the lines in the text file.
STATE_CODE_001,STRUCTURE_NUMBER_008,RECORD_TYPE_005A,ROUTE_PREFIX_005B,SERVICE_LEVEL_005C,ROUTE_NUMBER_005D,DIRECTION_005E,HIGHWAY_DISTRICT_002,COUNTY_CODE_003,PLACE_CODE_004,FEATURES_DESC_006A,CRITICAL_FACILITY_006B,FACILITY_CARRIED_007,LOCATION_009,MIN_VERT_CLR_010,KILOPOINT_011,BASE_HWY_NETWORK_012,LRS_INV_ROUTE_013A,SUBROUTE_NO_013B,LAT_016,LONG_017,DETOUR_KILOS_019,TOLL_020,MAINTENANCE_021,OWNER_022,FUNCTIONAL_CLASS_026,YEAR_BUILT_027,TRAFFIC_LANES_ON_028A,TRAFFIC_LANES_UND_028B,ADT_029,YEAR_ADT_030,DESIGN_LOAD_031,APPR_WIDTH_MT_032,MEDIAN_CODE_033,DEGREES_SKEW_034,STRUCTURE_FLARED_035,RAILINGS_036A,TRANSITIONS_036B,APPR_RAIL_036C,APPR_RAIL_END_036D,HISTORY_037,NAVIGATION_038,NAV_VERT_CLR_MT_039,NAV_HORR_CLR_MT_040,OPEN_CLOSED_POSTED_041,SERVICE_ON_042A,SERVICE_UND_042B,STRUCTURE_KIND_043A,STRUCTURE_TYPE_043B,APPR_KIND_044A,APPR_TYPE_044B,MAIN_UNIT_SPANS_045,APPR_SPANS_046,HORR_CLR_MT_047,MAX_SPAN_LEN_MT_048,STRUCTURE_LEN_MT_049,LEFT_CURB_MT_050A,RIGHT_CURB_MT_050B,ROADWAY_WIDTH_MT_051,DECK_WIDTH_MT_052,VERT_CLR_OVER_MT_053,VERT_CLR_UND_REF_054A,VERT_CLR_UND_054B,LAT_UND_REF_055A,LAT_UND_MT_055B,LEFT_LAT_UND_MT_056,DECK_COND_058,SUPERSTRUCTURE_COND_059,SUBSTRUCTURE_COND_060,CHANNEL_COND_061,CULVERT_COND_062,OPR_RATING_METH_063,OPERATING_RATING_064,INV_RATING_METH_065,INVENTORY_RATING_066,STRUCTURAL_EVAL_067,DECK_GEOMETRY_EVAL_068,UNDCLRENCE_EVAL_069,POSTING_EVAL_070,WATERWAY_EVAL_071,APPR_ROAD_EVAL_072,WORK_PROPOSED_075A,WORK_DONE_BY_075B,IMP_LEN_MT_076,DATE_OF_INSPECT_090,INSPECT_FREQ_MONTHS_091,FRACTURE_092A,UNDWATER_LOOK_SEE_092B,SPEC_INSPECT_092C,FRACTURE_LAST_DATE_093A,UNDWATER_LAST_DATE_093B,SPEC_LAST_DATE_093C,BRIDGE_IMP_COST_094,ROADWAY_IMP_COST_095,TOTAL_IMP_COST_096,YEAR_OF_IMP_097,OTHER_STATE_CODE_098A,OTHER_STATE_PCNT_098B,OTHR_STATE_STRUC_NO_099,STRAHNET_HIGHWAY_100,PARALLEL_STRUCTURE_101,TRAFFIC_DIRECTION_102,TEMP_STRUCTURE_103,HIGHWAY_SYSTEM_104,FEDERAL_LANDS_105,YEAR_RECONSTRUCTED_106,DECK_STRUCTURE_TYPE_107,SURFACE_TYPE_108A,MEMBRANE_TYPE_108B,DECK_PROTECTION_108C,PERCENT_ADT_TRUCK_109,NATIONAL_NETWORK_110,PIER_PROTECTION_111,BRIDGE_LEN_IND_112,SCOUR_CRITICAL_113,FUTURE_ADT_114,YEAR_OF_FUTURE_ADT_115,MIN_NAV_CLR_MT_116,FED_AGENCY,DATE_LAST_UPDATE,TYPE_LAST_UPDATE,DEDUCT_CODE,REMARKS,PROGRAM_CODE,PROJ_NO,PROJ_SUFFIX,NBI_TYPE_OF_IMP,DTL_TYPE_OF_IMP,SPECIAL_CODE,STEP_CODE,STATUS_WITH_10YR_RULE,SUFFICIENCY_ASTERC,SUFFICIENCY_RATING,STATUS_NO_10YR_RULE 17,000001000200603,1,3,1,00061,0,06,001,36295,'BIG NECK CREEK ',,'ILL61&94 ','.5 MI N JCT 61 94 ',99.99,,,,,40084200,091040000,0,3,01,01,06,1924,2,0,1200,1991,0,9.8,0,0,0,1,1,1,1,4,0,0,0,A,1,5,1,01,0,00,1,0,9.2,7.6,7.9,0,0,8.8,9.8,99.99,N,0,N,0,0,6,6,6,5,N,,47.7,,27,6,5,N,5,8,8,,,0,791,24,N ,Y36,N ,,0791,,0,0,0,2000,,,,0,N,2,,,,1991,1,1,0,0,20,0,,Y,6,1680,2013,,N,,,Z ,,,,0,,,,,0,,84.3,Y 17,000001000300604,1,3,1,00061,0,06,001,36295,'BIG NECK CREEK ',,'ILL61 ','.1 MI W JCT 61&94 ',99.99,,,,,40080600,091041800,0,3,01,01,06,1925,2,0,650,1991,0,9.1,0,30,0,0,1,0,1,4,0,0,0,A,1,5,1,01,0,00,1,0,9.2,7.9,8.5,0,0,8.8,9.9,99.99,N,0,N,0,0,6,6,5,6,N,,52.2,,28.8,5,5,N,5,7,6,,,0,691,24,N ,N ,N ,,,,0,0,0,2000,,,,0,N,2,,,,0,1,6,0,0,23,0,,Y,6,1120,2013,,N,,,Z ,,,,0,,,,,0,,84.8,Y 17,000001000400605,1,3,1,00061,0,06,001,36295,'BIG NECK CR
Here are the file , there are at least 100 variables I am just keeping about 15 when I import them.
Since this is a delimited file you have to read every field (at least up to the last one you actually) want in order to know where to find the values for the variables.
Personally I would create a program that reads all of the variables. You can then use KEEP= or DROP= dataset options or KEEP or DROP statements to select the variables you want to use for your current analysis.
But if you want to do something quick and dirty you could just read the fields you want to skip over into a dummy variable that you then drop.
Example:
data subset ;
infile 'myfile.txt' dsd truncover firstobs=2;
length STATE_CODE_001 8 STRUCTURE_NUMBER_008 $20 HIGHWAY_DISTRICT_002 $20 ;
length dummy $1 ;
input STATE_CODE_001 STRUCTURE_NUMBER_008 5*dummy HIGHWAY_DISTRICT_002 ;
drop dummy;
run;
/*
1 STATE_CODE_001
2 STRUCTURE_NUMBER_008
3 RECORD_TYPE_005A
4 ROUTE_PREFIX_005B
5 SERVICE_LEVEL_005C
6 ROUTE_NUMBER_005D
7 DIRECTION_005E
8 HIGHWAY_DISTRICT_002
*/
I am not sure how this step will help me since I have already imported individual files.
will this step ensure that all the variable are in the same type or what?
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.