Hello,
I am trying to import a csv file into SAS. The delimiter is a tap. The log window showed one or more lines were truncated. Then, I have added guessingrows=max in my code. The log window shows the data set was successfully created, but it was not. The data set has 105 variables. The log window shows it has only 13 variables. Any advice and suggestions would be highly appreciated. Here is my code and log window message.
PROC IMPORT OUT=DICAID_20_1 DATAFILE="D:\Mala\E-UL-Data
Request_2000.csv"
DBMS=CSV REPLACE;
GETNAMES=YES;
guessingrows=max;
RUN
328 ;
329 if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */
330 run;
NOTE: The infile 'D:\Mala\E-UL-Data Request_2013.csv' is:
Filename=D:\Mala\E-UL-Data Request_2013.csv,
RECFM=V,LRECL=32767,
File Size (bytes)=4975718400,
Last Modified=29Apr2021:12:06:18,
Create Time=29Apr2021:12:35:46
NOTE: 5522752 records were read from the infile 'D:\Mala\E-UL-Data Request_2013.csv'.
The minimum record length was 330.
The maximum record length was 1634.
NOTE: The data set WORK.DICAID_20_1 has 5522752 observations and 13 variables.
NOTE: DATA statement used (Total process time):
real time 53:32.57
cpu time 34.87 seconds
output:
CHAR CLAIM_GUID.NUM_DTL.DTE_PTN.DTE_FIRST_SVC.DTE_LAST_SVC.DTE_ADMISSION.DTE_DISCHARGE.BEN_
ZONE 44444545440454545404545554045454455555540454544555554045454444554440454544544454404445
NUMR 3C19DF75949E5DF44C9445F04E9445F69234F3639445FC134F3639445F14D9339FE9445F493381275925EF
As stated by @Tom, you are much better off writing the data step yourself. It will take some time to write it, but it gives you full control over the process, and avoids the analyzation step that IMPORT has to do.
I simply copy/paste the first line of the data file and use it to write the INPUT statement. My knowledge of the column metadata taken from the documentation (you DID receive documentation, didn't you?) helps me define the correct variable attributes.
You told PROC IMPORT to use comma as the delimiter.
But from your example:
CHAR CLAIM_GUID.NUM_DTL.DTE_PTN.DTE_FIRST_SVC.DTE_LAST_SVC.DTE_ADMISSION.DTE_DISCHARGE.BEN_ ZONE 44444545440454545404545554045454455555540454544555554045454444554440454544544454404445 NUMR 3C19DF75949E5DF44C9445F04E9445F69234F3639445FC134F3639445F14D9339FE9445F493381275925EF
It looks instead you have '09'x as the delimiter.
So tell PROC IMPORT that by adding this statement:
delimiter='09'x;
The file is fairly large (5,522,752 observations ) plus now that you have the right delimiter it will probably detect many more columns (aka variables).
But that does seem like a long time. Are you sure that you did not forget the RUN statement? Or have unbalanced quotes?
Do you know what variables you expect to have in the file?
It it will run much faster if you just write your own data step to read the file instead of forcing SAS to guess how to define the variables. Plus you will have more control over how the variables are defined. You can definitely do a much better job at guessing how to define the variables than PROC IMPORT can. It does not have any domain knowledge of what this data is supposed to represent.
@bxg95 wrote:
Yes, the data set has more than 6M.
Yes, I know the variables in the dataset. There are 105 variables. I am sorry, what do you mean by write own data step? Do you mean using INFILE statement?
Yes. Using a data step to read a structured text file is basic SAS technique, any starting SAS programmer needs to learn that.
For the past 20+ years, this has been the bread and butter of my daily work (reading unload files delivered from the databases).
Yes. And DATA statement and other statements to form a full data step.
Something like this. Fill in the definition of all 108 variables in the LENGTH statement. Numeric variables should have length of 8 since SAS stores all numbers as 64-bit floating point numbers. Attach and REQUIRED informats or FORMATS. If you define the variables in the order they appear in the text file then the input can use a positional variable list so you don't have to retype all 108 variable names.
data DICAID_20_1;
infile "D:\Mala\E-UL-Data Request_2000.csv" dsd dlm='09'x truncover firstobs=2;
length var1 8 var2 $20 var3 8 var5 $5 .... var108 8 ;
informat var27 yymmdd.;
format var27 yymmdd10.;
input var1 -- var108;
run;
If you don't have a handy source for the variable names just copy the first line of the text file and paste it into your LENGTH statement and add the lengths for each variable. Here is a simple data step to copy them and paste them into the log.
data _null_;
infile "D:\Mala\E-UL-Data Request_2000.csv" dsd dlm='09'x obs=1 ;
input name :$32. @@;
put name;
run;
Note for most variables SAS does NOT need to have special instructions for either reading from text into values (informat) or writing values into text (format). Especially for CHARACTER variables. Most often you only need to set the informat or format for DATE, TIME or DATETIME values. If there are numbers with commas in them you will need to use the COMMA informat. If you want the numbers displayed with a fixed number of decimal places attach the w.d format to those variables. There is no need for a special informat to read those numbers however. But if the numbers in the text file use an implied decimal place to save space (12 dollars and 34 cents is stored as "1234" instead of "12.34") then use an informat with a decimal specification of the number of decimal places to imply.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.