I am looking to import a text file containing ID and date1, a "," delimiter file. I want to import the date1 as a string that I can modify it later. This is because an id (10008) has two values for the date1 variable with '&'. I want to have both dates in the variable.
I also have the freedom to modify the txt file. Is there any other way I can modify the txt file so that the import is easy ( if there are two dates available, then date1 should show both dates)? Thanks
My Textfile looks like this
ID,date1
10001,2023-03-08
10001,2023-03-13
10001,2023-03-06
10001,2023-03-06
10001,2023-03-06
10001,2023-03-06
10008,2023-03-10 & 2023-03-14
10001,2023-03-13
10001,2023-03-15
10001,2023-02-06
10001,2023-03-08
10001,2023-03-08
100012,2023-03-06
100012,2023-02-28
100012,2023-02-23
This is how I want in data
Regular Proc import won't work
PROC IMPORT OUT= WORK._CUTOFF
DATAFILE= "Desktop\chk.txt"
DBMS=dlm REPLACE;
DELIMITER =',';
GETNAMES=YES;
DATAROW=2;
RUN;
This is a nice showcase to illustrate that one does not use PROC IMPORT for text files.
data cutoff;
infile "Desktop\chk.txt" dlm=',' dsd truncover firstobs=2;
input id $ _date1 :$100.;
format date1 yymmdd10.;
do i = 1 to countw(_date1,'&');
date1 = input(strip(scan(_date1,i,'&')),yymmdd10.);
output;
end;
drop _date1 i;
run;
Now you have nice SAS date values and can deal with the duplicate entry in an orderly manner.
I will place a wager of a short stack of quarters that your example data with the two dates in the file is actually more than 20 rows from the top.
I copied your example data, pasted it into a text file on my computer and ran this code:
PROC IMPORT OUT= WORK._CUTOFF DATAFILE= "&outpath.\example.txt" DBMS=dlm REPLACE; DELIMITER =','; GETNAMES=YES; DATAROW=2; RUN; proc print data=work._cutoff; run;
The generated data step code in my log shows:
419 PROC IMPORT OUT= WORK._CUTOFF 420 DATAFILE= "&outpath.\example.txt" 421 DBMS=dlm REPLACE; 422 DELIMITER =','; 423 GETNAMES=YES; 424 DATAROW=2; 425 RUN; 426 /********************************************************************** 427 * PRODUCT: SAS 428 * VERSION: 9.4 429 * CREATOR: External File Interface 430 * DATE: 25MAR23 431 * DESC: Generated SAS Datastep Code 432 * TEMPLATE SOURCE: (None Specified.) 433 ***********************************************************************/ 434 data WORK._CUTOFF ; 435 %let _EFIERR_ = 0; /* set the ERROR detection macro variable */ 436 infile 'D:\Users\Owner\Documents\\example.txt' delimiter = ',' MISSOVER DSD lrecl=32767 436! firstobs=2 ; 437 informat ID best32. ; 438 informat date1 $23. ; 439 format ID best12. ; 440 format date1 $23. ; 441 input 442 ID 443 date1 $ 444 ; 445 if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */ 446 run; NOTE: The infile 'D:\Users\Owner\Documents\\example.txt' is: Filename=D:\Users\Owner\Documents\\example.txt, RECFM=V,LRECL=32767,File Size (bytes)=294, Last Modified=28Mar2023:19:29:31, Create Time=28Mar2023:19:29:04 NOTE: 15 records were read from the infile 'D:\Users\Owner\Documents\\example.txt'. The minimum record length was 16. The maximum record length was 29. NOTE: The data set WORK._CUTOFF has 15 observations and 2 variables. NOTE: DATA statement used (Total process time): real time 0.00 seconds cpu time 0.00 seconds 15 rows created in WORK._CUTOFF from D:\Users\Owner\Documents\\example.txt.
Which shows that the data was read as 23 character value.
HOWEVER, if I copy lines with single date and paste them until that row with two dates is below line 25 my log looks like
467 data WORK._CUTOFF2 ; 468 %let _EFIERR_ = 0; /* set the ERROR detection macro variable */ 469 infile 'D:\Users\Owner\Documents\\example2.txt' delimiter = ',' MISSOVER DSD lrecl=32767 469! firstobs=2 ; 470 informat ID best32. ; 471 informat date1 yymmdd10. ; 472 format ID best12. ; 473 format date1 yymmdd10. ; 474 input 475 ID 476 date1 477 ; 478 if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */ 479 run; NOTE: The infile 'D:\Users\Owner\Documents\\example2.txt' is: Filename=D:\Users\Owner\Documents\\example2.txt, RECFM=V,LRECL=32767,File Size (bytes)=735, Last Modified=28Mar2023:19:35:26,
duplicating your claim.
From the online documentation for Proc Import in the Overview section:
When you run the IMPORT procedure, it reads the input file and writes the data to the specified SAS data set. By default, IMPORT procedure expects the variable names to appear in the first row. The procedure scans the first 20 rows to count the variables, and it attempts to determine the correct informat and format for each variable.
(emphasis added).
You can use the Proc Import statement Guessingrows=MAX; to force the procedure to look at more data before guessing.
Or for simple files like this write your own data step code OR even copy the code generated by proc import, paste into the editor, clean it up and set the properties of the variables you want such a changing the Informat to a longer value and character such as was generated in my first run.
Thank you. 'GUESSINGROWS' option worked for my purpose for now. But I have another question when I try to and another date for 10008 (ID) separated by '&' . Then it only ready one date in the string. Why? is there any way it reads complete string?
Two dates as a string in date1 variable
Proc import
PROC IMPORT OUT= WORK.SDTM_CUTOFF
DATAFILE= "xx\chk.txt"
DBMS=dlm REPLACE;
DELIMITER =',';
GETNAMES=YES;
DATAROW=2;
GUESSINGROWS = MAX;
RUN;
Result:
text file with 3 dates in date1 string
Result:
LOG:
NOTE: The infile 'xx\chk.txt' is:
Filename=xx\chk.txt,
RECFM=V,LRECL=32767,File Size (bytes)=309,
Last Modified=29Mar2023:10:38:52,
Create Time=28Mar2023:21:04:17
NOTE: 15 records were read from the infile 'xx\chk.txt'.
The minimum record length was 16.
The maximum record length was 43.
NOTE: The data set WORK.SDTM_CUTOFF has 15 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds
This is a nice showcase to illustrate that one does not use PROC IMPORT for text files.
data cutoff;
infile "Desktop\chk.txt" dlm=',' dsd truncover firstobs=2;
input id $ _date1 :$100.;
format date1 yymmdd10.;
do i = 1 to countw(_date1,'&');
date1 = input(strip(scan(_date1,i,'&')),yymmdd10.);
output;
end;
drop _date1 i;
run;
Now you have nice SAS date values and can deal with the duplicate entry in an orderly manner.
Thank you @Kurt_Bremser. This will help me in the future when I get the scenario. I am never aware of this method. Thanks again. However, My interest, in this case, was I want is read both dates as strings (like in image), which I have to merge with another dataset to make a visual comparison. Is it possible can we to achieve this in another way? @ballardw is kind enough to help me to use the 'GUESSINGROWS' option in import.
If you do not want SAS date values (which I STRONGLY recommend against), simply remove the FORMAT statement, the whole DO loop, and the DROP.
I agree with your opinion. This is just for display purposes, so I was looking for something to display like in the image which later I export to excel to display the dates for that particular ID (10008). 'Guessingrows' during import worked for me when I have the two dates but as I increase the length with more dates, I am having trouble pulling all the dates. Thank you for your suggestions
Did you run my code as suggested? I can only repeat, one does not use PROC IMPORT for text files. Its guessing will only cause trouble.
Got it. I did run your code. I am looking for not to split the one record of Id(10008) into three records ( I just added an extra date to the previous text file to check).
My code without the DO loop will not split the string, so where is the issue?
My sincere apologies. Yes, your code worked, I was reading the wrong file into the code. It worked with out the 'DO LOOP' Thanks again.
PROC IMPORT with GUESSINGROWS=MAX will work fine for as many dates as you want in that second column
As long as:
1) They are all part of that second column. So no commas, unless the whole list of dates is quoted. No extra line breaks. etc.
2) The length of the longest list of dates is less than 32767 bytes long. (Actually a little less since the first column will use some of the bytes). A SAS variable cannot contain more than 32.767 bytes. And the default line length for a text file is 32,767 bytes.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.
Find more tutorials on the SAS Users YouTube channel.