Solved: Re: Import 'DATES' as string in TXT file

SASuserlot · Posted 03-28-2023 09:16 PM

I am looking to import a text file containing ID and date1, a "," delimiter file. I want to import the date1 as a string that I can modify it later. This is because an id (10008) has two values for the date1 variable with '&'. I want to have both dates in the variable.

I also have the freedom to modify the txt file. Is there any other way I can modify the txt file so that the import is easy ( if there are two dates available, then date1 should show both dates)? Thanks

My Textfile looks like this

ID,date1
10001,2023-03-08
10001,2023-03-13
10001,2023-03-06
10001,2023-03-06
10001,2023-03-06
10001,2023-03-06
10008,2023-03-10 & 2023-03-14
10001,2023-03-13
10001,2023-03-15
10001,2023-02-06
10001,2023-03-08
10001,2023-03-08
100012,2023-03-06
100012,2023-02-28
100012,2023-02-23

This is how I want in data

Regular Proc import won't work

PROC IMPORT OUT= WORK._CUTOFF
DATAFILE= "Desktop\chk.txt"
DBMS=dlm REPLACE;
DELIMITER =',';
GETNAMES=YES;
DATAROW=2;
RUN;

Kurt_Bremser · Posted 03-29-2023 03:49 AM

This is a nice showcase to illustrate that one does not use PROC IMPORT for text files.

data cutoff;
infile "Desktop\chk.txt" dlm=',' dsd truncover firstobs=2;
input id $ _date1 :$100.;
format date1 yymmdd10.;
do i = 1 to countw(_date1,'&');
  date1 = input(strip(scan(_date1,i,'&')),yymmdd10.);
  output;
end;
drop _date1 i;
run;

Now you have nice SAS date values and can deal with the duplicate entry in an orderly manner.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

View solution in original post

ballardw · Posted 03-28-2023 09:43 PM

I will place a wager of a short stack of quarters that your example data with the two dates in the file is actually more than 20 rows from the top.

I copied your example data, pasted it into a text file on my computer and ran this code:

PROC IMPORT OUT= WORK._CUTOFF
DATAFILE= "&outpath.\example.txt"
DBMS=dlm REPLACE;
DELIMITER =',';
GETNAMES=YES;
DATAROW=2;
RUN;

proc print data=work._cutoff;
run;

The generated data step code in my log shows:

419  PROC IMPORT OUT= WORK._CUTOFF
420  DATAFILE= "&outpath.\example.txt"
421  DBMS=dlm REPLACE;
422  DELIMITER =',';
423  GETNAMES=YES;
424  DATAROW=2;
425  RUN;

426   /**********************************************************************
427   *   PRODUCT:   SAS
428   *   VERSION:   9.4
429   *   CREATOR:   External File Interface
430   *   DATE:      25MAR23
431   *   DESC:      Generated SAS Datastep Code
432   *   TEMPLATE SOURCE:  (None Specified.)
433   ***********************************************************************/
434      data WORK._CUTOFF    ;
435      %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
436      infile 'D:\Users\Owner\Documents\\example.txt' delimiter = ',' MISSOVER DSD lrecl=32767
436! firstobs=2 ;
437         informat ID best32. ;
438         informat date1 $23. ;
439         format ID best12. ;
440         format date1 $23. ;
441      input
442                  ID
443                  date1 $
444      ;
445      if _ERROR_ then call symputx('_EFIERR_',1);  /* set ERROR detection macro variable */
446      run;

NOTE: The infile 'D:\Users\Owner\Documents\\example.txt' is:
      Filename=D:\Users\Owner\Documents\\example.txt,
      RECFM=V,LRECL=32767,File Size (bytes)=294,
      Last Modified=28Mar2023:19:29:31,
      Create Time=28Mar2023:19:29:04

NOTE: 15 records were read from the infile 'D:\Users\Owner\Documents\\example.txt'.
      The minimum record length was 16.
      The maximum record length was 29.
NOTE: The data set WORK._CUTOFF has 15 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


15 rows created in WORK._CUTOFF from D:\Users\Owner\Documents\\example.txt.

Which shows that the data was read as 23 character value.

HOWEVER, if I copy lines with single date and paste them until that row with two dates is below line 25 my log looks like

467      data WORK._CUTOFF2    ;
468      %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
469      infile 'D:\Users\Owner\Documents\\example2.txt' delimiter = ',' MISSOVER DSD lrecl=32767
469! firstobs=2 ;
470         informat ID best32. ;
471         informat date1 yymmdd10. ;
472         format ID best12. ;
473         format date1 yymmdd10. ;
474      input
475                  ID
476                  date1
477      ;
478      if _ERROR_ then call symputx('_EFIERR_',1);  /* set ERROR detection macro variable */
479      run;

NOTE: The infile 'D:\Users\Owner\Documents\\example2.txt' is:
      Filename=D:\Users\Owner\Documents\\example2.txt,
      RECFM=V,LRECL=32767,File Size (bytes)=735,
      Last Modified=28Mar2023:19:35:26,

duplicating your claim.

From the online documentation for Proc Import in the Overview section:

When you run the IMPORT procedure, it reads the input file and writes the data to the specified SAS data set. By default, IMPORT procedure expects the variable names to appear in the first row. The procedure scans the first 20 rows to count the variables, and it attempts to determine the correct informat and format for each variable.

(emphasis added).

You can use the Proc Import statement Guessingrows=MAX; to force the procedure to look at more data before guessing.

Or for simple files like this write your own data step code OR even copy the code generated by proc import, paste into the editor, clean it up and set the properties of the variables you want such a changing the Informat to a longer value and character such as was generated in my first run.

SASuserlot · Posted 03-29-2023 10:42 AM

Thank you. 'GUESSINGROWS' option worked for my purpose for now. But I have another question when I try to and another date for 10008 (ID) separated by '&' . Then it only ready one date in the string. Why? is there any way it reads complete string?

Two dates as a string in date1 variable

Proc import

PROC IMPORT OUT= WORK.SDTM_CUTOFF 
            DATAFILE= "xx\chk.txt" 
            DBMS=dlm REPLACE;
			DELIMITER =',';
     GETNAMES=YES;
     DATAROW=2; 
	 GUESSINGROWS = MAX;
RUN;

Result:

text file with 3 dates in date1 string

Result:

LOG:


NOTE: The infile 'xx\chk.txt' is:
      Filename=xx\chk.txt,
      RECFM=V,LRECL=32767,File Size (bytes)=309,
      Last Modified=29Mar2023:10:38:52,
      Create Time=28Mar2023:21:04:17

NOTE: 15 records were read from the infile 'xx\chk.txt'.
      The minimum record length was 16.
      The maximum record length was 43.
NOTE: The data set WORK.SDTM_CUTOFF has 15 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds

Kurt_Bremser · Posted 03-29-2023 03:49 AM

This is a nice showcase to illustrate that one does not use PROC IMPORT for text files.

data cutoff;
infile "Desktop\chk.txt" dlm=',' dsd truncover firstobs=2;
input id $ _date1 :$100.;
format date1 yymmdd10.;
do i = 1 to countw(_date1,'&');
  date1 = input(strip(scan(_date1,i,'&')),yymmdd10.);
  output;
end;
drop _date1 i;
run;

Now you have nice SAS date values and can deal with the duplicate entry in an orderly manner.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

SASuserlot · Posted 03-29-2023 10:19 AM

Thank you @Kurt_Bremser. This will help me in the future when I get the scenario. I am never aware of this method. Thanks again. However, My interest, in this case, was I want is read both dates as strings (like in image), which I have to merge with another dataset to make a visual comparison. Is it possible can we to achieve this in another way? @ballardw is kind enough to help me to use the 'GUESSINGROWS' option in import.

Kurt_Bremser · Posted 03-29-2023 10:36 AM

If you do not want SAS date values (which I STRONGLY recommend against), simply remove the FORMAT statement, the whole DO loop, and the DROP.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

SASuserlot · Posted 03-29-2023 10:48 AM

I agree with your opinion. This is just for display purposes, so I was looking for something to display like in the image which later I export to excel to display the dates for that particular ID (10008). 'Guessingrows' during import worked for me when I have the two dates but as I increase the length with more dates, I am having trouble pulling all the dates. Thank you for your suggestions

Kurt_Bremser · Posted 03-29-2023 10:59 AM

Did you run my code as suggested? I can only repeat, one does not use PROC IMPORT for text files. Its guessing will only cause trouble.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

SASuserlot · Posted 03-29-2023 11:22 AM

Got it. I did run your code. I am looking for not to split the one record of Id(10008) into three records ( I just added an extra date to the previous text file to check).

Kurt_Bremser · Posted 03-29-2023 11:30 AM

My code without the DO loop will not split the string, so where is the issue?

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

SASuserlot · Posted 03-29-2023 11:40 AM

My sincere apologies. Yes, your code worked, I was reading the wrong file into the code. It worked with out the 'DO LOOP' Thanks again.

Tom · Posted 03-29-2023 11:30 AM

PROC IMPORT with GUESSINGROWS=MAX will work fine for as many dates as you want in that second column

As long as:

1) They are all part of that second column. So no commas, unless the whole list of dates is quoted. No extra line breaks. etc.

2) The length of the longest list of dates is less than 32767 bytes long. (Actually a little less since the first column will use some of the bytes). A SAS variable cannot contain more than 32.767 bytes. And the default line length for a text file is 32,767 bytes.

Catch up on SAS Innovate 2026