BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
SASuserlot
Barite | Level 11

I am looking to import a text file containing ID and date1, a "," delimiter file.  I want to import the date1 as a string that I can modify it later. This is because an id (10008) has two values for the date1 variable with '&'. I want to have both dates in the variable. 

I also have the freedom to modify the txt file. Is there any other way I can modify the txt file so that the import is easy ( if there are two dates available, then date1 should show both dates)? Thanks

 

My Textfile looks like this

ID,date1
10001,2023-03-08
10001,2023-03-13
10001,2023-03-06
10001,2023-03-06
10001,2023-03-06
10001,2023-03-06
10008,2023-03-10 & 2023-03-14
10001,2023-03-13
10001,2023-03-15
10001,2023-02-06
10001,2023-03-08
10001,2023-03-08
100012,2023-03-06
100012,2023-02-28
100012,2023-02-23

This is how I want in data

SASuserlot_0-1680052561378.png

 Regular Proc import won't work 

PROC IMPORT OUT= WORK._CUTOFF
DATAFILE= "Desktop\chk.txt"
DBMS=dlm REPLACE;
DELIMITER =',';
GETNAMES=YES;
DATAROW=2;
RUN;

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

This is a nice showcase to illustrate that one does not use PROC IMPORT for text files.

data cutoff;
infile "Desktop\chk.txt" dlm=',' dsd truncover firstobs=2;
input id $ _date1 :$100.;
format date1 yymmdd10.;
do i = 1 to countw(_date1,'&');
  date1 = input(strip(scan(_date1,i,'&')),yymmdd10.);
  output;
end;
drop _date1 i;
run;

Now you have nice SAS date values and can deal with the duplicate entry in an orderly manner.

View solution in original post

11 REPLIES 11
ballardw
Super User

I will place a wager of a short stack of quarters that your example data with the two dates in the file is actually more than 20 rows from the top.

I copied your example data, pasted it into a text file on my computer and ran this code:

 

PROC IMPORT OUT= WORK._CUTOFF
DATAFILE= "&outpath.\example.txt"
DBMS=dlm REPLACE;
DELIMITER =',';
GETNAMES=YES;
DATAROW=2;
RUN;

proc print data=work._cutoff;
run;

The generated data step code in my log shows:

419  PROC IMPORT OUT= WORK._CUTOFF
420  DATAFILE= "&outpath.\example.txt"
421  DBMS=dlm REPLACE;
422  DELIMITER =',';
423  GETNAMES=YES;
424  DATAROW=2;
425  RUN;

426   /**********************************************************************
427   *   PRODUCT:   SAS
428   *   VERSION:   9.4
429   *   CREATOR:   External File Interface
430   *   DATE:      25MAR23
431   *   DESC:      Generated SAS Datastep Code
432   *   TEMPLATE SOURCE:  (None Specified.)
433   ***********************************************************************/
434      data WORK._CUTOFF    ;
435      %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
436      infile 'D:\Users\Owner\Documents\\example.txt' delimiter = ',' MISSOVER DSD lrecl=32767
436! firstobs=2 ;
437         informat ID best32. ;
438         informat date1 $23. ;
439         format ID best12. ;
440         format date1 $23. ;
441      input
442                  ID
443                  date1 $
444      ;
445      if _ERROR_ then call symputx('_EFIERR_',1);  /* set ERROR detection macro variable */
446      run;

NOTE: The infile 'D:\Users\Owner\Documents\\example.txt' is:
      Filename=D:\Users\Owner\Documents\\example.txt,
      RECFM=V,LRECL=32767,File Size (bytes)=294,
      Last Modified=28Mar2023:19:29:31,
      Create Time=28Mar2023:19:29:04

NOTE: 15 records were read from the infile 'D:\Users\Owner\Documents\\example.txt'.
      The minimum record length was 16.
      The maximum record length was 29.
NOTE: The data set WORK._CUTOFF has 15 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


15 rows created in WORK._CUTOFF from D:\Users\Owner\Documents\\example.txt.


Which shows that the data was read as 23 character value.

HOWEVER, if I copy lines with single date and paste them until that row with two dates is below line 25 my log looks like

467      data WORK._CUTOFF2    ;
468      %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
469      infile 'D:\Users\Owner\Documents\\example2.txt' delimiter = ',' MISSOVER DSD lrecl=32767
469! firstobs=2 ;
470         informat ID best32. ;
471         informat date1 yymmdd10. ;
472         format ID best12. ;
473         format date1 yymmdd10. ;
474      input
475                  ID
476                  date1
477      ;
478      if _ERROR_ then call symputx('_EFIERR_',1);  /* set ERROR detection macro variable */
479      run;

NOTE: The infile 'D:\Users\Owner\Documents\\example2.txt' is:
      Filename=D:\Users\Owner\Documents\\example2.txt,
      RECFM=V,LRECL=32767,File Size (bytes)=735,
      Last Modified=28Mar2023:19:35:26,

duplicating your claim.

From the online documentation for Proc Import in the  Overview section:

When you run the IMPORT procedure, it reads the input file and writes the data to the specified SAS data set. By default, IMPORT procedure expects the variable names to appear in the first row. The procedure scans the first 20 rows to count the variables, and it attempts to determine the correct informat and format for each variable.

(emphasis added).

 

You can use the Proc Import statement Guessingrows=MAX; to force the procedure to look at more data before guessing.

Or for simple files like this write your own data step code OR even copy the code generated by proc import, paste into the editor, clean it up and set the properties of the variables you want such a changing the Informat to a longer value and character such as was generated in my first run.

 

 

 

 

 

SASuserlot
Barite | Level 11

Thank you. 'GUESSINGROWS' option worked for my purpose for now. But I have another question when I try to and another date for 10008 (ID) separated by '&' . Then it only ready one date in the string. Why? is there any way it reads complete string?

 

Two dates as a string  in date1 variable

SASuserlot_0-1680100615795.png

Proc import

PROC IMPORT OUT= WORK.SDTM_CUTOFF 
            DATAFILE= "xx\chk.txt" 
            DBMS=dlm REPLACE;
			DELIMITER =',';
     GETNAMES=YES;
     DATAROW=2; 
	 GUESSINGROWS = MAX;
RUN;

Result:

SASuserlot_1-1680100680939.png

text file with 3 dates in date1 string

SASuserlot_2-1680100799905.png

Result:

SASuserlot_3-1680100835871.png

LOG:


NOTE: The infile 'xx\chk.txt' is:
      Filename=xx\chk.txt,
      RECFM=V,LRECL=32767,File Size (bytes)=309,
      Last Modified=29Mar2023:10:38:52,
      Create Time=28Mar2023:21:04:17

NOTE: 15 records were read from the infile 'xx\chk.txt'.
      The minimum record length was 16.
      The maximum record length was 43.
NOTE: The data set WORK.SDTM_CUTOFF has 15 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds

 

 

Kurt_Bremser
Super User

This is a nice showcase to illustrate that one does not use PROC IMPORT for text files.

data cutoff;
infile "Desktop\chk.txt" dlm=',' dsd truncover firstobs=2;
input id $ _date1 :$100.;
format date1 yymmdd10.;
do i = 1 to countw(_date1,'&');
  date1 = input(strip(scan(_date1,i,'&')),yymmdd10.);
  output;
end;
drop _date1 i;
run;

Now you have nice SAS date values and can deal with the duplicate entry in an orderly manner.

SASuserlot
Barite | Level 11

 Thank you @Kurt_Bremser. This will help me in the future when I get the scenario. I am never aware of this method. Thanks again. However,  My interest, in this case, was I want is read both dates as strings (like in image), which I have to merge with another dataset to make a visual comparison. Is it possible can we to achieve this in another way?  @ballardw  is kind enough to help me to use the 'GUESSINGROWS' option in import. 

SASuserlot_0-1680099578893.png

 

SASuserlot
Barite | Level 11

I agree with your opinion. This is just for display purposes, so I was looking for something to display like in the image which later I export to excel to display the dates for that particular ID (10008). 'Guessingrows' during import worked for me when I have the two dates but as I increase the length with more dates, I am having trouble pulling all the dates. Thank you for your suggestions

SASuserlot_0-1680101192722.png

 

SASuserlot
Barite | Level 11

Got it. I did run your code. I am looking for not to split the one record of Id(10008) into three records ( I just added an extra date to the previous text file to check).

SASuserlot_0-1680103339577.png

 

SASuserlot
Barite | Level 11

My sincere apologies. Yes, your code worked, I was reading the wrong file into the code. It worked with out the 'DO LOOP' Thanks again.

Tom
Super User Tom
Super User

PROC IMPORT with GUESSINGROWS=MAX will work fine for as many dates as you want in that second column

 

As long as:

1) They are all part of that second column.  So no commas, unless the whole list of dates is quoted.  No extra line breaks. etc.

2) The length of the longest list of dates is less than 32767 bytes long. (Actually a little less since the first column will use some of the bytes).  A SAS variable cannot contain more than 32.767 bytes. And the default line length for a text file is 32,767 bytes.

 

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 11 replies
  • 1436 views
  • 6 likes
  • 4 in conversation