Help using Base SAS procedures

Import Files in Sequence Based on File Name

Accepted Solution Solved
Reply
Regular Contributor
Posts: 164
Accepted Solution

Import Files in Sequence Based on File Name

Hello I am using SAS 9.3.  I am importing .txt files where one line of data is a single patient abstract.  There are submission files which is the original submission of the abstract data and then correction files which include the entire row of data for the patient abstract with one or more fields with new corrected values.  What I've been doing is uploading the submission files, then uploading all the correction files and updating the submission dataset with corrections.  This all made sense to me and is working fine, except it dawned on me that a single abstract could have more than one correction which may or may not be the same field being corrected in each correction file.  So what I think I should be doing is applying each correction file in sequential order to update the submission file so that if the diagnosis, for example, is changed in correction file 1 but then changed again in correction file 3, that it's file 3 that is the final update.  Each correction file name has the 5 digits being a number where "1" is the first correction file i.e. CA001.  How would I loop through each correction file to update the submission file in sequence?  Thanks.


Accepted Solutions
Solution
‎06-26-2014 11:16 PM
Super Contributor
Posts: 297

Re: Import Files in Sequence Based on File Name

Why do you assume that he is a she?

Very cool method though, I didn't know you could do it all in one mass update.  It is great to learn something new.

Given that is the case, I am a fan of the FILEVAR method.

%LET IMPORT_FILE_SUB =  'E:\TEST\DATA\SUBMISSION\*.TXT';

%LET IMPORT_FILE_CORR = 'E:\TEST\DATA\CORRECTIONS\*.TXT';

DATA IMPORTA; INFILE &IMPORT_FILE_SUB. TRUNCOVER LRECL = 5000 FIRSTOBS=1;

INPUT @1 CHART $6.  @7 ACCT $6.  @13 SEX $1.  @14 AGE $2.  @16 DATAA $2.  @18 DATAB $2.  @20 DATAC $2.;

RUN;

FILENAME FILES PIPE "DIR E:\TEST\DATA\CORRECTIONS\*.TXT /B /O:N";

DATA HAVE;

    INFILE FILES TRUNCOVER END=LAST;

    INPUT FNAME $200.;

  FNAME = "E:\TEST\DATA\CORRECTIONS\"||TRIM(LEFT(FNAME));

RUN;

DATA CORRECTIONS;

SET HAVE;

  INFILE IN FILEVAR = FNAME END = EOF

     LRECL=21

     TRUNCOVER

  ;

DO UNTIL (EOF);

INPUT    @1 CHART $6.  @7 ACCT $6.  @13 SEX $1.  @14 AGE $2.  @16 DATAA $2.  @18 DATAB $2.  @20 DATAC $2.;

OUTPUT;

END;

RUN;

PROC SORT DATA = IMPORTA;

BY CHART ACCT;

RUN;

PROC SORT DATA = CORRECTIONS;

BY CHART ACCT;

RUN;

DATA WANT;

UPDATE IMPORTA CORRECTIONS;

BY CHART ACCT;

RUN;

To answer the problem you were having with your macro code , you were updating the importa file on every iteration and overwriting the test dataset, when you should have been updating the test dataset.

proc sort data=importa;

by chart acct; run;

proc sort data=nxg1;

by chart acct; run;

data test;

update importa nxg1;

by chart acct; run;

View solution in original post


All Replies
Super User
Posts: 19,815

Re: Import Files in Sequence Based on File Name

How are you currently looping through the files?

If you're using a pipe to control the read of files then I'd suggest ordering the files in order via sort before importing.

Regular Contributor
Posts: 164

Re: Import Files in Sequence Based on File Name

Hi Reeza Thanks so much for responding. Right now, I'm just importing all submission files to save as file1.  Then I import all correction files and save as file2.  Then I update file1 with file2.  So no looping or anything but as per my scenario above, this could mean that the most recent update isn't being applied OR one correction is updating one data element and the next correction is updating the other but that may not be occurring so I need to have each correction file update file1 and then keep applying each correction file in sequence so the records are updated correctly. Any assistance you can suggest is greatly appreciated.

Super User
Posts: 19,815

Re: Import Files in Sequence Based on File Name

You say, save as file2? How are you doing this? Is it a dataset SET or APPEND?

My suggestion would be to use the indsname option in your SET statement and sort by the filename, as long as they're sorted they should be fine.

I would test this though Smiley Happy

According to the docs:

  • Each observation in the master data set should have a unique value of the BY variable or BY variables. If there are multiple values for the BY variable, only the first observation with that value is updated. The transaction data set can contain more than one observation with the same BY value. (Multiple transaction observations are all applied to the master observation before it is written to the output file.)
Super Contributor
Posts: 297

Re: Import Files in Sequence Based on File Name

I have the following code that you will need to customize to work in your exact scenario.

filename files pipe "dir \\XXXXX\YYYYYY\CA*.txt /b /o:N";   ***Dir is a windows command, you will need to alter this if you are on an alternate OS, as well as changing to the appropriate file path ;

data _null_;

    infile files truncover end=last;

    input fname $200.;

    i+1;

    call symput("fname"||trim(left(put(i,8.))),trim(fname));

    if last then call symput("total",trim(left(put(i,8.))));

    call symput("fnamedate"||trim(left(put(i,8.))),substr(trim(fname),15,8));

run;

%macro imports;

    %do i=1 %to &total.;

data nxg1;

     infile " \\XXXXX\YYYYYY\&&fname&i" /*change to the path where the files are contained*/

           delimiter = "09"x /* you may need to change the delimiter*/;

           missover

           dsd

           lrecl=32767

           firstobs=1

     ;

/*You will need to customize the following;*/

     length

     ;

     format

    ;

     input

    ;

run;


***Change this from an append to an update;

proc append base = nxg data = nxg1;

run;

%end;

%MEND;

%IMPORTS;

This code isn't tested, but should get you started.

Regards,

Scott

Regular Contributor
Posts: 164

Re: Import Files in Sequence Based on File Name

Posted in reply to Scott_Mitchell

Hi Scott

Sorry but just getting around to this again.  I'm getting an error message of "The system cannot find the path specified".  The code that I've used so far to produce the error is:

filename files pipe 'F:\TEST\Data\Corrections\*.txt'; data _null_;

infile files truncover end=last;

input fname $10.;     i+1;

call symput("fname"||trim(left(put(i,8.))),trim(fname));

if last then call symput("total",trim(left(put(i,8.))));

call symput("fnamedate"||trim(left(put(i,8.))),

substr(trim(fname),15,8)); run;

What am I doing wrong?  Thanks.

EDIT:  I found the issue, I wasn't including the directory part. So the syntax for the first line for windows should read:

files pipe 'dir /b "F:\TEST\Data\Corrections\*.txt"';

Regular Contributor
Posts: 164

Re: Import Files in Sequence Based on File Name

Hi Okay, now I'm having difficulty with the lines to get the data.  You indicate that the first lines should be: %macro imports;

I believe there is an error in that the delimiter line has a ; as

%do i=1 %to &total.;

data nxg1;

infile " \\XXXXX\YYYYYY\&&fname&i"

delimiter = "09"x ;

missover

dsd

lrecl=32767

firstobs=1;  well as after firstobs but I corrected that in mine and changed the infile location. But it still appears that something isn't closed because the font isn't changing for the run statement, and in fact nothing happens when I run it. My code is as follows:

filename files pipe 'dir /b "F:\TEST\Data\Corrections\*.txt"'; data _null_;

infile files truncover end=last;

input fname $10.;   

i+1;   

call symput("fname"||trim(left(put(i,8.))),trim(fname));  

if last then call symput("total",trim(left(put(i,8.))));

call symput("fnamedate"||trim(left(put(i,8.))),substr(trim(fname),15,8)); run;

%macro imports;   

%do i=1 %to &total.; data nxg1;

infile "F:\TEST\Data\Corrections\&&fname&i" truncover LRECL = 5000 firstobs=2;

input    @1 Chart $6.   @7 Acct $6.   @13 Sex $1.   @14 Age $2.   @16 DataA $2.   @18 DataB $2.   @20 DataC $2.; run;

Any assistance is greatly appreciated.

Super Contributor
Posts: 297

Re: Import Files in Sequence Based on File Name

Hi Shellp55,

I don't know why you were experiencing issues with the ';' missing as they appear in my code and it works perfectly. 

Anything that is contained within the %Macro <MACRONAME>; and <MEND>; is not color coded.  If you want to retain the color coding then use %LET A = %NRSTR(%MENDSmiley Wink; immediately after %Macro <MACRONAME>;  This tricks the editor into observing the %MEND; without actually resolving it and doesn't effect the outcome of your code. I only really use it when I have a very complex macro during debugging.

Does this help?

Contributor
Posts: 46

Re: Import Files in Sequence Based on File Name

Posted in reply to Scott_Mitchell

U can import more than two raw data (TXT) files at a time.

Usung File Name

Ex:- u have 10 text files in the same folder but different file Names like

1.File_1,File_2,  File_3, ....................

filename  any_name ("File_Path\File_*.txt");

data  Final_Dataset;

infile any_name    dlimiter = '|' DSD missover lrecl=32767 firstobs=1;

informat var_1 $10.;

informat   var_2 $10.;

informat var_3 $10.;

format var_1 $10.;

format var_2 $10.;

format var_3 $10.;

input

Var_1 $

Var_2 $

Var_3 $

;

run;

Super User
Posts: 10,028

Re: Import Files in Sequence Based on File Name

I suggest to us FILEVAR= option of INFILE , which could give you more flexible way to handle this kind of thing.

Xia Keshan

Regular Contributor
Posts: 164

Re: Import Files in Sequence Based on File Name

Hi Thanks Xia...sorry but I'm not really good at this so not sure what you mean. Scott, I entered all the code and I see that even though the colours don't change, it's still accepting the code.  BUT, I get an error on the line citing the filename:

ERROR: Physical file does not exist, F:\TEST\Data\Corrections\&&fname&i.

For the purpose of testing, each file name is only 5 characters:  Corr1, Corr2, and Corr3.  The entire code that is producing the error is:

data nxg1;

infile 'F:\TEST\Data\Corrections\&&fname&i' truncover LRECL = 5000 firstobs=2;

input   @1 Chart $6. 

           @7 Acct $6. 

           @13 Sex $1. 

           @14 Age $2. 

           @16 DataA $2. 

           @18 DataB $2. 

          @20 DataC $2.; run;

What do you think the issue is?  Thanks.

Regular Contributor
Posts: 164

Re: Import Files in Sequence Based on File Name

Hi Apparently my brain works better when I ask questions...but in future I'll ask them privately before posting so I'm not wasting space. So I figured out the issue...I needed to put double quotation marks around the infile filename.  But it still wasn't working and I suspected it wasn't accepting the file as .txt and since there are no delimiters because it's a flat file I couldn't code that.  So I concatenated ".txt" to the fname in the macro i.e. call symput("fname"||trim(left(put(i,8.))),trim(fname)||".txt"); and now it works wonderfully.  At least in test it does. This works great for when I have distinct correction files but I have another format where there is multiple files submitted per year and within the file is a code for "new", "deleted", "updated".  So an abstract could be "new" in April and then resubmitted in July with a correction so the field is "updated".  So I'll need each file to import in sequence and add, update or delete in sequence. I'll see if I can do it based on this exercise but will start a new thread if I do.  Thanks everyone!

Regular Contributor
Posts: 164

Re: Import Files in Sequence Based on File Name

Hey I spoke too soon.  The code is working but it isn't doing as I require.  In this test scenario I have 4 abstracts.  I also have 4 correction files, each with one abstract each.  For two of the correction files I used the same abstract but changed the data.  What I wanted to have happen was that the corrections would be applied in sequence, updating the original in sequence.  So if correction one changed the age in abstract 1 from 50 to 60 and then another file changed the age in abstract 1 to 74, I wanted 74 to be the final value for abstract 1. What the current code is doing is that it only seems to be keeping the final correction file and updating the master file with it and ignoring the corrections from the other correction files.  Please advise - thanks.

Super Contributor
Posts: 297

Re: Import Files in Sequence Based on File Name

Hi Shell,

It appears that you deleted "/o:N" from the dir windows command.  This keeps the files ordered by Name.  I gather that your file names have an a value that designates their order.

Adding the /o:N to your code show resolve the order in which they are processed.

Regular Contributor
Posts: 164

Re: Import Files in Sequence Based on File Name

Posted in reply to Scott_Mitchell

Hi Scott

Thanks for the speedy response.  I added the notation that you indicated to the code and it still doesn't work as required.  The full code is:

%let import_file_sub = 'F:\TEST\Data\Submission\*.txt';

%let import_file_corr='F:\TEST\Data\Corrections\*.txt';

data importA; infile &import_file_sub truncover LRECL = 5000 firstobs=2;

input @1 Chart $6.  @7 Acct $6.  @13 Sex $1.  @14 Age $2.  @16 DataA $2.  @18 DataB $2.  @20 DataC $2.; run;

filename files pipe 'dir /b "F:\TEST\Data\Corrections\*.txt" /o:N'; data _null_;

infile files truncover end=last;

input fname $5.;

i+1;

call symput("fname"||trim(left(put(i,8.))),trim(fname)||".txt");

if last then call symput("total",trim(left(put(i,8.))));

call symput("fnamedate"||trim(left(put(i,8.))),substr(trim(fname),15,8)); run;

%macro imports;

%do i=1 %to &total.;

data nxg1;

infile "F:\TEST\Data\Corrections\&&fname&i" dlm='1C'x truncover LRECL = 5000 firstobs=2;

input    @1 Chart $6.  @7 Acct $6.  @13 Sex $1.  @14 Age $2.  @16 DataA $2.  @18 DataB $2.  @20 DataC $2.; run;

proc sort data=importa;

by chart acct; run;

proc sort data=nxg1;

by chart acct; run;

data test;

update importa nxg1;

by chart acct; run;

%end;

%MEND;

%IMPORTS;

As per the last notation, I wish each file to append to the main file in sequence so that if an abstract is edited more than once from the original, then the edits will be applied correctly.  In this test program, there are only a few fields but the real data has hundreds of fields and I won't know which ones have been edited to indicate that in the program.  So it just has to overwrite the original EACH time.  Any assistance greatly appreciated.  Thanks.

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 26 replies
  • 641 views
  • 7 likes
  • 5 in conversation