Split Input file based on Identifier and dynamically allocate them

Reply
Contributor
Posts: 23

Split Input file based on Identifier and dynamically allocate them

Hi

I am working on a requirement (in Z/OS platform) where I have an input file in the following format.

IDValue
001aaa
001

bbb

002abc
003hfg
002agf
001atr

By reading the above input file, I have to dynamically allocate no. of file = distinct ID's (in this case 3 - 001, 002, 003) with the following functionality.

1. Allocate file names with ID value as a part of their name (e.g. HQLF.S001.FILE) where 001 is a distinct ID value.

2. Write all the records from above input file with ID=001 in the file allocated above.

Points to consider

1. The no. of distinct ID value can change (it can be 5 or 10 output files)

2. No. of records for each distinct ID is not fixed.

Is there a way to achieve this? If so, can you please guide me - at least a high level idea first so that I can try out something? Thanks in advance for the help.

What I tried

I tried reading the input file and got the count and list of Distinct ID values.

Then by reading the the above list - tried to dynamically allocate file with filename containing the ID with the use of FILENAME DYNAMDD option but was not able to.

Super Contributor
Posts: 644

Re: Split Input file based on Identifier and dynamically allocate them

Posted in reply to prasanna_sk

Assuming there is a single input file and you have successfully read it into a SAS dataset and analysed it to determine the id values, you should be able to dynamically allocate filenames using the filename function.

http://support.sas.com/documentation/cdl/en/hosto390/64786/HTML/default/viewer.htm#p0agwv4mwi9x6xn17...

You could do this in a SAS data step using the list of ids.  You would need to generate a unique filename for each id, e.g.

=cats('FILE', ID) ;

also a fully referenced physical path and name for the file, e.g.

=cats(<filepath and file name prefix>,id,<rest of filename>) ;

and a list of all host options you require.

Then for each id

rc = FILENAME(fileref,filename<,device <,host-options> >) ;

You can test the return code (rc = 0) for success, and if not use other functions to print warning or error messages.

Having allocated filenames successfully, you can write to the file of your choice using the id in a data _NULL_ step:

Select (ID) ;

    When ('001') then file FILE001 ;

  etc etc ;   /* include all possible ids, even if not found in a given input file */

  Otherwise _ERROR_ "File not found to match ID" ;

End;

Richard

Contributor
Posts: 23

Re: Split Input file based on Identifier and dynamically allocate them

Posted in reply to RichardinOz

Thanks. I was trying something similar to this.

For each ID (instead a Data step - with list of unique ID's) I tried to assign FILENAME DYNAMDD "filename" <other arguments> - It kept throwing some error. I am trying and will let you know if I succeed. Thanks for the reply

Super Contributor
Posts: 644

Re: Split Input file based on Identifier and dynamically allocate them

Posted in reply to prasanna_sk

"Some error"?  What error?  Please show source and error messages from the log.

DYNAMDD does not show up in a search of SAS documentation.  Are you trying to use a JCL or TSO option with a SAS filename statement? 

Have you tried the data step approach? 

Richard

Contributor
Posts: 23

Re: Split Input file based on Identifier and dynamically allocate them

Posted in reply to RichardinOz

Sorry - I wanted to try all I can do first before asking for help.

Here is what I tried.

I have used FILENAME DYNAMDD before without any issues. I use it to allocate files dynamically through my JCL. But, I have only used in outside DATA step and referred to DYNAMDD for writing as output. I have not used CALL EXECUTE also so far.

%MACRO WRITE;                     

                                  

FILE DYNAMDD;                    

SET TEST;                     

IF INID=&ID1 THEN DO;            

    PUT @1 INID $CHAR5.;          

    END;

                          

%MEND;    

DATA TEST;                             

  INPUT I1ID @1 INID $CHAR5.;          

DATALINES;                             

00415                                  

00419                                  

00071                                  

00017                                  

00415                                  

00419                                  

00322                                  

00071                                  

;          

PROC SQL;                                                             

  CREATE TABLE IDLIST AS                                               

    SELECT DISTINCT INID AS ID FROM TEST;                           

                                                                      

OPTIONS SYMBOLGEN MACROGEN;                                           

DATA _NULL_;                                                          

  LENGTH FNAME $44.;                                                  

    SET IDLIST END=EOF;BY ID;                                       

    ID1=ID;                                                         

    IDQ='ND' || SUPP;                                                 

    CALL SYMPUT('ID1',ID);                                          

    CALL SYMPUT('IDQ',IDQ);                                           

                                                                      

    CALL EXECUTE('FILENAME DYNAMDD "TST.MDL.&IDQ..SPLIT"          

                  DISP=(NEW,CATLG) UNIT=SYSDA SPACE=(27998,(1,1),RLSE)

                  LRECL=5 RECFM=FB BLKSIZE=0 RETPD=180;');            

                                                                      

    CALL EXECUTE('%WRITE');                                           

                                                                      

RUN;                            



Saslog

So - I am able to allocate the files as I need (one for different ID's) - but not able to write the matching records into it yet.

SYMBOLGEN: Macro variable IDQ resolved to ND00017                              

SYMBOLGEN: Macro variable ID1 resolved to 00017                                

SYMBOLGEN: Macro variable IDQ resolved to ND00071                              

SYMBOLGEN: Macro variable ID1 resolved to 00071                                

SYMBOLGEN: Macro variable IDQ resolved to ND00322                              

SYMBOLGEN: Macro variable ID1 resolved to 00322                                

SYMBOLGEN: Macro variable IDQ resolved to ND00415                              

SYMBOLGEN: Macro variable ID1 resolved to 00415                                

SYMBOLGEN: Macro variable IDQ resolved to ND00419                              

SYMBOLGEN: Macro variable ID1 resolved to 00419                                

NOTE: 5 observations were read from "WORK.IDLIST"                              

NOTE: The data step took :                                                     

      real time : 0.023                                                        

      cpu time  : 0.007                                                        

      EXCP count: 0                                                            

                                                                               

                                                                               

NOTE: CALL EXECUTE generated line                                              

69      +  FILENAME DYNAMDD "TST.MDL.ND00017.SPLIT"                        

69      +! 8,(1,1),RLSE)                            LRECL=5 RECFM=FB BLKSIZE=0 R

NOTE: : IGD100I 852C ALLOCATED TO DDNAME DYNAMDD  DATACLAS (        )          

70      +  FILE DYNAMDD;                                                       

70      +!                                                                     

70      +!  IF                                                                 

ERROR: Expected a statement keyword : found "FILE"                             

ERROR: Expected a statement keyword : found "SET"                              

ERROR: Expected a statement keyword : found "IF"                               

71      +   THEN DO;                                                           

71      +!                                 END;                                

ERROR: Expected a statement keyword : found "PUT"                              

ERROR: Expected a statement keyword : found "END"                              

72      +  FILENAME DYNAMDD "TST.MDL.ND00071.SPLIT"                        

72      +! 8,(1,1),RLSE)                            LRECL=5 RECFM=FB BLKSIZE=0 R

NOTE: : IGD100I 8129 ALLOCATED TO DDNAME DYNAMDD  DATACLAS (        )          

73      +  FILE DYNAMDD;                                                       

73      +!                                                                     

73      +!  IF                                                                 

ERROR: Expected a statement keyword : found "FILE"      


The same type of errors continue for all the distinct files allocated.                    

Super Contributor
Posts: 644

Re: Split Input file based on Identifier and dynamically allocate them

Posted in reply to prasanna_sk

Aha.  I think you have a problem with trying to do too much in the single datastep. 

1.  Call symput () creates the macro variables on the fly.  What you are doing is overwriting the values each time.  At the end of the data step there is only one value for each of ID1 and IDQ.

2.  Call execute puts the enclosed statement into the queue to execute after the data step completes. Thus no filenames are created during the execution of the data _null_.  Using the filename () function as I suggested avoids this limitation.

3.  Similarly your call execute of %Write puts the statements into the queue but the datastep has ended and these commands are orphans, to the bafflement of SAS.

4.  Potentially you have a problem  with

     IF INID=&ID1 THEN DO;

(assuming you used resolve() instead of call execute to get %Write to work within the datastep)

At the least you should have

     IF INID="&ID1" THEN DO;   

but in any case the macro variable created by Call Symput is not available at the time the datastep is compiled so it will still throw an error.

Aren't you tempted to try my suggested solution?

Richard

Contributor
Posts: 23

Re: Split Input file based on Identifier and dynamically allocate them

Posted in reply to RichardinOz

I see the problem. That's what happens when you try to learn something without understanding the basics, I guess.

The limitation I see in the suggested solution is - I cannot hard code the ID values in the program. Is there a way to work around that with your solution?

Let's say - a new ID comes in tomorrow - the program needs to change to accommodate that scenario - correct?

Super Contributor
Posts: 644

Re: Split Input file based on Identifier and dynamically allocate them

Posted in reply to prasanna_sk

You have done most of the work already.

You have a list of ids.  The trick here is to generate the when statements for whatever ids turn up.

So we need a macro for that.  But first

Proc SQL noprint ;

     Select distinct ID into :idseq separated by ' '

          From work.idlist ;

     Select count (distinct ID) into :idnum

          From work.idlist ;

Quit ;

%Macro WHEN_STATEMENTS ;

     %Do i = 1 to &idnum ;

          %Let ID = %Scan (&idseq, &i) ;

          When ("&ID") File FILE&ID ;

     %end ;

%Mend ;

Create the filenames (can be done in a macro but a data _null- is easier to manage) ;

Data _Null_ ;

    Set idlist ; 

     fileref = cats("TST.MDL.ND", ID, ".SPLIT") ;

     filename = cats('FILE', ID) ;

     device = ' ' ;

     options = "DISP=(NEW,CATLG) UNIT=SYSDA SPACE=(27998,(1,1),RLSE)

                  LRECL=5 RECFM=FB BLKSIZE=0 RETPD=180";

     rc = filename (fileref, filename, device, options) ;

Run ;

Now you should be able to write to the split files:

DATA _NULL_;                                                         

    SET IDLIST END=EOF;

     BY ID;

     SELECT (ID) ;

          %WHEN_STATEMENTS ;

          OTHERWISE ;

     END ;

     PUT @1 ID $CHAR5.;

RUN ;

Of course none of this is tested so if you have any problems show me the log...

Richard

Contributor
Posts: 23

Re: Split Input file based on Identifier and dynamically allocate them

Posted in reply to RichardinOz

Richard - Here is the error I get after fixing a couple of minor typos.

37                                                                                                               

38        DATA _NULL_;                                                                                           

39            SET IDLIST END=EOF;                                                                                

40             BY ID;                                                                                            

41             SELECT (ID) ;                                                                                     

42                  %WHEN_STATEMENTS ;                                                                           

NOTE: Line generated by the invoked macro "WHEN_STATEMENTS"                                                      

42      +  WHEN ("&ID") FILE FILE&ID ;                                                                           

                                 ^                                                                               

ERROR: Invalid logical file name                                                                                 

NOTE: Line generated by the invoked macro "WHEN_STATEMENTS"                                                      

42      +                                                     WHEN ("&ID") FILE FILE&ID ;                        

                                                                                    ^                            

ERROR: Invalid logical file name                                                                                 

NOTE: Line generated by the invoked macro "WHEN_STATEMENTS"                                                      

42      +                                                     WHEN ("&ID") FILE FILE&ID ;                        

                                                                                    ^                            

ERROR: Invalid logical file name                                                                                 

NOTE: Line generated by the invoked macro "WHEN_STATEMENTS"                                                      

42      +                                                     WHEN ("&ID") FILE FILE&ID ;                        

                                                                                    ^                            

ERROR: Invalid logical file name                                                                                 

NOTE: Line generated by the invoked macro "WHEN_STATEMENTS"                                                      

42      +                                                     WHEN ("&ID") FILE FILE&ID ;                        

                                                                                    ^                            

ERROR: Invalid logical file name                                              

ERROR: Found ";" when expecting end                                           

43                  OTHERWISE ;                                               

                    ^                                                         

ERROR: The statement "OTHERWISE" is unknown in this context                   

                                       

Super Contributor
Posts: 644

Re: Split Input file based on Identifier and dynamically allocate them

Posted in reply to prasanna_sk

Hmm.  I don't have access to SAS at present so I can't debug the macro.  Use the rest of my code to assign the filenames (but if ID is 5 char the filename might be too long - use a shorter prefix, maybe just F) and go with the filevar solution.  I would be interested in a log dump for just one ID, but with the options statement at the top of your code.

     options SYMBOLGEN MLOGIC MPRINT ;

Richard

Contributor
Posts: 23

Re: Split Input file based on Identifier and dynamically allocate them

Posted in reply to RichardinOz

Hi Tom/Richard

Thanks for your suggestions.

I used parts of codes from suggestion given by both of you and it seems to be working. I allocated files first and then using solution from Tom, sued FILEVAR option and split the records successfully.

Here is my problem - the solution takes in each and every record and writes the record into the file referenced for that record. This creates a lot of sas log (a pair of the below listed log for each record) and takes a lot of time.

NOTE: The file TST.MDL.ND00071.SPLIT is:                      

      Dsname=TST.MDL.ND00071.SPLIT,                           

      Unit=3390,                                                  

      Volume=STRW63, Disp=SHR, Blksize=27995, Lrecl=5, Recfm=FB,  

      Creation=2013/08/05                                         

                                                                 

NOTE: 4 records were written to file DUMMY                       

I have like 10-15 million records which I need ti split and I am not sure if this is an effective solution for a large number of records. Any suggestion/idea to cater large volume?

Super User
Super User
Posts: 7,038

Re: Split Input file based on Identifier and dynamically allocate them

Posted in reply to prasanna_sk

Turn off the NOTES option.  If you want you can count the observations read yourself.

%let save=%sysfunc(getoption(notes));

options nonotes;

data _null_;

  if eof then call symputx('nobs',_n_-1);

  set .... end=eof ;

...

run;

options &save ;

%put NOTE: &nobs observations processed. ;

Trusted Advisor
Posts: 3,211

Re: Split Input file based on Identifier and dynamically allocate them

When sizing and performance will become an issue.

The PUT and INPUT statements with SAS do not support the handling of multiple flat files like C or Cobol. 

These input/output processing do support the change of the physical file, filename (filevar) while processing the data.

For buffering reasons it is not very smart to change the physical names continously going back to used ones.

When you can deliver the input in a sorted way and then switching the physical name at the needed moment it will perform/behave well.    

As shown z/OS does support dynamic files and the dynamic allocations.

It is the planning department (jobs scheduling, dasd) that does not like that, as you can disturb their way of doing work. 

---->-- ja karman --<-----
Contributor
Posts: 23

Re: Split Input file based on Identifier and dynamically allocate them

My input is like 11mil records.

Richard - In Spite of reading the input data once - would it not be an overhead for the system to switch file names and then while writing each of the records it has to interact with Macro facility internally - 11 mil times??

In the above solution - I will be reading the input data, say 10 times (for 10 unique ID's) and changing the output files the same 10 times instead of 11mil times. So - essentially I would be referring to the already dynamically allocated file only once. Say - ID is 00322, I would be gathering all the records (say 999 records) that with ID as 00322 and then write them once to the output file instead of referring them 999 times.

Wouldn't this be more effective? I am still learning SAS on the fly - so am just trying to put forth my point with my very limited knowledge. Please do correct me when I am wrong.

Thank you.

Trusted Advisor
Posts: 3,211

Re: Split Input file based on Identifier and dynamically allocate them

Posted in reply to prasanna_sk

prassanna,

Check your input and output requirements.

I am seeing you input is 3390-type.  These are not HFS (Bpxas service Unix like)  but old IBM style approach

Probably logical volumes are limited by 9Gb sizing. A common sizing as limited by the IO-channel concept inherited of the 360 design (1960's).

I am seeing your test output us using SYSDA a temporary pool based on 3390's but in you next output it is single volume tape (6250 type) based). The dataset is appproximity 10Gb (10M *1k) of size. Not a big problem to process unless data has commonly been spread over many volumes leaving no space for you.  You could use multivolume datasets to overcome that volume size limit.

Your saswork should be able to hold twice this size or more. Same attention points with volumes unless hfs defined with sufficient space.

This sizing won't probalby fit on a single tape. If you requirement is making a copy af the datastep to tape you could use a multivolume tapedataset. Possible some Dataclass types as set up by storage-management could help you with that. 

Tape could be well simulated by dasd storage (VTS). It is still simulating the old tape-drive.

---->-- ja karman --<-----
Ask a Question
Discussion stats
  • 20 replies
  • 1126 views
  • 6 likes
  • 4 in conversation