Solved: Re: Parsing a PDS and merging with a SAS dataset

serge68 · Posted 03-09-2020 09:50 AM

I'm in a mainframe environment and looking to read members of a PDS looking for certain values. The result from that are then merged with another sas dataset to create a csv file. For the most part I have something that works.

My issue seems to be around when my PDS member contains more than one of that values I'm looking for, AND this PDS member has more than one entry in my sas dataset to be merged with.

I'm not sure how to give a sample of my PDS within the SAS code that you could use as an example. I'll try to illustrate here what those PDS members look like –

Mem01

Command, value1

Mem02

Command,value1

Command,value2

Mem03

Command,value1

This is the SAS dataset which the above gets merged with –

job01 mem01

job02 mem02

job03 mem02

job04 mem03

So in this case I expect my output to look like –

Mem01,job01,command,value1

Mem02,job02,command,value1

Mem02,job02,command,value2

Mem02,job03,command,value1

Mem02,job03,command,value2

Mem03,job04,command,value1

However what I'm seeing is this –

Mem01,job01,command,value1

Mem02,job02,command,value1

Mem02,job02,command,value2

Mem02,job03,command,value2

Mem02,job03,command,value2

Mem03,job04,command,value1

For members with multiple commands, that are associated to multiple jobs, it seems the value retains the setting of the last occurrence from the first job. In this case, the first occurrence of mem02,job03,command,value2 should reflect value1, not value2.

This is what my code looks like:

data have;                                 
   infile cards missover;                  
   input    job      $ 02-09               
            member   $ 11-18               
   ;                                       
   cards;                                  
job01    mem01                             
job02    mem02                             
job02    mem02                             
job03    mem02                             
job03    mem02                             
job04    mem03                             
;                                          
run;                                       

proc sort data=have   
 OUT=have_NoDups      
 nodupkey  ;          
 by member ;          
run ;                 

data file2;                                                   
  set have_NoDups;                                            
  infile 'aaaa.bbbbbb.cccc.pds' memvar=member end=done;  
  do until (done);                                            
    input msg     $ 01-72 ;                                   
    if msg =: 'COMMAND' or msg =: 'FRED'  then do;
           parm1   = scan(msg,1,",") ;                        
           parm2   = scan(msg,2,",") ;                        
           memnam  = member ;                                 
           output;                                            
    drop msg;             
    end;       
  end;         
run;                                               

/*********************************************************************/
/* Merge the jobs file with the commands file                        */
/*********************************************************************/
DATA file3;                                                             
 Merge have file2 (rename=(memnam=member));;                            
 by member;                                                             
                                                                       
proc print;                                                            
 run;

Any thoughts as to what I'm missing here?

Hope that makes sense.

Tom · Posted 03-09-2020 02:44 PM

I not sure I am following this discussion, but it sounds to me like you have some metadata that lists which members (files) you want to read. Do you just have one PDS (folder) or are their multiple? Either way if the goal is to check only the files that need to be check you should first get the list of files. So if you start with something like this that maps members to jobs.

data job_members ;
  input job :$10.  member :$32. ;
cards;
job01    mem01
job02    mem02
job02    mem02
job03    mem02
job03    mem02
job04    mem03
;

You first want to reduce it to just a list of the members.

proc sort data=job_members(keep=member) out=members nodupkey;
  by member;
run;

Then you can use that to drive the process of reading the text files.

Do you know the name of the PDS? Let's assume it is named PROD.PAYROLL.JCL like in this example:

https://documentation.sas.com/?docsetId=hosto390&docsetTarget=n00nf3im9mpn0pn1u1g3wyj51vk6.htm&docse...

So then your step to read the text files is something like:

data member_commands ;
  set members ;
  length dsn $80 ;
  dsn=cats('PROD.PAYROLL.JCL(',member,')');
  infile text filevar=dsn end=eof;
  do while (not eof);
    ... here is you code to read the lines and figure out what you want ..
    .. make sure to include an OUTPUT statement when you find something interesiting ...
  end;
run;

Now you can re-combine the two datasets based on the value of MEMBER.

proc sql;
create table job_member_commands as
select * 
from job_members A
inner join
  member_commands B
on a.member = b.member 
order by a.job,a.member
;
quit;

View solution in original post

Patrick · Posted 03-09-2020 10:43 AM

From a SAS processing perspective your PDS members are just like text files? And you input the text in PDS members and then get a SAS data set per member with two columns in the form {key,value}. Is that correct?

If you've solved reading the PDS members already then can you please post sample data how the resulting SAS table(s) look like? The ones which you then want to merge with your table have if I've got that right.

serge68 · Posted 03-09-2020 11:12 AM

This is the output from a proc print for thw HAve &File2 datasets immediatley before the merge:

This is what the HAVE dataset looks like:

Obs job member

1 job01 mem01
2 job02 mem02
3 job02 mem02
4 job03 mem02
5 job03 mem02
6 job04 mem03

This is what the FILE2 dataset looks like:

Obs job command parm2

1 job01 COMMAND PARM1
2 job02 COMMAND PARM1
3 job02 COMMAND PARM2
4 job04 COMMAND PARM1

And this is what the output looks like after merging HAVE with FILE2.

Obs job member command parm2

1 job01 mem01 COMMAND PARM1
2 job02 mem02 COMMAND PARM1
3 job02 mem02 COMMAND PARM2
4 job03 mem02 COMMAND PARM2
5 job03 mem02 COMMAND PARM2
6 job04 mem03 COMMAND PARM1

Patrick · Posted 03-09-2020 11:22 AM

Obs job member
1 job01 mem01
2 job02 mem02
3 job02 mem02
4 job03 mem02
5 job03 mem02

Your first dataset looks like a directory listing (members in PDS). If so then why is #3 a duplicate of #2.?

Not sure how you create this table but may-be a simple DISTINCT (or proc sort with nodupkey) will resolve the problem.

serge68 · Posted 03-09-2020 11:37 AM

The first dataset(have) is built based on a cross reference file. Its indicating that those members are being called by those jobs. In this case its saying mem02 is being called twice by job02, and twice by jobs03. That is valid.

That’s why I had to include the proc sort nodupkey, to get rid of those duplicates. As without that the data file2 step stopped processing when it reached them. So I read the have_NoDups file into data file2 to allow it to process, but then I need to merge back with the original have file to ensure I don't drop records.

Patrick · Posted 03-09-2020 11:48 AM

There isn't any parameter value for job03 in your sample data so not sure why it appears in your result set.

Have a look at below. Is that returning what you're after?

data job_member;
  input (job member) ($);
  datalines;
job01 mem01
job02 mem02
job02 mem02
job03 mem02
job03 mem02
job04 mem03
;
data file2;
  input (job command parm) ($);
  datalines;
job01 COMMAND PARM1
job02 COMMAND PARM1
job02 COMMAND PARM2
job04 COMMAND PARM1
;

proc sql;
/*  create table want as*/
    select distinct
      f2.job, 
      jm.member,
      f2.command,
      f2.parm
    from 
      job_member as jm
      inner join
      file2 as f2
    on jm.job=f2.job
    ;
quit;

serge68 · Posted 03-09-2020 01:45 PM

Job03 isn't appearing in my proc print of File2 as its getting dropped by the proc sort nodupkey. The key is on member and as mem02 is used in both Job02 and job03, its dropping job03 from this.

If I add Job03 to your file2 definition, it does work –

data file2;

input (job command parm) ($);

datalines;

job01 COMMAND PARM1

job02 COMMAND PARM1

job02 COMMAND PARM2

job03 COMMAND PARM1

job03 COMMAND PARM2

job04 COMMAND PARM1

;

Produces this –

job member command parm

-------------------------------------

job01 mem01 COMMAND PARM1

job02 mem02 COMMAND PARM1

job02 mem02 COMMAND PARM2

job03 mem02 COMMAND PARM1

job03 mem02 COMMAND PARM2

job04 mem03 COMMAND PARM1

My issue then is how do I produce my file2? I need to read the contents of each member to obtain the command and parm. Currently I'm passing the member name from the have dataset, though that requires me to use the proc sort nodupkey, which causes the issue with job03 being dropped.

By passing the member name from the have dataset into here, I'm trying to avoid reading the whole PDS, as it contains thousands of members I don't require. Though I may have to consider doing that now.

Tom · Posted 03-09-2020 02:44 PM

I not sure I am following this discussion, but it sounds to me like you have some metadata that lists which members (files) you want to read. Do you just have one PDS (folder) or are their multiple? Either way if the goal is to check only the files that need to be check you should first get the list of files. So if you start with something like this that maps members to jobs.

data job_members ;
  input job :$10.  member :$32. ;
cards;
job01    mem01
job02    mem02
job02    mem02
job03    mem02
job03    mem02
job04    mem03
;

You first want to reduce it to just a list of the members.

proc sort data=job_members(keep=member) out=members nodupkey;
  by member;
run;

Then you can use that to drive the process of reading the text files.

Do you know the name of the PDS? Let's assume it is named PROD.PAYROLL.JCL like in this example:

https://documentation.sas.com/?docsetId=hosto390&docsetTarget=n00nf3im9mpn0pn1u1g3wyj51vk6.htm&docse...

So then your step to read the text files is something like:

data member_commands ;
  set members ;
  length dsn $80 ;
  dsn=cats('PROD.PAYROLL.JCL(',member,')');
  infile text filevar=dsn end=eof;
  do while (not eof);
    ... here is you code to read the lines and figure out what you want ..
    .. make sure to include an OUTPUT statement when you find something interesiting ...
  end;
run;

Now you can re-combine the two datasets based on the value of MEMBER.

proc sql;
create table job_member_commands as
select * 
from job_members A
inner join
  member_commands B
on a.member = b.member 
order by a.job,a.member
;
quit;

serge68 · Posted 03-09-2020 03:49 PM

Hi Tom – I think what you have outlined is what I'm currently doing. There is just one PDS file to be read. The members to be read from that file are identified in the first step creating the have dataset. However by executing the proc sort nodupkey, its dropping some data. For example Job03 from your data job_members would get dropped here.

As a result Job03 would be missing when the proc sql is executed.

At this point I'm trying to understand if there is a method that would allow me to combine the data here without having to read every member from my PDS(as there are thousands of members there I'm not interested in).

Tom · Posted 03-09-2020 04:39 PM

Please read my previous answer more closely.

If you don't want PROC SORT NODUPKEY to eliminate data then make sure to specify a different output dataset than the input dataset.

serge68 · Posted 03-10-2020 10:18 AM

Thanks Tom. I did have separate input & output datasets within my proc sort nodupkey. I've added the (keep=member) as you had, and dropped the distinct from my proc sql to match yours.

Its looking much better now.

Appreciate all the assistance.

Classroom Training Available!