DATA Step, Macro, Functions and more

Summing with infile statement

Reply
Frequent Contributor
Posts: 124

Summing with infile statement

Greetings all.  I am needing to access a mainframe dataset, and I am running into issues with memory allocated to my user_id on the host.  I am thinking this is because I am trying to grab the whole file to download to my local machine, but the file is almost 3000 characters wide, and there are tens of millions of obs.  I would like to just sum in one step.  Below is the code I am trying (and I know it works on other datasets, because I use it on other smaller datasets, and it works fine). 

rsubmit;

    filename gl "PPPCTK.ZZZ.BDRXXXXX.MNTHEND.DSKO745(0)" DISP=SHR;

        data glrecs ;

            infile gl obs=100 ;

             input

                @16        DT_DB                $10.

                @48        AT_BILLED            PD6.2   /* need to sum this and group by dt_db and state */

                @2650    STATE                $2. ;   

        run ; quit ;

        proc download data=glrecs out=work.glrecs;

        proc datasets library=work;

endrsubmit;

Now, if I were doing this with SQL, I would want the remote query to be...

select state, dt_db, sub(at_billed) as at_billed

  from dataset

group by state, dt_db

Then I would only be creating a download file that is a few hundred obs instead of many millions.  I sure would appreciate any help or advice.  Thank you.

Greg

Respected Advisor
Posts: 3,777

Re: Summing with infile statement

Why don you add the SQL or other PROC etc to the RSUBMIT and PROC DOWNLOAD the summary.

Frequent Contributor
Posts: 124

Re: Summing with infile statement

I'm not sure how I would reference the fields with SQL, since they are not defined other than my knowing the starting positions.  I also don't know how I would reference the dataset in the FROM clause, or if it is even possible to run proc sql on a raw file.

Respected Advisor
Posts: 3,777

Re: Summing with infile statement

After the data step GLRECS executes.  You will have a SAS data WORK.GLRECS on the server, with three variable DT_BD AT_BILLED and STATE.  Instead of downloading the data run PROC SQL an the server as part of the RSUBMIT.

       data glrecs ;

            infile gl obs=100 ;

             input

                @16        DT_DB                $10.

                @48        AT_BILLED            PD6.2   /* need to sum this and group by dt_db and state */

                @2650    STATE                $2. ;  

        run ; quit ;

Frequent Contributor
Posts: 124

Re: Summing with infile statement

Ah, yes, and therein lies the problem.  I probably could have explained better at the beginning.  The space available on the mainframe to each end user is limited.  In most cases we have sufficient resources on the host to create the datasets we need to download.  However, in this case I cannot even create the dataset because it's size exceeds the memory available to my user_id.  That is why I'm looking for a way to maybe sum from the buffer and just write the summed values to the dataset before downloading.  I guess I could try using retain.

Super User
Super User
Posts: 6,498

Re: Summing with infile statement

Frequent Contributor
Posts: 124

Re: Summing with infile statement

Wow, Tom, that is going to take me a while to digest it all.  In the mean time, I have managed to have my resources temporarily raised, so I am able to create the dataset.  Now my problem is that, for some reason, proc sql does not run on the mainframe.  I think I can use proc means to do what I want, but there are so many options and examples out there that I can't pin down what I need.  If my dataset has only 2 variables, i'll call them date_column and number_column, is there another way to do what the below sql does?

proc sql ;

     create table sum_table as

     select date_column, sum(number_column) as sum

     from my_table

     group by date_column ;

quit ;

All I need is a dataset that sums the number_column by the date_column.  I don't need any other stats.

Respected Advisor
Posts: 3,777

Re: Summing with infile statement

Another possibility would be a VIEW and PROC SUMMARY.  It may not work if SAS needs to make big swap files, but it might be work a try.

rsubmit;
filename gl "PPPCTK.ZZZ.BDRXXXXX.MNTHEND.DSKO745(0)" DISP=SHR;
data glrecsV / view=glrecsV;
   infile gl obs=100 ;
    input
       @
16        DT_DB                $10.
       @
48        AT_BILLED            PD6.2   /* need to sum this and group by dt_db and state */
       @
2650    STATE                $2. ;  
run; quit;
proc summary data=glrecsV nway;
  
class state dt_db;
   output out=glrecs(drop=_Smiley Happy sum(at_billed);
   run;
proc download data=glrecs out=work.glrecs;
   run;

endrsubmit;
Frequent Contributor
Posts: 124

Re: Summing with infile statement

data_null_, thank you for that.  I did not know about proc summary.  That did just what I needed to do.  One thing, I was getting an "expecting an =." error, and found that I had to provide an alias for the sum variable.  I changed this...

output out=glrecs(drop=_Smiley Happy sum(at_billed);


to


output out=glrecs(drop=_Smiley Happy sum(at_billed) = at_billed;


Thank you


Greg

Respected Advisor
Posts: 3,777

Re: Summing with infile statement

Sorry yes that was untested.  You can also use sum(at_billed)= to use the same name.

Ask a Question
Discussion stats
  • 9 replies
  • 262 views
  • 6 likes
  • 3 in conversation