BookmarkSubscribeRSS Feed
lchristensen
Obsidian | Level 7

After 35+ years of using data step programming I am trying to learn Proc SQL. I have something I've written and works but believe it can be streamlined.

 

The code below reads in a CSV file containing a list of tape volumes in a virtual tape library and the status of the volumes (ie: 'Cataloged' or 'Scratch'). The end goal is to merge that with a text file coming from the mainframe listing dataset names and tape volume and building control cards to delete the z/OS catalog entry for any tape volume in that mainframe listing that the virtual tape library says are already in 'scratch' status. I also had an interest in knowing how well compressed the data is on the virtual tape system.

 

I have not included the data step processing that follows this for  getting that mainframe list and the merge. Again, it all works. The entire program is also attached.

 

Can you suggest how to streamline the four PROC SQL steps? 

 

This is SAS V9.4.

 

filename 	CATLIST 	"z:\mainframe\GPSE\Secure Agent\Reports\LISTCAT_HLQ_GPSEBKP";	
filename	OUTTAPES	"z:\Mainframe\GPSE\Secure Agent\Reports\LISTCAT_TAPES.txt";	
filename	BLDNSCR		"z:\Mainframe\GPSE\Secure Agent\IDCAMS_NSCR.txt";



/*	----------------------------------------------------------------------------	*/
/*	Part-1: Get the tape volumes and status from a SecureAgent generated report.	*/
/*		The SecureAgent generated report is read in by the PROC IMPORT.				*/
/*	----------------------------------------------------------------------------	*/
/*	Import the All Volumes report. Note it is a CSV with column headings on line-1	*/
/*	----------------------------------------------------------------------------	*/
Proc Import
	datafile 	=	"z:\mainframe\GPSE\Secure Agent\Reports\All Volumes_20230203.csv"
	out			=	Work.DSPL_CSV
	dbms		=	csv
	replace;
	getnames	=	yes;
run;
/*	----------------------------------------------------------------------------	*/
/*	Read all fields. Sum the CompressedSize and UncompressedSize fields				*/
/*		Also determines the compression ratio										*/
/*	----------------------------------------------------------------------------	*/

Proc SQL;
	create 	table	Work.DSPL_TBL	as
			select *
			from	Work.DSPL_CSV;
quit;
/*	----------------------------------------------------------------------------	*/
/*	Determine the mean(Ratio) which is the overall compression ratio.				*/
/*	----------------------------------------------------------------------------	*/
Proc SQL;
	title  'Sum of compression ratio'; 
	create table Work.Compressed_Ratio as
	select  
		sum(CompressedSize) 	as	All_CompressedSize,
		sum(UncompressedSize)	as	All_UncompressedSize,
		TapeStatus
	from	Work.DSPL_TBL
		where CompressedSize > 0
		group by TapeStatus
		order by TapeStatus;
quit;
proc SQL;
	title 'Mean of Compressed and Uncompressed Sizes and Compression Ratios';
	create table	Work.Ratio as
	select All_CompressedSize,	All_UncompressedSize, TapeStatus,
		(All_UncompressedSize / All_CompressedSize) 	as Ratio,
		mean(calculated Ratio) as Overall_Ratio
		from	Work.Compressed_Ratio 
			group by TapeStatus
			order by TapeStatus
	;
Quit;
proc SQL;
	title 'Mean of Compressed and Uncompressed Sizes and Compression Ratios';
	create table	Work.Overall_Ratio as
	select All_CompressedSize,	All_UncompressedSize, TapeStatus, Overall_Ratio,
		mean (Ratio) as Total_Ratio
		from	Work.Ratio	
		;
Quit;

 

Note that the above code works but I suspect it can all be done in one or two procs rather than the four I have here.

4 REPLIES 4
maguiremq
SAS Super FREQ

I'm unfamiliar what your end result looks like, and I kind of whipped this one up quickly - I'm not sure if it will throw errors on some of this. Please let me know if this gets you closer - I think using some subqueries would generally help in this situation.

proc sql;
		create table single_query as
			select distinct
				a.All_CompressedSize
				, a.All_UncompressedSize
				, a.TapeStatus
				, a.Overall_Ratio
			from
				(
					select
						sum(CompressedSize) as All_CompressedSize
						, sum(UncompressedSize) as All_UncompressedSize
						, (calculated All_UncompressedSize / calculated All_CompressedSize) as Ratio
						, TapeStatus
						, mean(calculated Ratio) as Overall_Ratio
					from
						work.DSPL_TBL
					where 
						CompressedSize > 0
					group by
						TapeStatus
				) a
			order by
				a.TapeStatus
				;
quit;
lchristensen
Obsidian | Level 7
create table single_query as
		select distinct
			a.ALL_CompressedSize		,
			a.All_UncompressedSize	,
			a.tapeStatus				,
			a.Overall_Ratio
		from
			(
				select
					sum(CompressedSize)		as 	All_CompressedSize										,
					sum(UncompressedSize)	as	All_UncompressedSize									,
					(Calculated All_UncompressedSize / Calculated All_CompressedSize)	as Ratio		,
					TapeStatus																			,
					mean(calculated Ratio)	as 	Overall_Ratio
				from
						Work.DSPL_CSV
				where
					CompressedSize 	>	0
				group by
					TapeStatus
			)	a
			order by
			TapeStatus
	;
quit;

Results in:

ERROR: Summary functions nested in this way are not supported.
ERROR: Summary functions nested in this way are not supported.

 

In my original code I wind up with   Work.Overall_Ratio   which had two records containing five fields: All_CompressedSize, All_UncompressedSize, TapeStatus, Overall_Ratio, and Total_Ratio. 

 

The two records are summary records based on the value of TapeStatus.

 

I've tried something else:

Proc SQL;
  create  table   Work.DSPL_TBL1 as
              select *,
                sum(CompressedSize)        as  All_CompressedSize,
                sum(UncompressedSize)    as  All_UncompressedSize,
                mean(calculated All_CompressedSize,
                mean(calculated All_UncompressedSize)
            from   Work.DSPL_CSV
                 where CompressedSize > 0
                 group by TapeStatus
                 order by TapeStatus;
quit;

When this runs I get the same error from above:
ERROR: Summary functions nested in this way are not supported.
ERROR: Summary functions nested in this way are not supported.


I've seen that error in many tickets out here but I'm not understanding how to do the nesting.

 

 

SASKiwi
PROC Star

If you want to combine detail and summarised data then do it as a join on a sub-query:

Proc SQL;
  create  table Work.DSPL_TBL1 as
  select *
  from   Work.DSPL_CSV as A
  left join
  (select TapeStatus,
   sum(CompressedSize) as All_CompressedSize,
   sum(UncompressedSize) as All_UncompressedSize,
   mean(calculated All_CompressedSize) as as  All_CompressedSize_Mean,
   mean(calculated All_UncompressedSize) as All_UncompressedSize_Mean
   from Work.DSPL_CSV
   where CompressedSize > 0
   group by TapeStatus
   ) as B
   on A.TapeStatus = B.TapeStatus
   order by A.TapeStatus;
quit;
ballardw
Super User

Means of ratios when each is calculated such as

All_UncompressedSize / All_CompressedSize)

are unreliable and often just plain wrong unless the denominators are the same, or in the case of large numbers of values at least pretty similar. See if the output of your shown code makes sense when this is the input data set.

data 	Work.DSPL_TBL;
  input tapestatus Compressedsize uncompressedsize;
datalines;
1 999 1000
1 999 1000
1 999 1000
1 999 1000
1 999 1000
1 999 1000
1 999 1000
1 999 1000
1 999 1000
2 1   2
;

My gut feeling is that the TOTAL_Ratio that results is way low.

 

What you might look at:

Proc summary data= 	Work.DSPL_TBL;
   class tapestatus;
   var Compressedsize uncompressedsize;
   output out=demo sum= ;
run;

This sums your variables, leaving the same variable name. There will be a variable _type_ that has values of 0 and 1. A short lokk will show that Type=0 has a summary of ALL the records and the _type_=1 has just each tape status.

Then calculate the ratios. This will show the components used for the ratio.

data ratios;
   set demo;
   ratio = uncompressedsize /  Compressedsize;
run;

Note: You would be better off writing a data step instead of the Proc Import. A minor change to a file can result in your imported data having issues such as variable name changes, variable types, lengths of character variables and with real  headache generating changes things like date variables become datetime (or vice versa) without notice until something later fails.

 

 

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 488 views
  • 0 likes
  • 4 in conversation