BookmarkSubscribeRSS Feed
pamplemouse22
Calcite | Level 5

Hello all, 

 

I have the following proc sql code (see below) that does what I want it to do on a sample of data. However, my computer runs out of resources to run the program on the full dataset. I have an empty 4 TB hardrive and am hoping it is possible to run with some minor tweaks to the program or SAS settings. Any suggestions would be greatly appreciated! Thank you!

 

Details about code/data:

Dataset A is 110GB 

Dataset B is 500MB

Once this procedure is completed, I would select one observation per group and the final dataset will go back down to ~110 GB. 

 

Steps I have taken: 

-specified procedure to create compressed dataset

-changed work directory to external harddrive

 

Other ideas I have: 

-run in batches and append

-somehow delete temp files as they are created/no longer needed? 

-run 1 mean at a time and merge

 

 

proc sql;
	create table work.birth_ap01 (compress=YES) as
	select A.*, B.*,
		   mean(B.meantemp) as mean_temp_wk,
		   mean(B.maxco) as mean_maxco_wk,
		   mean(B.maxeighthro3) as mean_maxeighthro3_wk,
		   mean(B.maxno2) as mean_maxno2_wk,
		   mean(B.maxo3) as mean_maxo3_wk,
		   mean(B.meanrh) as mean_meanrh_wk,
		   mean(B.meanpm25) as mean_meanpm25_wk
	from work.birth03 A, work.ap06_imputed B
	where A.zipnum=B.zipnum and (A.gewk_count=. or A.gewk_date_end ge B.date ge A.gewk_date_start)
	group by A.id, A.gewk_date_end, A.gewk_date_start
	order by A.id, A.gewk_count, B.date
	;
quit;
3 REPLIES 3
Patrick
Opal | Level 21

@pamplemouse22

You don't tell us where you run out of resources. Given the code you've posted I assume it's disk space.

Make sure not only WORK but also UTILLOC has sufficient disk space available. UTILLOC is used for temporary sort files and that's where your process will require the most disk space.

 

Taking your join condition into account: What's the relationship between your two tables? Could you end-up with many-to-many relationships? If so then this could explain why you run out of resources; ...and it would also require you to re-think if you can get correct averages this way.

What most likely really hurts you is SELECT A.*, B.* as this will likely lead to a result table with all the combined rows from tables A and B - and should there be many-to-many relationships then you can end up with a really huge table. Restrict your result table to the columns you really need; especially the ones from table A. 

 

 

You could consider using an index over ZIP code on Dataset A - but that's a secondary measure. First reduce the number of variables for your result set and ensure that your join condition doesn't allow for many-to-many cases.

SASKiwi
PROC Star

One thing that looks weird in your query is you are doing summary stats (MEAN) on your B table, but your GROUP BY columns are on your A table. Is that what you meant to do? You should really keep your summary stats query separate otherwise SAS will end up remerging the data which will increase the size a lot.

ChrisNZ
Tourmaline | Level 20

You don't tell us how many rows you have, but for each row of the large table you mach many rows of the smaller table repeatedly.

 

A.gewk_count=. or A.gewk_date_end ge B.date ge A.gewk_date_start

How many rows do you expect from table B for each row from table A for the date condition?

Also if gewk_count is missing you match *all rows* of table B for the given ID. That's potentially a performance killer if it's not very rare.

 

 

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 820 views
  • 0 likes
  • 4 in conversation