BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
daradanye
Obsidian | Level 7

Hi guys,

 

I have the following question,

 

I have datasets from a2007 to a2018 (12 years).  For each year, the data looks like this:

FirmPlant
a1
a2
a3
b1
b2
c1
c2
c3
c4

I want to summarize how many unique firms/firm-plants combinations for each year and then aggregate them together.  What I would like is something like this:

Year# of firms# of plants
200739
  
20184

12

 

I know I can use proc means output to generate a dataset each year.  Are there any more efficient ways?  By the way, since the dataset of each year is quite large,  I tried to append all the years together,  But it is very slow and drags down my PC speed.

 

I will appreciate it very much if someone can help out here.  Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
r_behata
Barite | Level 11
data combine;
	set a2007- a2018 indsname=source;
	year=compress(source,,'kd');
	no_firms=1;
	no_plants=1;
run;

proc means data= combine noprint nway;
	var  no_firms no_plants;
	class year;
	output out=want(drop = _:) sum=;
run;

View solution in original post

7 REPLIES 7
Reeza
Super User
Create a view that appends all the data sets together and run a single proc means on that data set.
daradanye
Obsidian | Level 7

Hi Reeza,

 

Thank you.  Is there any alternative way instead of appending datasets first?  The dataset is too large.  It takes a lot of to append and run in aggregate. Thanks!

Reeza
Super User

That's why I said to create it as a VIEW, not a data set. 

Did you try a VIEW? Then only proc means will be processing the whole data set. 

 

data combined / view=combined;
set a2012-a2017 ;
run;

proc means data=combined;
class year;
var ....

run;
andreas_lds
Jade | Level 19

@daradanye wrote:

Hi Reeza,

 

Thank you.  Is there any alternative way instead of appending datasets first?  The dataset is too large.  It takes a lot of to append and run in aggregate. Thanks!


To large? How many obs do you have per dataset? Can you post the proc means you are using? Executing proc means for each dataset and appending the results could solve the performance issue.

r_behata
Barite | Level 11
data combine;
	set a2007- a2018 indsname=source;
	year=compress(source,,'kd');
	no_firms=1;
	no_plants=1;
run;

proc means data= combine noprint nway;
	var  no_firms no_plants;
	class year;
	output out=want(drop = _:) sum=;
run;
KachiM
Rhodochrosite | Level 12

@daradanye 

 

Have you gotten the answer you want? If not, describe the issues you face.

KachiM
Rhodochrosite | Level 12

@daradanye 

 

The solution given by @r_behata gives same number for no_firms and no_plants as I have tested. The distinct number of firms within a Data Set must be less than or equal to the number of observations. Have I missed anything here?

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 7 replies
  • 594 views
  • 0 likes
  • 5 in conversation