Hi guys,
I have the following question,
I have datasets from a2007 to a2018 (12 years). For each year, the data looks like this:
Firm | Plant |
a | 1 |
a | 2 |
a | 3 |
b | 1 |
b | 2 |
c | 1 |
c | 2 |
c | 3 |
c | 4 |
I want to summarize how many unique firms/firm-plants combinations for each year and then aggregate them together. What I would like is something like this:
Year | # of firms | # of plants |
2007 | 3 | 9 |
… | ||
2018 | 4 | 12 |
I know I can use proc means output to generate a dataset each year. Are there any more efficient ways? By the way, since the dataset of each year is quite large, I tried to append all the years together, But it is very slow and drags down my PC speed.
I will appreciate it very much if someone can help out here. Thanks!
data combine;
set a2007- a2018 indsname=source;
year=compress(source,,'kd');
no_firms=1;
no_plants=1;
run;
proc means data= combine noprint nway;
var no_firms no_plants;
class year;
output out=want(drop = _:) sum=;
run;
Hi Reeza,
Thank you. Is there any alternative way instead of appending datasets first? The dataset is too large. It takes a lot of to append and run in aggregate. Thanks!
That's why I said to create it as a VIEW, not a data set.
Did you try a VIEW? Then only proc means will be processing the whole data set.
data combined / view=combined;
set a2012-a2017 ;
run;
proc means data=combined;
class year;
var ....
run;
@daradanye wrote:
Hi Reeza,
Thank you. Is there any alternative way instead of appending datasets first? The dataset is too large. It takes a lot of to append and run in aggregate. Thanks!
To large? How many obs do you have per dataset? Can you post the proc means you are using? Executing proc means for each dataset and appending the results could solve the performance issue.
data combine;
set a2007- a2018 indsname=source;
year=compress(source,,'kd');
no_firms=1;
no_plants=1;
run;
proc means data= combine noprint nway;
var no_firms no_plants;
class year;
output out=want(drop = _:) sum=;
run;
The solution given by @r_behata gives same number for no_firms and no_plants as I have tested. The distinct number of firms within a Data Set must be less than or equal to the number of observations. Have I missed anything here?
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.