Hello!
I have included sample data but my actual data has more than 200,000 observations and the year variable spans about 30 years and ~60 counties.
I've tried proc report with this to summarize but the coding for 60 counties is cumbersome and SAS actually doesn't like the large number of observations.
Is there a more a more efficient way to do what I'm trying to do? Thank you!
Sample Data:
data new; infile DATALINES dsd missover; input county $ facility $ total var1 year $; CARDS; countya, busybeez, 100, 99, 20012002 countya, childtime, 90, 90, 20012002 countyb, kidzplace, 3, 2 , 20012002 countyb, littlesprouts, 2, 2, 20012002 countya, busybeez, 98, 98, 20022003 countya, childtime, 99, 99, 20022003 countyb, kidzplace, 5, 4 , 20022003 countyb, littlesprouts, 2, 2, 20022003 ; run;
Since my actual output will have 30 columns across (as opposed to the 2 across below, one column for each year), and
will have about 60 rows of counties, what is the most efficient way to get the desired output (below)?
I've tried the following but my column headers aren't labeled year so I'm not sure I'm getting what I'm expecting.
proc summary data=new order=data nway; class county year; var total var1; output out=news (drop=_type_ _freq_) sum=total var1; run; data percent (keep=county year pct_var1); set news; pct_var1=(var1/total)*100; run; proc sort data=percent out=percents; by county year; run; proc transpose data=percents out=var1byyear; var pct_var1 ; by county; run;
Desired Output into an Excel Spreadsheet
Var1 by county by year |
Year | ||
---|---|---|
20012002 | 20022003 | |
County | Pct Var1 | Pct Var1 |
countya | 99.5 | 100.0 |
countyb | 80.0 | 85.7 |
@jcis7 wrote:
Hello!
I have included sample data but my actual data has more than 200,000 observations and the year variable spans about 30 years and ~60 counties.
I've tried proc report with this to summarize but the coding for 60 counties is cumbersome and SAS actually doesn't like the large number of observations.
Is there a more a more efficient way to do what I'm trying to do? Thank you!
Sample Data:
data new; infile DATALINES dsd missover; input county $ facility $ total var1 year $; CARDS; countya, busybeez, 100, 99, 20012002 countya, childtime, 90, 90, 20012002 countyb, kidzplace, 3, 2 , 20012002 countyb, littlesprouts, 2, 2, 20012002 countya, busybeez, 98, 98, 20022003 countya, childtime, 99, 99, 20022003 countyb, kidzplace, 5, 4 , 20022003 countyb, littlesprouts, 2, 2, 20022003 ; run;
Since my actual output will have 30 columns across (as opposed to the 2 across below, one column for each year), and
will have about 60 rows of counties, what is the most efficient way to get the desired output (below)?
I've tried the following but my column headers aren't labeled year so I'm not sure I'm getting what I'm expecting.
proc summary data=new order=data nway; class county year; var total var1; output out=news (drop=_type_ _freq_) sum=total var1; run; data percent (keep=county year pct_var1); set news; pct_var1=(var1/total)*100; run; proc sort data=percent out=percents; by county year; run; proc transpose data=percents out=var1byyear; var pct_var1 ; by county; run;
Desired Output into an Excel Spreadsheet
Var1 by county by year
Year 20012002 20022003 County Pct Var1 Pct Var1 countya 99.5 100.0 countyb 80.0 85.7
Try
proc tabulate data=percent; class county year; var pct_var1; table county, year*pct_var1*max=''*f=f5.1 ; run;
Use ODS destination to write to Excel.
@jcis7 wrote:
Hello!
I have included sample data but my actual data has more than 200,000 observations and the year variable spans about 30 years and ~60 counties.
I've tried proc report with this to summarize but the coding for 60 counties is cumbersome and SAS actually doesn't like the large number of observations.
Is there a more a more efficient way to do what I'm trying to do? Thank you!
Sample Data:
data new; infile DATALINES dsd missover; input county $ facility $ total var1 year $; CARDS; countya, busybeez, 100, 99, 20012002 countya, childtime, 90, 90, 20012002 countyb, kidzplace, 3, 2 , 20012002 countyb, littlesprouts, 2, 2, 20012002 countya, busybeez, 98, 98, 20022003 countya, childtime, 99, 99, 20022003 countyb, kidzplace, 5, 4 , 20022003 countyb, littlesprouts, 2, 2, 20022003 ; run;
Since my actual output will have 30 columns across (as opposed to the 2 across below, one column for each year), and
will have about 60 rows of counties, what is the most efficient way to get the desired output (below)?
I've tried the following but my column headers aren't labeled year so I'm not sure I'm getting what I'm expecting.
proc summary data=new order=data nway; class county year; var total var1; output out=news (drop=_type_ _freq_) sum=total var1; run; data percent (keep=county year pct_var1); set news; pct_var1=(var1/total)*100; run; proc sort data=percent out=percents; by county year; run; proc transpose data=percents out=var1byyear; var pct_var1 ; by county; run;
Desired Output into an Excel Spreadsheet
Var1 by county by year
Year 20012002 20022003 County Pct Var1 Pct Var1 countya 99.5 100.0 countyb 80.0 85.7
Try
proc tabulate data=percent; class county year; var pct_var1; table county, year*pct_var1*max=''*f=f5.1 ; run;
Use ODS destination to write to Excel.
Thank you!! I tried searching for the following in documentation for an explanation but wasn't able to find it to understand what exactly it does and when to use it again in the proc tabulate statement. Any help is appreciated: *max=''*
@jcis7 wrote:
Thank you!! I tried searching for the following in documentation for an explanation but wasn't able to find it to understand what exactly it does and when to use it again in the proc tabulate statement. Any help is appreciated: *max=''*
The body of the Proc tabulate result requires some sort of numeric statistic. So when the value is pre-calculated as you did to get the percentage you need to provide one. Any of Max, Min or mean would yield the same result.
Probably could of done the same with proc report with the Year variable as an across but I don't use Report much and would have taken me longer to get the same result.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.