BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jcis7
Pyrite | Level 9

Hello!

 

I have included sample data but my actual data has more than 200,000 observations and the year variable spans about 30 years  and ~60 counties.

I've tried proc report with this to summarize but the coding for 60 counties is cumbersome and SAS actually doesn't like the large number of observations.

Is there a more a more efficient way to do what I'm trying to do? Thank you!

 

Sample Data:

 

data new;
infile DATALINES dsd missover;
input county $ facility $ total var1 year $;
CARDS;
countya, busybeez, 100, 99, 20012002
countya, childtime, 90, 90, 20012002
countyb,  kidzplace, 3, 2 , 20012002
countyb, littlesprouts, 2, 2, 20012002
countya, busybeez, 98, 98, 20022003
countya, childtime, 99, 99, 20022003
countyb,  kidzplace, 5, 4 , 20022003 
countyb, littlesprouts, 2, 2, 20022003
;
run;

 

Since my actual output will have 30 columns across (as opposed to the 2 across below, one column for each year), and

will have about 60 rows of counties, what is the most efficient way to get the desired output (below)?

 

I've tried the following but my column headers aren't labeled year so I'm not sure I'm getting what I'm expecting. 

 

proc summary data=new order=data nway;
    class county year;  
    var total var1; 
    output out=news (drop=_type_ _freq_) sum=total var1;
      
  run;



data percent (keep=county year pct_var1);
  set news;  
pct_var1=(var1/total)*100;

run;


proc sort data=percent out=percents;
 by county year;
run;

proc transpose data=percents
                  out=var1byyear;

  var pct_var1  ;
  by county;
run;
 

 

 

 

 

Desired Output into an Excel Spreadsheet

Var1 by county by year

 

  Year
  20012002 20022003
County Pct Var1 Pct Var1
countya 99.5 100.0
countyb 80.0 85.7
     
     
1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

@jcis7 wrote:

Hello!

 

I have included sample data but my actual data has more than 200,000 observations and the year variable spans about 30 years  and ~60 counties.

I've tried proc report with this to summarize but the coding for 60 counties is cumbersome and SAS actually doesn't like the large number of observations.

Is there a more a more efficient way to do what I'm trying to do? Thank you!

 

Sample Data:

 

data new;
infile DATALINES dsd missover;
input county $ facility $ total var1 year $;
CARDS;
countya, busybeez, 100, 99, 20012002
countya, childtime, 90, 90, 20012002
countyb,  kidzplace, 3, 2 , 20012002
countyb, littlesprouts, 2, 2, 20012002
countya, busybeez, 98, 98, 20022003
countya, childtime, 99, 99, 20022003
countyb,  kidzplace, 5, 4 , 20022003 
countyb, littlesprouts, 2, 2, 20022003
;
run;

 

Since my actual output will have 30 columns across (as opposed to the 2 across below, one column for each year), and

will have about 60 rows of counties, what is the most efficient way to get the desired output (below)?

 

I've tried the following but my column headers aren't labeled year so I'm not sure I'm getting what I'm expecting. 

 

proc summary data=new order=data nway;
    class county year;  
    var total var1; 
    output out=news (drop=_type_ _freq_) sum=total var1;
      
  run;



data percent (keep=county year pct_var1);
  set news;  
pct_var1=(var1/total)*100;

run;


proc sort data=percent out=percents;
 by county year;
run;

proc transpose data=percents
                  out=var1byyear;

  var pct_var1  ;
  by county;
run;
 

 

 

 

 

Desired Output into an Excel Spreadsheet

Var1 by county by year

 

  Year
  20012002 20022003
County Pct Var1 Pct Var1
countya 99.5 100.0
countyb 80.0 85.7
     
     

Try

proc tabulate data=percent;
   class county year;
   var pct_var1;
   table county,
         year*pct_var1*max=''*f=f5.1
   ;
run;

Use ODS destination to write to Excel.

 

View solution in original post

4 REPLIES 4
ballardw
Super User

@jcis7 wrote:

Hello!

 

I have included sample data but my actual data has more than 200,000 observations and the year variable spans about 30 years  and ~60 counties.

I've tried proc report with this to summarize but the coding for 60 counties is cumbersome and SAS actually doesn't like the large number of observations.

Is there a more a more efficient way to do what I'm trying to do? Thank you!

 

Sample Data:

 

data new;
infile DATALINES dsd missover;
input county $ facility $ total var1 year $;
CARDS;
countya, busybeez, 100, 99, 20012002
countya, childtime, 90, 90, 20012002
countyb,  kidzplace, 3, 2 , 20012002
countyb, littlesprouts, 2, 2, 20012002
countya, busybeez, 98, 98, 20022003
countya, childtime, 99, 99, 20022003
countyb,  kidzplace, 5, 4 , 20022003 
countyb, littlesprouts, 2, 2, 20022003
;
run;

 

Since my actual output will have 30 columns across (as opposed to the 2 across below, one column for each year), and

will have about 60 rows of counties, what is the most efficient way to get the desired output (below)?

 

I've tried the following but my column headers aren't labeled year so I'm not sure I'm getting what I'm expecting. 

 

proc summary data=new order=data nway;
    class county year;  
    var total var1; 
    output out=news (drop=_type_ _freq_) sum=total var1;
      
  run;



data percent (keep=county year pct_var1);
  set news;  
pct_var1=(var1/total)*100;

run;


proc sort data=percent out=percents;
 by county year;
run;

proc transpose data=percents
                  out=var1byyear;

  var pct_var1  ;
  by county;
run;
 

 

 

 

 

Desired Output into an Excel Spreadsheet

Var1 by county by year

 

  Year
  20012002 20022003
County Pct Var1 Pct Var1
countya 99.5 100.0
countyb 80.0 85.7
     
     

Try

proc tabulate data=percent;
   class county year;
   var pct_var1;
   table county,
         year*pct_var1*max=''*f=f5.1
   ;
run;

Use ODS destination to write to Excel.

 

jcis7
Pyrite | Level 9

Thank you!!  I tried searching for the following in documentation for an explanation but wasn't able to find it to understand what exactly it does and when to use it again in the proc tabulate statement. Any help is appreciated:  *max=''*


ballardw
Super User

@jcis7 wrote:

Thank you!!  I tried searching for the following in documentation for an explanation but wasn't able to find it to understand what exactly it does and when to use it again in the proc tabulate statement. Any help is appreciated:  *max=''*



The body of the Proc tabulate result requires some sort of numeric statistic. So when the value is pre-calculated as you did to get the percentage you need to provide one. Any of Max, Min or mean would yield the same result.

 

Probably could of done the same with proc report with the Year variable as an across but I don't use Report much and would have taken me longer to get the same result.

jcis7
Pyrite | Level 9
Thank you!!

hackathon24-white-horiz.png

The 2025 SAS Hackathon Kicks Off on June 11!

Watch the live Hackathon Kickoff to get all the essential information about the SAS Hackathon—including how to join, how to participate, and expert tips for success.

YouTube LinkedIn

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 4 replies
  • 1400 views
  • 4 likes
  • 2 in conversation