BookmarkSubscribeRSS Feed
SAS_dj1999
Calcite | Level 5

I'm fairly new to SAS and I'd like to really get into returning data sets in the most efficient and economical fashion.  My lack of on the job experience leads me to the following.  If a client requested the have the following metrics for mean, count and sum by dpt (department), what would be the best approach to take?

data sample1;

infile datalines dlm=",";

length fn $ 12;

input fn $ dpt $ salary;

datalines;

john doe, fin, 50000

jane doe, fin, 69393

kim smith, it, 93893

ken smith, it, 80493

;

run;

quit;

something as simple as:

proc means data=sample1 sum n mean;

class dpt;

run;

quit;

or would we write PROC SQL statement or even a data set process where we're grouping to get our totals.  The Proc Means approach is really quick but it doesn't push the boundaries of being a good programmer.  With an approach like

data mt_samp (drop=fn);

  set sample1 ;

by dpt;

if First.dpt then total = 0;

if First.dpt then cnt = 0;

if First.dpt then ave = total /cnt;  /** this line doesn't work, sorry

  total+salary;

  cnt+1;

if Last.dpt then output;

run;

quit;

Thanks,

SJ         

5 REPLIES 5
ghastly_kitten
Fluorite | Level 6

Ghm... As I understood from your post, you don't really need help on calculating metrics:

Proc means does it perfectly.

If you need to put the result of means procedure in a dataset, you need just to add a output statement into procedure:

proc means data=sample1 noprint ;

class dpt;

output out=sample_metrics sum=SumByDept n=NumOfEntries mean=MeanByDept;

run;

quit;

you can add other metrics in output statement.

p.s.

If you want to find the other way: use proc sql. It's also possible with data step, but it's not really convenient for generic task, since you need to perform sorting operations outside of the step and have to control auxilliary variables for aggregating your stats.

ArtC
Rhodochrosite | Level 12

'push the boundaries of being a good programmer'....  Being a good programmer does NOT mean that the most complicated solution should be used.  A good programmer understands the range of possible programming solutions and picks the best one for the task.  The better the programmer the broader the range of choices and the better the understanding of all the ramifications of that choice.

In these forums you often see the range of solutions to problems that reflect these levels of understanding.  At SAS Global Forum next month I will be presenting a paper that addresses some of the issues of those choices and the topics came from these forums.

TomKari
Onyx | Level 15

Using PROC MEANS does in fact push the boundaries of being a good programmer. After 30 plus years in this business, there's no better definition of the job of a programmer than "to cost-effectively meet the requirements of the client".

In your example, the following advantages are derived from using PROC MEANS:

  • It's the shortest time to code, which means lower cost for your client (on an hourly rate, meaning they'll be happy to hire you again), or more money in your pocket on a fixed-rate basis.
  • You can assume that SAS does the statistical calculations correctly. If you code the statistics yourself, you risk minor coding errors, therefore you must devote MUCH more time to testing...back to the lower time-to-delivery factor.
  • You can assume that SAS (a $2.7B company) expends considerable R&D resources making its software as efficient and easy to use as possible. By using their PROC, you piggyback on all of these advances.

In addition, at some point you depart, and I have to come in to maintain or modify your code. If you were in that situation, which program would you rather have to investigate? Again, your reputation matters. Would you rather your successor report to management..."Boy, that SJ. Did he ever do a nice job on the summarization code" or "Jeez, that SJ. He could have just used PROC MEANS, but instead we're saddled with a bunch of DATA step code. He doesn't know ANYTHING about SAS". Or even worse..."That cretin SJ: he could have used PROC MEANS, but instead he wrote some idiotic DATA step code that doesn't work right. Oh, by the way, you're going to have to restate your financials for the last five years".

So, as a "manager/professional" response, you should absolutely, definitely use PROC MEANS. However, your other point is valid. To add value as a developer (NOT a programmer), you will need a very large stock of methods and techniques to succeed. Situations will arise where you'll need to recommend an approach, and you'll have a big blank sheet of paper in front of you. How will you know how to proceed? First, you'll build up a repertoire of knowledge from just doing your job in the most efficient way possible. Second, talk to people (like your post on this forum...way to go!). Third, you'll need to invest in both formal and self-directed training. I would frequently stay a couple of hours after work (on my own time), when it's quiet, and investigate different alternatives, or try some new features of SAS that I didn't understand fully. That way, when it comes up in a meeting, I'm prepared to offer a recommendation backed by experience, where the other folks simply don't know.

Remember, a career is a very long time. I started in this business before IBM released the Personal Computer, and have had to re-invent myself a half dozen time. But when things go well, nothing is more satisfying!

Best,

  Tom

SAS_dj1999
Calcite | Level 5

Tom,

Thanks for the response.  I was looking for some insight on the life of a SAS Analyst/Developer and you definitely brought the sun to this thread!

Thank you all,

SJ

Astounding
PROC Star

Agree with all the other posters.  As you think about the problem, also consider that you are not that far away from requiring more of a strategy.  For example, let's make two assumptions to complicate the task just a little.  First, assume you have a lot of data so that efficiency is important.  And second, you don't have just a single variable SALARY, but a second variable BONUS.  And you have to get the average of SALARY + BONUS.  

The inefficient method would use a DATA step to add SALARY + BONUS, then use PROC MEANS on that total.  That's inefficient because it processes a large amount of data twice.  The efficient method would run PROC MEANS on the individual variables:

VAR SALARY BONUS;

It would save an output data set holding a separate observation for each DPT (rather than printing) and save the SUM statistic.  Then a DATA step could combine the sum of SALARY with the sum of BONUS.  The same DATA step could divide by _FREQ_ (automatically part of the output data set), to get averages.  Finally, print the results.  Remember, it is very fast to process a summary data set multiple times.  All in all, if you're considering what to learn next, it wouldn't hurt to learn the structure of a PROC MEANS / PROC SUMMARY output data set.

Good luck.


sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 910 views
  • 4 likes
  • 5 in conversation