Help using Base SAS procedures

Best way for calculations

Reply
Regular Contributor
Posts: 157

Best way for calculations

Hi,

I want to know which is the best way for calculations.

For example:

If i have a situtation where i have to take counts grouping by state and then grouping by country then i will use PROC SQL.once i will do it with state and then i will do it with country.

As a result it will take two steps

PROC SQL;

  CREATE TABLE STATE_POP AS

SELECT *,COUNT(X) AS STATE_COUNT

FROM  TEST

GROUP BY STATE

QUIT;

PROC SQL;

  CREATE TABLE STATE_POP AS

SELECT *,COUNT(X) AS COUNTRY_COUNT

FROM  TEST

GROUP BY COUNTRY

QUIT;

Instead of doing it as two step process.Is there any other way i can do all the calculations in one.

Here i am showing one of my scenario.

My main question is if i have multiple calculations and along with calculations i want to have some other information in the data set.

Which is the best procedure to use so that it will minimize the steps.

Occasional Contributor
Posts: 9

Best way for calculations

Country? or County?

Not all Countries have "states"

I have used nested sql in the past

proc sql;

  create table populations as

  select *, count(x) as country_count

  from (select * , count(x) as state_count

             from test

            group by country , state

         )

  group by country

  ;

quit;

You can also use proc summary to accomplish the same thing;

Off the top of my head, I think the following is approximately correct.

proc summary data=test;

  by country state;

  var country state ;

  output out=populations N(country,state)= ;

run;

There may be need for an "nway" option.

The resulting data set has some additional columns so that you can get the combined result set of both counts.

Valued Guide
Posts: 634

Re: Best way for calculations

Posted in reply to ChuckMoore

A refinement on Chuck's code. 

proc summary data=sashelp.shoes nway;

class region subsidiary;

output out=counts ;

run;

proc print data=counts;

run;

Use the CLASS (no sorting required), no var statement (summary counts by default), the NWAY gives counts of the interaction.  Remove the NWAY to get the counts of higher summary levels.

Respected Advisor
Posts: 2,655

Best way for calculations

A comment:

Lately there seems to be a lot of questions regarding using PROC SQL to do things that there are other PROCS that are specifically written to accomplish (PROC FREQ, PROC MEANS/SUMMARY, PROC REPORT, etc.), or would be better done inside a DATA step.  Maybe I am too old school, but I just wonder why PROC SQL seems to be the default.

Steve Denham

PROC Star
Posts: 7,467

Best way for calculations

Posted in reply to SteveDenham

Steve,

I've seen two sets of rationale.  One, a number of new SAS users who feel more comfortable and experienced with SQL.  Two, some of our GUI-oriented relatives who, for whatever reason, think they can only apply the needed manipulations/analyses only or more easily with SQL.

Occasional Contributor
Posts: 9

Re: Best way for calculations

Posted in reply to SteveDenham

It's an educational thing.  Students are taught SQL and have to use EXCEL.  When I took the "Statistics using SAS" courses in College, I knew way more about SAS then the instructors.  Even NC state's SAS programs (where SAS was created by Mr. Goodnight at the time) are declining.

Here's the fundamental problems that I've seen over the years:

1)  SAS is a data centric language, whereas all other programming languages were machine control centric, except COBOL.  Today, kids are taught Java, C++/C#,  .NET, and web related languages = HTML, X... , perhaps PERL, PHP.  And most compute environments are PC based because they are "cheap", every student has their own PC, they are required to do so so that they can have email and MS Word for their English and History papers.  Java is object oriented and free, SAS is not.  Oracle, SQL server, and MySQL are viewed as real databases, SAS is not.

2)  Most SAS "programmers" I've met are not computer science majors, but Psychologists, Economists, Mathematicians, even English Majors.  So, they have not had any real programming classes and break all the old rules and programming conventions.

3)  Today students have to use Excel in school and thus become very efficient at Excel Pivot tables.  This is fine for small stuff, and makes us older people who predate Excel "dinosaurs" who are very proficient at analyzing large data sets with SAS and can only stumble around with Excel Pivot tables.

4)  SAS classes by SAS, just like Oracle classes by Oracle, Cisco classes, etc. are short and expensive.  So, few people take these classes, and do so only to help them get "certified".

5)  Too many people these days learn and are taught by rote, not critical creative thinking.

SAS documentation used to be great !  It's not any more.

Solutions:

Dr. Goodnight should find a way to make a greater emphasis on education, both in quantity and quality.  Just because a person has a PhD in Clinical Psychology and has used some SAS jobs to do their study statistics, should not qualify them to teach Statistics using SAS -- one of my college classes.

Professional education of SAS by SAS should be less expensive per class, and then provide more classes.  He should start a true SAS Institute/College, that teaches math, statistics, analysis, etc. offering degrees in Applied Mathematics,  Econometrics,  Psychometrics, etc. and certificates related to SAS programming, and various industry related subjects.

This is only the tip of my iceberg.

I'll get off my soap box now.

! SAS is the greatest data processing language/system on the planet !

Super User
Posts: 10,018

Re: Best way for calculations

Posted in reply to SteveDenham

Steve.

Maybe the reason is SQL is a industry standard which has been applied everywhere at IT field.

Ksharp

Respected Advisor
Posts: 2,655

Best way for calculations

Since it seems to be soapbox time:

But saying that it is an industry standard is (to me) like saying there is a tool (say a screwdriver) that I can use to drive screws, open paint cans, chip rust, and turn around to use as a hammer, then I should only learn how to use a screwdriver in new ways, rather than noticing that on the toolbench, that I paid a lot of money for, are a paint lid key, a chisel and a true hammer.  I already own them--why not use them?

If I were going to do everything in SQL, I'd do it in something relatively native to the database at hand.  At least, that is the thought of our local IT group.

I like Chuck's answer--it is a matter of education, and frankly, sometimes there just isn't time in a production environment to start learning something completely new.  I understand that.

Steve Denham

Super User
Posts: 5,495

Best way for calculations

If you try this approach, you may notice that you should consider changing your original question.  You started out asking how to achieve your goal with a minimum number of steps.  Instead, consider how to achieve your goal with the fastest-running program.  They're not really the same thing.

You'll have to learn, if you want to do this.  In this case, you'll need to examine the structure of the output data set from PROC SUMMARY.  The strategy (using your original variable names) would be to summarize your large data set once, getting counts for each STATE/COUNTY combination.  Save the output data set from that summary.  Re-summarize it later to get counts for each STATE, or counts for each COUNTY.  That way, you end up processing the large data set once, and processing the smaller summary data set multiple times.  The program may be longer, but it will run faster if your summary is considerably smaller than the original. 

If you adopt this approach, you may need to change the statistics you save in the summary data set.  If you want to end up with means, you may save the N and SUM statistics in your summary data set.  Aggregate those later, and then compute an aggregated mean.  In similar fashion, but with a more complex formula, you can save the sum of the squared values in your output data set.  The standard deviation can be computed later, using aggregated versions of the sum of squared values, plus the N and SUM statistics.  SAS Press publishes a book on efficiency in common programming situations (author is Bob Virgile).

Good luck.

Regular Contributor
Posts: 157

Best way for calculations

wow! I really appreciate all of you for taking  time to answer my question.I agree with you astounding i need to rephrase my question to which is more efficient.

I am habituated more to SQl as it looks simple and as Ksharp says it is mostly used in IT field, but as steve said there are so many nice procedures which can be used to do the job.i will try my best to come out of that habit and learn new habits.

Once again thanks to all of you for your valable suggestions!

Ask a Question
Discussion stats
  • 9 replies
  • 351 views
  • 0 likes
  • 7 in conversation