02-06-2012 12:11 PM
I want to know which is the best way for calculations.
If i have a situtation where i have to take counts grouping by state and then grouping by country then i will use PROC SQL.once i will do it with state and then i will do it with country.
As a result it will take two steps
CREATE TABLE STATE_POP AS
SELECT *,COUNT(X) AS STATE_COUNT
GROUP BY STATE
CREATE TABLE STATE_POP AS
SELECT *,COUNT(X) AS COUNTRY_COUNT
GROUP BY COUNTRY
Instead of doing it as two step process.Is there any other way i can do all the calculations in one.
Here i am showing one of my scenario.
My main question is if i have multiple calculations and along with calculations i want to have some other information in the data set.
Which is the best procedure to use so that it will minimize the steps.
02-06-2012 12:38 PM
Country? or County?
Not all Countries have "states"
I have used nested sql in the past
create table populations as
select *, count(x) as country_count
from (select * , count(x) as state_count
group by country , state
group by country
You can also use proc summary to accomplish the same thing;
Off the top of my head, I think the following is approximately correct.
proc summary data=test;
by country state;
var country state ;
output out=populations N(country,state)= ;
There may be need for an "nway" option.
The resulting data set has some additional columns so that you can get the combined result set of both counts.
02-07-2012 01:47 AM
A refinement on Chuck's code.
proc summary data=sashelp.shoes nway;
class region subsidiary;
output out=counts ;
proc print data=counts;
run;Use the CLASS (no sorting required), no var statement (summary counts by default), the NWAY gives counts of the interaction. Remove the NWAY to get the counts of higher summary levels.
02-07-2012 07:43 AM
Lately there seems to be a lot of questions regarding using PROC SQL to do things that there are other PROCS that are specifically written to accomplish (PROC FREQ, PROC MEANS/SUMMARY, PROC REPORT, etc.), or would be better done inside a DATA step. Maybe I am too old school, but I just wonder why PROC SQL seems to be the default.
02-07-2012 08:20 AM
I've seen two sets of rationale. One, a number of new SAS users who feel more comfortable and experienced with SQL. Two, some of our GUI-oriented relatives who, for whatever reason, think they can only apply the needed manipulations/analyses only or more easily with SQL.
02-07-2012 08:53 AM
It's an educational thing. Students are taught SQL and have to use EXCEL. When I took the "Statistics using SAS" courses in College, I knew way more about SAS then the instructors. Even NC state's SAS programs (where SAS was created by Mr. Goodnight at the time) are declining.
Here's the fundamental problems that I've seen over the years:
1) SAS is a data centric language, whereas all other programming languages were machine control centric, except COBOL. Today, kids are taught Java, C++/C#, .NET, and web related languages = HTML, X... , perhaps PERL, PHP. And most compute environments are PC based because they are "cheap", every student has their own PC, they are required to do so so that they can have email and MS Word for their English and History papers. Java is object oriented and free, SAS is not. Oracle, SQL server, and MySQL are viewed as real databases, SAS is not.
2) Most SAS "programmers" I've met are not computer science majors, but Psychologists, Economists, Mathematicians, even English Majors. So, they have not had any real programming classes and break all the old rules and programming conventions.
3) Today students have to use Excel in school and thus become very efficient at Excel Pivot tables. This is fine for small stuff, and makes us older people who predate Excel "dinosaurs" who are very proficient at analyzing large data sets with SAS and can only stumble around with Excel Pivot tables.
4) SAS classes by SAS, just like Oracle classes by Oracle, Cisco classes, etc. are short and expensive. So, few people take these classes, and do so only to help them get "certified".
5) Too many people these days learn and are taught by rote, not critical creative thinking.
SAS documentation used to be great ! It's not any more.
Dr. Goodnight should find a way to make a greater emphasis on education, both in quantity and quality. Just because a person has a PhD in Clinical Psychology and has used some SAS jobs to do their study statistics, should not qualify them to teach Statistics using SAS -- one of my college classes.
Professional education of SAS by SAS should be less expensive per class, and then provide more classes. He should start a true SAS Institute/College, that teaches math, statistics, analysis, etc. offering degrees in Applied Mathematics, Econometrics, Psychometrics, etc. and certificates related to SAS programming, and various industry related subjects.
This is only the tip of my iceberg.
I'll get off my soap box now.
! SAS is the greatest data processing language/system on the planet !
02-08-2012 08:10 AM
Since it seems to be soapbox time:
But saying that it is an industry standard is (to me) like saying there is a tool (say a screwdriver) that I can use to drive screws, open paint cans, chip rust, and turn around to use as a hammer, then I should only learn how to use a screwdriver in new ways, rather than noticing that on the toolbench, that I paid a lot of money for, are a paint lid key, a chisel and a true hammer. I already own them--why not use them?
If I were going to do everything in SQL, I'd do it in something relatively native to the database at hand. At least, that is the thought of our local IT group.
I like Chuck's answer--it is a matter of education, and frankly, sometimes there just isn't time in a production environment to start learning something completely new. I understand that.
02-08-2012 09:18 AM
If you try this approach, you may notice that you should consider changing your original question. You started out asking how to achieve your goal with a minimum number of steps. Instead, consider how to achieve your goal with the fastest-running program. They're not really the same thing.
You'll have to learn, if you want to do this. In this case, you'll need to examine the structure of the output data set from PROC SUMMARY. The strategy (using your original variable names) would be to summarize your large data set once, getting counts for each STATE/COUNTY combination. Save the output data set from that summary. Re-summarize it later to get counts for each STATE, or counts for each COUNTY. That way, you end up processing the large data set once, and processing the smaller summary data set multiple times. The program may be longer, but it will run faster if your summary is considerably smaller than the original.
If you adopt this approach, you may need to change the statistics you save in the summary data set. If you want to end up with means, you may save the N and SUM statistics in your summary data set. Aggregate those later, and then compute an aggregated mean. In similar fashion, but with a more complex formula, you can save the sum of the squared values in your output data set. The standard deviation can be computed later, using aggregated versions of the sum of squared values, plus the N and SUM statistics. SAS Press publishes a book on efficiency in common programming situations (author is Bob Virgile).
02-09-2012 01:26 PM
wow! I really appreciate all of you for taking time to answer my question.I agree with you astounding i need to rephrase my question to which is more efficient.
I am habituated more to SQl as it looks simple and as Ksharp says it is mostly used in IT field, but as steve said there are so many nice procedures which can be used to do the job.i will try my best to come out of that habit and learn new habits.
Once again thanks to all of you for your valable suggestions!