BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
rogerward
Obsidian | Level 7

Good People -

 

A sample program is shown below with two PROC SQLs.  I would like to know how I can combine the second PROC SQL, which uses MAX and GROUP  BY, with the first PROC SQL, which establishes a series of new variables.

 

Thanks a lot!

 

data got;
input id question response;
datalines;
1 1 0
1 2 2
1 3 1
1 4 0
2 1 1
2 2 2
2 3 3
2 4 3
3 1 0
3 2 0
3 3 0
3 4 1
;
run;
proc print data=got;
run;

 

proc sql;
create table one as
select id
,case when question=1 then response else 0 end as question_1
,case when question=2 then response else 0 end as question_2
,case when question=3 then response else 0 end as question_3
,case when question=4 then response else 0 end as question_4
from got
order by id;
quit;
proc print data=one;
run;


proc sql;
create table two as
select id
,max(question_1) as m1
,max(question_2) as m2
,max(question_3) as m3
,max(question_4) as m4
from one
group by id;
quit;
proc print data=two;
run;

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26

In another thread, I mentioned (not to you @rogerward , but in general) that people should not choose SQL as their first choice for performing statistical or mathematical calculations (in this case, the maximum of many values). This is a great example why. This is a lot of typing, and if you have more than 4 questions, the typing increases.

 

On the other hand, if you use PROC SUMMARY (or PROC MEANS), this is 5 lines of code, regardless of how many questions there are. You see, SAS has already programmed most common statistical analyses, and it has programmed the ability to handle many (arbitrary) levels of a variable; but in SQL you have to write your own code. If all you ever have is 4 questions, you can probably get away with doing it in SQL (the answers will be the same if you do it right), but if you have 40 questions, SQL would be a nightmare.

 

proc summary data=got nway;
    class id question;
    var response;
    output out=want max=;
run;

 

--
Paige Miller

View solution in original post

6 REPLIES 6
PaigeMiller
Diamond | Level 26

In another thread, I mentioned (not to you @rogerward , but in general) that people should not choose SQL as their first choice for performing statistical or mathematical calculations (in this case, the maximum of many values). This is a great example why. This is a lot of typing, and if you have more than 4 questions, the typing increases.

 

On the other hand, if you use PROC SUMMARY (or PROC MEANS), this is 5 lines of code, regardless of how many questions there are. You see, SAS has already programmed most common statistical analyses, and it has programmed the ability to handle many (arbitrary) levels of a variable; but in SQL you have to write your own code. If all you ever have is 4 questions, you can probably get away with doing it in SQL (the answers will be the same if you do it right), but if you have 40 questions, SQL would be a nightmare.

 

proc summary data=got nway;
    class id question;
    var response;
    output out=want max=;
run;

 

--
Paige Miller
rogerward
Obsidian | Level 7

Thanks for your creativity.  I had been using PROC SUMMARY and PROC MEANS without issue. In my search for tricks to do the same task using multiple methods, I was pondering how to accomplish the same using PROC SQL, and was mystified.  I thought I was missing something.  PROC SQL is appealing since many times it executes faster.

PaigeMiller
Diamond | Level 26

I'm surprised when you say SQL executes faster than PROC MEANS/PROC SUMMARY or PROC TRANSPOSE. That hasn't been my experience. I guess it depends.

 

But, as I pointed out, coding time is much shorter using PROC MEAN/PROC SUMMARY or PROC TRANSPOSE. As a general rule, statistical or mathematical functions ought to be computed using built-in SAS functions or PROCs, instead of coding it up yourself in SQL. (I have seen people try to code their own functions in SQL or a DATA step, instead of using built-in methods, and they get it wrong, even for something as simple as an average)

--
Paige Miller
PaigeMiller
Diamond | Level 26

@rogerward wrote:

PROC SQL is appealing since many times it executes faster.


In this case, I used 50000 IDs (instead of your 3 records), and 40 questions (instead of your 4 questions), and ran the one SQL solution from @Tom

 

NOTE: PROCEDURE SQL used (Total process time):
real time 2.28 seconds
user cpu time 2.62 seconds
system cpu time 0.64 seconds
 
NOTE: PROCEDURE TRANSPOSE used (Total process time):
real time 1.62 seconds
user cpu time 1.62 seconds
system cpu time 0.00 seconds

--
Paige Miller
PaigeMiller
Diamond | Level 26

Also, it appears to me that all you have accomplished by all this SQL code is to transpose your data set. If so, use PROC TRANSPOSE! Again the benefits are that if you get it to work for 4 questions, it also works in the case that you have 40 questions, with no extra coding; and even in the case of 4 questions it is a lot less coding than your SQL.

 

proc transpose data=got out=want prefix=m;
    by id;
    id question;
	var response;
run;

 

--
Paige Miller
Tom
Super User Tom
Super User

Just take the MAX() of the value of the CASE clause.

proc sql;
create table two as
  select id
       , max(case when question=1 then response else 0 end) as m_1
       , max(case when question=2 then response else 0 end) as m_2
       , max(case when question=3 then response else 0 end) as m_3
       , max(case when question=4 then response else 0 end) as m_4
  from got
  group by id
  ;
quit;

But for your actual data just use PROC TRANSPOSE, since there is only one response per id per question.

proc transpose data=got out=three(drop=_name_) prefix=m_;
  by id;
  id question;
  var response;
run;

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 2982 views
  • 2 likes
  • 3 in conversation