BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Musenol
Fluorite | Level 6

Hello all,

 

I want to write proc sql for find first and last row for some operations with group by two variable. I found the SAS example in the https://blogs.sas.com/content/iml/2018/02/26/how-to-use-first-variable-and-last-variable-in-a-by-gro... . But I couldn't find the proc sql version.

 

How would I approach this?

 

Thank you

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Looks like you are just aggregating three variables over groups defined by the values of ID and TYPE.

So the normal SAS syntax would be something like:

proc summary data=group_sort nway;
  by id type;
  var credit_before_semester  credit_after_semester semester_credit;
  output out=group_1
         n=type_count
         sum=sum_credit_before_semester sum_credit_after_semester sum_semester_credit
  ;
run;

Or you could use PROC SQL.

proc sql;
create table group_1 as
select id,type
     , count(*) as type_count
     , sum(credit_before_semester) as sum_credit_before_semester
     , sum(credit_after_semester) as sum_credit_after_semester 
     , sum(semester_credit) as sum_semester_credit
group by id,type
;
quit;

View solution in original post

12 REPLIES 12
PaigeMiller
Diamond | Level 26

There is no such thing as FIRST.variable and LAST.variable in PROC SQL. You should do this in a DATA step.

--
Paige Miller
Musenol
Fluorite | Level 6
I know that FIRST.variable and LAST.variable are not for PROC SQL. I have seen PROC SQL codes that are similar to PROC TRANSPOSE and I would like to code just like that for FIRST.variable and LAST.variable. I wanted to ask just how I could approach this issue
PaigeMiller
Diamond | Level 26

And I will repeat ... the easiest way and the best way to do this is in a DATA step.

--
Paige Miller
Astounding
PROC Star

PROC SQL does not guarantee the order of the observations.  Therefore, which one is first within a group (and which one is last within a group) is subject to change, and not reliable.

 

If you really must use SQL, you will have to supply a more specific transpose problem.  Wide to long?  Long to wide?  How many variables involved?  Give a specific example of what you would like to accomplish.  But as you have already been told, SQL is the wrong tool for the job.

Ksharp
Super User

As Paige said, the best tool is data step,NOT sql.

Anyway, there is some sql code could get first last. But I don't like it.

 

proc sort data=sashelp.class out=have;by sex;run;

ods select none;
ods output sql_results=sql_results;
proc sql number;
select * from have;
quit;
ods select all;

proc sql;
create table want as
select * from sql_results
 group by sex
  having row=min(row) or row=max(row);
quit; 

 

 

Musenol
Fluorite | Level 6

My Table has 12k columns and 72 rows table sorted by id and type.

 

If table grouped by single variable, i can be use monotonic for min and max row. But table grouped by two variable. So, i can not use monotonic for min and max for groups.

 

My table like:

 

ID            TYPE      NAME     GPA       CREDIT   FACULTY       DEPARTMANT   SCHOLARSHIP    CLUB_FLAG        …

13416        T1       Micheal     3.00         93.00     ENG. FAC.      COMP.ENG.                 1                          0                  …

13416        T3       Micheal     3.00         93.00     ENG. FAC.      COMP.ENG.                 0                          0                  …

13417        T2       Sam          1.68        123.00    ENG. FAC.      COMP.ENG.                 1                          0                  …

13418        T1       Keanu       3.51        135.00    ENG. FAC.      COMP.ENG.                 1                          0                  …

13418        T2       Keanu       3.51        135.00    ENG. FAC.      COMP.ENG.                 0                          1                  …

13418        T4       Keanu       3.51        135.00    ENG. FAC.      COMP.ENG.                 0                          1                  …

13419        T3       Ann           1.85        84.00      ENG. FAC.      COMP.ENG.                 0                          0                  …

13420        T4       George     2.10        27.00      ENG. FAC.      COMP.ENG.                 0                          0                  …

13421        T1       Jennifer    0.41        112.00    ENG. FAC.      COMP.ENG.                  1                         1                  …

13421        T2       Jennifer    0.41        112.00    ENG. FAC.      COMP.ENG.                  0                         1                 …

13422        T1       Ann          2.13        37.00      ENG. FAC.      COMP.ENG.                  0                         1                 …

13422        T2       Ann          2.13        37.00      ENG. FAC.      COMP.ENG.                  0                         0                 …

13422        T3       Ann          2.13        37.00      ENG. FAC.      COMP.ENG.                  0                         0                 …

13422        T4       Ann          2.13        37.00      ENG. FAC.      COMP.ENG.                  1                         1                 …

…              …         …             …             …                     …             …                            …                        …                …

 

 

Thank you for your answers @PaigeMiller @Astounding @Ksharp . I have to writed by PROC SQL.

Tom
Super User Tom
Super User

Why do you have to use PROC SQL?

Is it that you are actually pushing the SQL code into some external database?  If so then you should check if the external database's implementation of SQL has features that might help.

Is it that you are using some tool only can generate SQL code? What tool?

If it is because someone told you to use PROC SQL, then educate them that it is best to use the right tool for the job and SQL is NOT the tool for any job that is dependent on an ordering of observations that is not based on the values of the variables in the data.

 

Add the sequence number first using a data step ( or data step view) and then you can use the sequence number to detect FIRST (seq=1) and LAST (seq=max(seq).

data for_sql_step;
   set have;
   by id1 id2;
   if first.id2 then seq=0;
   seq+1;
run;
Reeza
Super User
What is the use case? FIRST/LAST are SAS data step constructs and often what they're used for can be accomplished another way in SQL though not as easily. So if you can explain overall what you're trying to do that will help.
Ksharp
Super User

OK. It is not a big deal. If you have two or more variables need sorted ,try this one :

 

ods select none;
ods output sql_results=sql_results;
proc sql number;
select * from sashelp.heart   /*the dataset you have*/
 order by status,sex ;   /*two variables need to sorted*/
quit;
ods select all;

proc sql;
create table want as
select * from sql_results
 group by status,sex
  having row=min(row) or row=max(row);
quit; 
Musenol
Fluorite | Level 6

I'm using Sas Ondemand. Actually my task is kind of like a homework, which is to convert SAS to PROC SQL. The outputs of the SAS must match the outputs of the SQL. 

 

You can see the SAS block below. I am new in Community so I apologize if I can't articulate very well what I need in this question page - thanks in advance for all the help!

 

data Group_1;
set group_sort;
by id type;
if first.type then do;
sum_credit_before_semester=0;
sum_credit_after_semester=0;
sum_semester_credit;
type_COUNT = 0;
end;
sum_credit_before_semester + credit_before_semester ;
sum_credit_after_semester+credit_after_semester;
sum_semester_credit + semester_credit;
type_COUNT+1;
if last.type;
drop
credit_before_semester credit_after_semester semester_credit
run;

 

Tom
Super User Tom
Super User

Looks like you are just aggregating three variables over groups defined by the values of ID and TYPE.

So the normal SAS syntax would be something like:

proc summary data=group_sort nway;
  by id type;
  var credit_before_semester  credit_after_semester semester_credit;
  output out=group_1
         n=type_count
         sum=sum_credit_before_semester sum_credit_after_semester sum_semester_credit
  ;
run;

Or you could use PROC SQL.

proc sql;
create table group_1 as
select id,type
     , count(*) as type_count
     , sum(credit_before_semester) as sum_credit_before_semester
     , sum(credit_after_semester) as sum_credit_after_semester 
     , sum(semester_credit) as sum_semester_credit
group by id,type
;
quit;
Musenol
Fluorite | Level 6
Thank you ! PROC SQL work correctly.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 2658 views
  • 11 likes
  • 6 in conversation