BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
hari1109
Fluorite | Level 6

Hi all,

Can anyone let me know a better approach where i can be able to include all the if else and if statements and where i can do some summations on some newly generated variables in proc sql. Here is what i am doing (please look at the code i have attached)

I am pulling some records from tables using proc sql and in the data step i am doing the summations in the if else and if statements and at the same time i am using a by variable(CLE_DT). My issue is, to pull the records itself it is taking a long time and i would like to improve the performance by including all the if else and if statements and do some summations and then group by in the proc sql itself. I have gone through some papers where we can use some case statements and nested case statements but in this case i am caluculating two new variables(TBL_01, TBL_04) based on conditons and at the same i am using a by variable.So,kindly suggest a better approach.

Thank you all.

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

You cannot do it that way in SQL.  What you need to do is in the CASE to generate the value that you want for each observation and wrap that in an summation function.

Here is a simple example.

proc sql ;

   select age

        , sum(case when sex='M' then 1 else 0 end) as num_males

        , sum(case when sex='F' then 1 else 0 end) as num_females

    from sashelp.class

    group by age

   ;

quit;

View solution in original post

10 REPLIES 10
art297
Opal | Level 21

You didn't provide all of your code, or any example dataset, thus it is difficult to correctly offer much in the line of suggestions.

However, that said, the first thing that stands out is that you are only adding calculations of those records which have code equal to GE.  Do you still need the others?

If not, eliminating them early might speed things up significantly.

hari1109
Fluorite | Level 6

Thank you,

Sorry I haven't mentioned enough here, ok suppose if i have records between some dates let it be 01JAN004 AND 31JUN2004 from the table GE. since I am using a by variable CLE_DT( which is a date in a given month, in this case i will be have 6 such dates 31JAN2004,29FEB2004,31MAR2004,...,30JUN2004) and I will be having only 6 records with variables CLE_DT,TBL_01,TBL_04 AND CODE( which is always GE in this case). I would like to get only 6 records out of proc sql instead of many number of records. For that i have to find an equivalent of if/else and if statements in proc sql and since i can't use a by variable in proc sql i can use a group by on CLE_DT and at the same time i should be able to calculate new variables(TBL_01,TBL_04) in proc sql as it is in the datastep. I can provide the code it looks the same to that of what i have provided earlier. Thank you.

Tom
Super User Tom
Super User

You can use CASE in general to handle many IF/THEN scenarios in PROC SQL.

For example you had:

IF CODE = 'GE' AND PEN_01 NOT IN ('34',24','12'))) THEN DO;

      TBL_01 =  TBL_01 + ICT_01;

      TBL_04 =  TBL_04 + ICT_01;

END;

Which you might do in SQL with

CASE WHEN ( CODE = 'GE' AND PEN_01 NOT IN ('34',24','12')) THEN TBL_01 + ICT_01

          ....

END as TBL_01

CASE WHEN ( CODE = 'GE' AND PEN_01 NOT IN ('34',24','12')) THEN TBL_04 + ICT_01

          ....

END as TBL_04

To handle the concept of FIRST./LAST. processing you probably will need to use aggregation functions like SUM() and the GROUP BY feature of SQL.

hari1109
Fluorite | Level 6

Hi TOM,

I tried with the above code and i am getting this error :

ERROR: Expression using addition + requires numeric types.

CASE WHEN((C.PLN_01 = '90') OR (C.PEN_01 IN ('61','62','65','66','68','69','A7','B6')

AND C.PEN_01 NOT IN ('27','33','34','35','42','43'))) THEN CALCULATED TBL_01 + D.ICT_01

ELSE CALCULATED TBL_01 + D.ICT_01

END AS TBL_01

Here TBL_01 is the new variable i am calculating. Thank you.

art297
Opal | Level 21

Are d.ict_010 and/or d.ict_01 character variables?

hari1109
Fluorite | Level 6

Hi art297

No, They are numeric values.Thank you

Reeza
Super User

Your else statement is the same as your statement in the if case?

You probably need to post the full code to find the error, ie the log/error message is a good start.

art297
Opal | Level 21

Then we would have to see the code you actually ran.  I would think that, rather than TBL_04 + ICT_01, you would probably want to use something like sum(ICT_01)


Tom
Super User Tom
Super User

You cannot do it that way in SQL.  What you need to do is in the CASE to generate the value that you want for each observation and wrap that in an summation function.

Here is a simple example.

proc sql ;

   select age

        , sum(case when sex='M' then 1 else 0 end) as num_males

        , sum(case when sex='F' then 1 else 0 end) as num_females

    from sashelp.class

    group by age

   ;

quit;

hari1109
Fluorite | Level 6

Thank you for your help, Tom and art297 , i could have wasted a lot of time without knowing what i need do.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 116338 views
  • 8 likes
  • 4 in conversation