DATA Step, Macro, Functions and more

Conditional Data Step/Proc Sql

Reply
Super Contributor
Posts: 275

Conditional Data Step/Proc Sql

Hello,

I have an interesting scenario where I have to delete/include the records in a dataset based on certain conditions.

Category Count

Asian          12

American      3

European      25

Unkown      31

All               71

Out of this dataset if any of categories except Unkown and All have a count less than 10, then I want the output dataset to have only Unkown and All. But if all the categories except Unkown and All have count of  more than 10, then all the categories should be in the output dataset. But because I am going against the database, I am not entirely sure how many different categories will show up besides All and Unkown.

I have a coding mental block with reading the dataset more than once in the same datastep. Just wondering if there is an easy way with either Datastep/Proc Sql.

PROC SQL;

CREATE TABLE DEMOGRAPHICS_OP AS

SELECT CASE

  WHEN CATEGORY IN ('ALL','UNKNOWN') THEN CATEGORY

  WHEN CATEGORY NOT IN ('ALL','UNKNOWN') AND COUNT < 10 THEN ' '

  ELSE CATEGORY

  END AS CATEGORY

,COUNT

FROM DEMOGRAPHICS_BASE

;

QUIT;

Thanks,

saspert.

PROC Star
Posts: 7,487

Conditional Data Step/Proc Sql

Would something like the following sufice?:

data conditional want;

  set have end=lastone;

  retain failed;

  if Category in ('Unkown','All') then output want;

  else do;

    if not failed then do;

      if count lt 10 then failed=1;

      else output conditional;

    end;

  end;

  if lastone and not failed then do;

    call execute ('proc append base=want data=conditional;run;');

  end;

run;

Contributor
Posts: 24

Conditional Data Step/Proc Sql

saspert, this would give what you wanted.

data cat;
     input category $ count;
   cards;
Asian          12
American      3
European      25
Unknown      31
All               71
;
run;

proc sql;
  create table tmp as
  select count(*) as totcnt,sum(case when count>10 then 1 else 0 end) as catcnt from cat where category not in ('Unknown','All');

  create table result as
  select * from cat where category in ('Unknown','All') or (select totcnt from tmp) = (select catcnt from tmp);
quit;

proc print data = tmp;
proc print data = result;
run;

PROC Star
Posts: 7,487

Re: Conditional Data Step/Proc Sql

FWIW, both suggestions result in the same output (well, at least if the actual spelling of unknown whatever it really is in the data is used), but the datastep approach takes one-half the time, particularly when the conditions aren't met (as in the example).  Otherwise, they take about the same time to run.

Art

Contributor
Posts: 24

Conditional Data Step/Proc Sql

Thanks for the observation, Art. It is consistent with my experience that datastep is about always quicker than proc sql, especially when dealing with larger number of records in a dataset, with multiple tables.

Super User
Posts: 10,044

Conditional Data Step/Proc Sql

Unfortunately It is not suitable for Cartesian Product.

SQL is designed based on Cartesian Product, sql would have some optimize.

Ksharp

Super Contributor
Posts: 275

Conditional Data Step/Proc Sql

Hi Art297 and Buckeye,

I tried both your solutions and they both work for me. Personally, I prefer the PROC SQL method because I am more comfortable with that. Also, the database values are summarized before this solution is used. So, I decided to go with the Proc SQL method. I should however learn the nuances of the data step/retain methods.

Thanks,

saspert.

Super User
Super User
Posts: 7,074

Re: Conditional Data Step/Proc Sql

You just need to use a summary function in SQL to determine if all of the variables have counts larger than ten. You can do it with or without another grouping variable in the dataset.  This code takes advantage of the fact that SAS evaluates logical operations as 1 (true) or 0 (false).

data have ;
  length group $5 ;
  input Category $ Count;
  group = 'SMALL' ; output;
  group = 'LARGE' ; count=10+count; output;
cards;
Asian    12
American  3
European 25
Unknown   9
All      71
run;
 
proc sql noprint ;
  create table want as 
    select group
         , category
         , count
      from have
      group by group
      having min( (count > 10) or category = 'Unknown' ) = 1 
          or category in ('All' 'Unknown')
      order by group ,category
  ;
quit;

Obs group Category Count
 1  LARGE All        81
 2  LARGE American   13
 3  LARGE Asian      22
 4  LARGE European   35
 5  LARGE Unknown    19
 6  SMALL All        71
 7  SMALL Unknown     9
Valued Guide
Posts: 2,177

Re: Conditional Data Step/Proc Sql

this is a candidate for a simple 2-pass solution.

First pass makes the decision which is retained to be applied in the second pass.

Data result ;

Set demog( in= firstpass ) demog ;

Retain decision 0;

If firstpass then do ;

   If count&lt;10 then do ;

     If not lowcase( category ) in( 'unknown', 'all' )

        Then decision = 1 ;

    End ;

    Delete ;

  End;

  If lowcase( category ) in ('all','unknown') or decision = 0;

run ;

Contributor
Posts: 24

Conditional Data Step/Proc Sql

It is good to know that aggregate function is not needed when grouping. This is different from SQL statements in languages rather than SAS.

PROC Star
Posts: 7,487

Conditional Data Step/Proc Sql

Not sure what you discovered.  Proc SQL does require the use of a summary function in order for a group statement to apply.

Contributor
Posts: 24

Conditional Data Step/Proc Sql

It appears it is not required in SAS proc sql as illustrated in Tom's solution below:

select group

         , category

         , count

      from have

      group by group

Aggregate functions, such as min(), max(), sum(), are not required for, in this case, category and count. Hope this clarifies.

Super User
Super User
Posts: 7,074

Re: Conditional Data Step/Proc Sql

The program does use the MIN() aggregate function in the HAVING clause. 

But it is not required that an aggregate statistic be include in the selected variables.

Ask a Question
Discussion stats
  • 12 replies
  • 1947 views
  • 7 likes
  • 6 in conversation