BookmarkSubscribeRSS Feed
saspert
Pyrite | Level 9

Hello,

I have an interesting scenario where I have to delete/include the records in a dataset based on certain conditions.

Category Count

Asian          12

American      3

European      25

Unkown      31

All               71

Out of this dataset if any of categories except Unkown and All have a count less than 10, then I want the output dataset to have only Unkown and All. But if all the categories except Unkown and All have count of  more than 10, then all the categories should be in the output dataset. But because I am going against the database, I am not entirely sure how many different categories will show up besides All and Unkown.

I have a coding mental block with reading the dataset more than once in the same datastep. Just wondering if there is an easy way with either Datastep/Proc Sql.

PROC SQL;

CREATE TABLE DEMOGRAPHICS_OP AS

SELECT CASE

  WHEN CATEGORY IN ('ALL','UNKNOWN') THEN CATEGORY

  WHEN CATEGORY NOT IN ('ALL','UNKNOWN') AND COUNT < 10 THEN ' '

  ELSE CATEGORY

  END AS CATEGORY

,COUNT

FROM DEMOGRAPHICS_BASE

;

QUIT;

Thanks,

saspert.

12 REPLIES 12
art297
Opal | Level 21

Would something like the following sufice?:

data conditional want;

  set have end=lastone;

  retain failed;

  if Category in ('Unkown','All') then output want;

  else do;

    if not failed then do;

      if count lt 10 then failed=1;

      else output conditional;

    end;

  end;

  if lastone and not failed then do;

    call execute ('proc append base=want data=conditional;run;');

  end;

run;

buckeye
Obsidian | Level 7

saspert, this would give what you wanted.

data cat;
     input category $ count;
   cards;
Asian          12
American      3
European      25
Unknown      31
All               71
;
run;

proc sql;
  create table tmp as
  select count(*) as totcnt,sum(case when count>10 then 1 else 0 end) as catcnt from cat where category not in ('Unknown','All');

  create table result as
  select * from cat where category in ('Unknown','All') or (select totcnt from tmp) = (select catcnt from tmp);
quit;

proc print data = tmp;
proc print data = result;
run;

art297
Opal | Level 21

FWIW, both suggestions result in the same output (well, at least if the actual spelling of unknown whatever it really is in the data is used), but the datastep approach takes one-half the time, particularly when the conditions aren't met (as in the example).  Otherwise, they take about the same time to run.

Art

buckeye
Obsidian | Level 7

Thanks for the observation, Art. It is consistent with my experience that datastep is about always quicker than proc sql, especially when dealing with larger number of records in a dataset, with multiple tables.

Ksharp
Super User

Unfortunately It is not suitable for Cartesian Product.

SQL is designed based on Cartesian Product, sql would have some optimize.

Ksharp

saspert
Pyrite | Level 9

Hi Art297 and Buckeye,

I tried both your solutions and they both work for me. Personally, I prefer the PROC SQL method because I am more comfortable with that. Also, the database values are summarized before this solution is used. So, I decided to go with the Proc SQL method. I should however learn the nuances of the data step/retain methods.

Thanks,

saspert.

Tom
Super User Tom
Super User

You just need to use a summary function in SQL to determine if all of the variables have counts larger than ten. You can do it with or without another grouping variable in the dataset.  This code takes advantage of the fact that SAS evaluates logical operations as 1 (true) or 0 (false).

data have ;
  length group $5 ;
  input Category $ Count;
  group = 'SMALL' ; output;
  group = 'LARGE' ; count=10+count; output;
cards;
Asian    12
American  3
European 25
Unknown   9
All      71
run;
 
proc sql noprint ;
  create table want as 
    select group
         , category
         , count
      from have
      group by group
      having min( (count > 10) or category = 'Unknown' ) = 1 
          or category in ('All' 'Unknown')
      order by group ,category
  ;
quit;

Obs group Category Count
 1  LARGE All        81
 2  LARGE American   13
 3  LARGE Asian      22
 4  LARGE European   35
 5  LARGE Unknown    19
 6  SMALL All        71
 7  SMALL Unknown     9
Peter_C
Rhodochrosite | Level 12

this is a candidate for a simple 2-pass solution.

First pass makes the decision which is retained to be applied in the second pass.

Data result ;

Set demog( in= firstpass ) demog ;

Retain decision 0;

If firstpass then do ;

   If count&lt;10 then do ;

     If not lowcase( category ) in( 'unknown', 'all' )

        Then decision = 1 ;

    End ;

    Delete ;

  End;

  If lowcase( category ) in ('all','unknown') or decision = 0;

run ;

buckeye
Obsidian | Level 7

It is good to know that aggregate function is not needed when grouping. This is different from SQL statements in languages rather than SAS.

art297
Opal | Level 21

Not sure what you discovered.  Proc SQL does require the use of a summary function in order for a group statement to apply.

buckeye
Obsidian | Level 7

It appears it is not required in SAS proc sql as illustrated in Tom's solution below:

select group

         , category

         , count

      from have

      group by group

Aggregate functions, such as min(), max(), sum(), are not required for, in this case, category and count. Hope this clarifies.

Tom
Super User Tom
Super User

The program does use the MIN() aggregate function in the HAVING clause. 

But it is not required that an aggregate statistic be include in the selected variables.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 4141 views
  • 7 likes
  • 6 in conversation