BookmarkSubscribeRSS Feed

This would give us, the longest, shortest, and list of values by group.

 

 


proc sql;
  select longest(STR)
       , shortest(STR)
       , list(',', quote(trim(STR))) 
       , listnodup(',', quote(trim(STR))) 
  from TABLE
  group by ID;
quit;
7 Comments
RW9
Diamond | Level 26
Diamond | Level 26

Not sure where the added value is here.  For the first two you already have that call - min(lengthn()).  For the second two you have select into statement, or if using SQL you can use subqueries.  So I am not seeing the point in rolling up a list of values?  Example:

proc sql;
  create table WANT as
  select  min(lengthn(MODEL)) as MIN, 
          max(lengthn(MODEL)) as MAX
  from    SASHELP.CARS;
quit;

data tmp;
  set sashelp.cars;
  where make in ("Audi");
run;

proc sql;  
  create table WANT2 as
  select  *
  from    SASHELP.CARS
  where   MODEL in (select MODEL from TMP);  /* Put distinct in if needed */
quit;
ChrisNZ
Tourmaline | Level 20

There are always ways to do things. Your code does not create a variable containing the longest value or the list of all values, though one could of course write such code. Min() max() etc already allow to derive some string summary information. More summarisation functions are useful when manipulating strings.
If you don't manipulate strings you don't need these functions.
By your logic that there is another way, the cat() functions would never have been born as !! does it all.

 

Edit:

Actually min() and max() can't handle strings. I don't know where I got this idea from.

So is the only existing summary function nmiss()?

PGStats
Opal | Level 21

I don't have a need for shortest() or longest() but certainly do for list() and for listnodup(). I often need to list the subcategories of a group as a list, especially when their number is small. 

 

Store   Category    Product                  Sales
A       Fruits      Apple, Banana, Pear      10000
A       Vegetables  Carrots, Celery, Parsnip 5000

You can't do that with macros.

ChrisNZ
Tourmaline | Level 20
The list function could use parameters to enable options like dedupe, quote, ignore missing values, for example.
Quentin
Super User

Curious if other SQL implementations provide such types of string summary/aggregation functions? 

 

I can see the benefit.  But the idea of taking a column  with one value per record and collapsing it to have each record storing a delimited list of values feels very un-SQL to me.

 

Well, I googled a bit.  Looks like SQL server doesn't have a nice solution, but MySQL has Group_Concat() which looks like what you want.

 

http://stackoverflow.com/questions/273238/how-to-use-group-by-to-concatenate-strings-in-sql-server

http://dev.mysql.com/doc/refman/5.7/en/group-by-functions.html#function_group-concat

 

ChrisNZ
Tourmaline | Level 20

listagg() in oracle.

 

PGStats
Opal | Level 21

Since SAS already has those functions, I think the new summary functions should be called cats, catq, catt, and catx. Their behaviour would mimic the behaviour of other multi-parameter functions such as min, nmiss, and mean, which are both in SAS and SAS/SQL.

I realize that this change might conflict with existing code which uses cats(x) or catt(x) instead of Compress and Trim, respectively, but I believe such conflicts would be rare. The benefits of having a consistent naming scheme would be greater, in the long run.