An Idea Exchange for SAS software and services

Comments
by Super User
on ‎07-29-2016 04:40 AM

Not sure where the added value is here.  For the first two you already have that call - min(lengthn()).  For the second two you have select into statement, or if using SQL you can use subqueries.  So I am not seeing the point in rolling up a list of values?  Example:

proc sql;
  create table WANT as
  select  min(lengthn(MODEL)) as MIN, 
          max(lengthn(MODEL)) as MAX
  from    SASHELP.CARS;
quit;

data tmp;
  set sashelp.cars;
  where make in ("Audi");
run;

proc sql;  
  create table WANT2 as
  select  *
  from    SASHELP.CARS
  where   MODEL in (select MODEL from TMP);  /* Put distinct in if needed */
quit;
by PROC Star
‎07-29-2016 04:58 AM - edited ‎08-04-2016 06:17 PM

There are always ways to do things. Your code does not create a variable containing the longest value or the list of all values, though one could of course write such code. Min() max() etc already allow to derive some string summary information. More summarisation functions are useful when manipulating strings.
If you don't manipulate strings you don't need these functions.
By your logic that there is another way, the cat() functions would never have been born as !! does it all.

 

Edit:

Actually min() and max() can't handle strings. I don't know where I got this idea from.

So is the only existing summary function nmiss()?

by Respected Advisor
‎07-29-2016 02:19 PM - edited ‎08-01-2016 02:57 PM

I don't have a need for shortest() or longest() but certainly do for list() and for listnodup(). I often need to list the subcategories of a group as a list, especially when their number is small. 

 

Store   Category    Product                  Sales
A       Fruits      Apple, Banana, Pear      10000
A       Vegetables  Carrots, Celery, Parsnip 5000

You can't do that with macros.

by PROC Star
on ‎07-30-2016 05:41 AM
The list function could use parameters to enable options like dedupe, quote, ignore missing values, for example.
by PROC Star
on ‎07-30-2016 08:06 AM

Curious if other SQL implementations provide such types of string summary/aggregation functions? 

 

I can see the benefit.  But the idea of taking a column  with one value per record and collapsing it to have each record storing a delimited list of values feels very un-SQL to me.

 

Well, I googled a bit.  Looks like SQL server doesn't have a nice solution, but MySQL has Group_Concat() which looks like what you want.

 

http://stackoverflow.com/questions/273238/how-to-use-group-by-to-concatenate-strings-in-sql-server

http://dev.mysql.com/doc/refman/5.7/en/group-by-functions.html#function_group-concat

 

by PROC Star
on ‎08-01-2016 01:17 AM

listagg() in oracle.

 

by Respected Advisor
on ‎08-01-2016 02:56 PM

Since SAS already has those functions, I think the new summary functions should be called cats, catq, catt, and catx. Their behaviour would mimic the behaviour of other multi-parameter functions such as min, nmiss, and mean, which are both in SAS and SAS/SQL.

I realize that this change might conflict with existing code which uses cats(x) or catt(x) instead of Compress and Trim, respectively, but I believe such conflicts would be rare. The benefits of having a consistent naming scheme would be greater, in the long run.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Idea Statuses
Top Liked Authors