This would give us, the longest, shortest, and list of values by group.
, list(',', quote(trim(STR)))
, listnodup(',', quote(trim(STR)))
group by ID;
Not sure where the added value is here. For the first two you already have that call - min(lengthn()). For the second two you have select into statement, or if using SQL you can use subqueries. So I am not seeing the point in rolling up a list of values? Example:
create table WANT as
select min(lengthn(MODEL)) as MIN,
max(lengthn(MODEL)) as MAX
where make in ("Audi");
create table WANT2 as
where MODEL in (select MODEL from TMP); /* Put distinct in if needed */
There are always ways to do things. Your code does not create a variable containing the longest value or the list of all values, though one could of course write such code. Min() max() etc already allow to derive some string summary information. More summarisation functions are useful when manipulating strings. If you don't manipulate strings you don't need these functions. By your logic that there is another way, the cat() functions would never have been born as !! does it all.
Actually min() and max() can't handle strings. I don't know where I got this idea from.
So is the only existing summary function nmiss()?
I don't have a need for shortest() or longest() but certainly do for list() and for listnodup(). I often need to list the subcategories of a group as a list, especially when their number is small.
Store Category Product Sales
A Fruits Apple, Banana, Pear 10000
A Vegetables Carrots, Celery, Parsnip 5000
You can't do that with macros.
Curious if other SQL implementations provide such types of string summary/aggregation functions?
I can see the benefit. But the idea of taking a column with one value per record and collapsing it to have each record storing a delimited list of values feels very un-SQL to me.
Well, I googled a bit. Looks like SQL server doesn't have a nice solution, but MySQL has Group_Concat() which looks like what you want.
listagg() in oracle.
Since SAS already has those functions, I think the new summary functions should be called cats, catq, catt, and catx. Their behaviour would mimic the behaviour of other multi-parameter functions such as min, nmiss, and mean, which are both in SAS and SAS/SQL.
I realize that this change might conflict with existing code which uses cats(x) or catt(x) instead of Compress and Trim, respectively, but I believe such conflicts would be rare. The benefits of having a consistent naming scheme would be greater, in the long run.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.