03-01-2016 08:37 AM
I am using PROC SQL to create a table with the distinct variable combinations I am interested in:
PROC SQL; CREATE TABLE want AS SELECT DISTINCT var1, var2, var3 FROM have; QUIT;
However, I would like to add a column to that created table with a running number (and not use a subsequent data step to do so). Is there an elegant way to do that?
03-01-2016 08:42 AM
Try the undocumented/unsupported function monotonic() (I havn't tried with distinct - so no guarantees).
Why is a subsequent data step so bad - do you got tons of data?
03-01-2016 08:44 AM
I just appreciate short code.
But yeah, I read about that function while googling. However, I could not get it to work in terms of creating a table. Could you give an example code?
03-01-2016 08:54 AM
Hmm, the moment I run this:
PROC SQL; CREATE TABLE want AS SELECT DISTINCT var1, var2, var3, monotonic() as running_no FROM have; QUIT;
the DISTINCT counts every single running number as a distinct combination of those variables and just gives back the entire dataset "have" with an added running-number-column.
03-01-2016 09:02 AM
No guarantees, remember :-)
Ok, then you need to nest your original query as an in-line view to the outer query, in which you assign your running_no.
03-01-2016 09:06 AM
Uhm, sorry but I have never even heard inner and outer query. Do you mind suggesting it as an example once more?
03-01-2016 09:26 AM
As @LinusH has mentioned, monotonic() couts each observation as its processed. However SQL is not an ordered processing like SAS, it can be affected by many different things internally, such as sorts, distinct, joins etc. I wouldn't recommend using it in this instance, as the numerb assigned to a distinct value is unlikely to be the sequence you intend, unless you happen to know offhand what internal procesuders, and what threads each is assigned to. Why do you need to add this? Am thinking if you do need this, then why not combine it into another step as datastep already has _n_ for position of observation in the dataset, however I always find its better to fix it yourself by hand, e.g.
proc sort data=have nodupkey;
Then use that coded list to merge an identifier onto the data, otherwise if your list changes programmatically as you do now, then your output may change accordingly and further code might not even run. If you can give a proper example of what you are trying to do...
03-01-2016 09:49 PM
Do not use MONOTONIC() , Since it is not documented , that would generate unexpected result .
ods _all_ close; ods output sql_results=want; PROC SQL number ; SELECT DISTINCT name,sex,age FROM sashelp.class; QUIT;