Hi,
I am using PROC SQL to create a table with the distinct variable combinations I am interested in:
PROC SQL;
CREATE TABLE want AS
SELECT DISTINCT var1, var2, var3
FROM have;
QUIT;
However, I would like to add a column to that created table with a running number (and not use a subsequent data step to do so). Is there an elegant way to do that?
Thanks
Try the undocumented/unsupported function monotonic() (I havn't tried with distinct - so no guarantees).
Why is a subsequent data step so bad - do you got tons of data?
I just appreciate short code.
But yeah, I read about that function while googling. However, I could not get it to work in terms of creating a table. Could you give an example code?
Just
monotonic() as running_no
Hmm, the moment I run this:
PROC SQL;
CREATE TABLE want AS
SELECT DISTINCT var1, var2, var3, monotonic() as running_no
FROM have;
QUIT;
the DISTINCT counts every single running number as a distinct combination of those variables and just gives back the entire dataset "have" with an added running-number-column.
No guarantees, remember 🙂
Ok, then you need to nest your original query as an in-line view to the outer query, in which you assign your running_no.
I remember
Uhm, sorry but I have never even heard inner and outer query. Do you mind suggesting it as an example once more?
select a.*, monotonic() as running_no
from (select distinct.... from have) as a
As @LinusH has mentioned, monotonic() couts each observation as its processed. However SQL is not an ordered processing like SAS, it can be affected by many different things internally, such as sorts, distinct, joins etc. I wouldn't recommend using it in this instance, as the numerb assigned to a distinct value is unlikely to be the sequence you intend, unless you happen to know offhand what internal procesuders, and what threads each is assigned to. Why do you need to add this? Am thinking if you do need this, then why not combine it into another step as datastep already has _n_ for position of observation in the dataset, however I always find its better to fix it yourself by hand, e.g.
proc sort data=have nodupkey;
by value;
run;
data have;
set have;
code=_n_;
run;
Then use that coded list to merge an identifier onto the data, otherwise if your list changes programmatically as you do now, then your output may change accordingly and further code might not even run. If you can give a proper example of what you are trying to do...
Do not use MONOTONIC() , Since it is not documented , that would generate unexpected result .
ods _all_ close;
ods output sql_results=want;
PROC SQL number ;
SELECT DISTINCT name,sex,age
FROM sashelp.class;
QUIT;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.