Hello all,
Is there is an easy way within PROC SQL to count the rows? Normally I can just use _N_ in the data step process but would like to handle the query in one step instead of two.
Thank you.
Lawrence
One of my two suggested possibilities was the undocumented monotonic() function. Doesn't the following do what you want? Although, like I mentioned, it isn't documented and doesn't come with any guarantees.
data have;
informat group $50.;
input group & pt_No;
cards;
Anesthesia 234
Anesthesia 287
Emergency Medicine 111
Emergency Medicine 546
Family Medicine 458
;
proc sql;
create table want as
select group,monotonic() as row
from (
select distinct group
from have)
order by group
;
quit;
They are counted automatically and put into a macro variable named &sqlobs
While sql supposedly doesn't process data sequentially, I have yet to see a case where it doesn't. As such, the undocumented function monotonic() appear to result in the same values as _n_ in a datastep.
Art,
That is a good tip in the future but here is the problem that was incidentally deleted somewhow-
SO I have a bunch of entries:
group pt#
Anesthesia 234
Anesthesia 287
Emergency Medicine 111
Emergency Medicine 546
Family Medicine 458
That account for 1000 entries in total
What I want to do is create a new list:
group row
Anesthesia 1
Emergency Medicine 2
Family Medicine 3
Thanks again.
Lawrence
One of my two suggested possibilities was the undocumented monotonic() function. Doesn't the following do what you want? Although, like I mentioned, it isn't documented and doesn't come with any guarantees.
data have;
informat group $50.;
input group & pt_No;
cards;
Anesthesia 234
Anesthesia 287
Emergency Medicine 111
Emergency Medicine 546
Family Medicine 458
;
proc sql;
create table want as
select group,monotonic() as row
from (
select distinct group
from have)
order by group
;
quit;
Art,
Many Thanks again!
Lawrence
Hi ... if the goal is to keep the groups in their original order, this works since the observations are already in group order in the data set.
If the data set had another observation, the results are no longer in the order of GROUP as encountered in the data set ...
data have;
input group & : $50. pt_No;
cards;
Anesthesia 234
Anesthesia 287
Emergency Medicine 111
Emergency Medicine 546
Family Medicine 458
Ambulance Chasing 999
;
run;
the code produces ...
group row
Ambulance Chasing 1
Anesthesia 2
Emergency Medicine 3
Family Medicine 4
Curious ... is the above what is desired or are the groups supposed to maintain data set order?
In this case it does not matter. If it did, then I generally have the variables ordered in a table somewhere else and can order it that way.
NOTE: monotonic() is not safe for SQL since it is not documented in. It will generate error result at sometime ,especially for complex SQL.
But I found another way to get the number of obs. ODS is better.
data have; informat group $50.; input group & pt_No; cards; Anesthesia 234 Anesthesia 287 Emergency Medicine 111 Emergency Medicine 546 Family Medicine 458 ; run; ods output SQL_Results=sql; proc sql number; select distinct group from have; quit;
Ksharp
It is url about monotonic() is not supported by SQL.
Is there a supported replacement to this useful function that I can use in proc sql? I'd hate to insert a data setp into my processing just to use _N_.
Relational database tables are UNORDERED sets of records. They are not guaranteed to maintain any record order unless they are explicitly ordered (indexed) by a field or a combination of fields. If your data record order is important to you, then it should be supported by one or many of your fields and a feature such as MONOTONIC() should not be needed.
PG
>>> it should be supported by one... of your fields.
I have used monotonic() to create the field.
Would you say that if I only use monotonic() when I also use an ORDER BY clause that I'm safe and within the bounds of relational theory?
As long as you touch monotonic(), there could be a potential risk. In my institution, monotonic() CAN NOT be used in production code, which hints at the same time, users CAN explore it in non-production settings.
Haikuo
Guess I was trying to say here is that you can play it as much as you want, come to whatever conclusion from your own research, but there is NO final say on its use, because it is not documented (aka: officially supported) by SAS.
Relational database fields are indeed unordered. But, SAS data sets are ordered, and it is often easier to use a proc SQL for something than a proc sort-data step-proc means/whatever. Also, if you are really using a relational database, you can use passthough SQL to access whatever custom row ordering function that database offers.
With large complex datasets, there are often transactions that don't fit the "documented" rules. It is useful to put row numbers into a table so that you can easier find the duplicates values (monotonic unfortunately doesn't guarantee missing values). If you are already sub-selecting the primary table, then you can cut out an additional data step to just number them.
I think you have tangled some concepts here. As long as you use SQL language (Proc SQL included), then how your data stored (ordered or unordered, RDBMS or SAS table) does not matter anymore. when SQL reads in or spits out data, there is no guaranteed order unless you EXPLICITLY "order by" it. While as a sequential language, SAS data step will loyally read in the data in the order it stores unless you EXPLICITLY change it otherwise.
Haikuo
Update: In term of monotonic(), my suspicion is that it would be very difficult to robustly implement this sequential feature to SQL which is inheritably non-sequential by its nature. Even it can be done, SAS may feel not worth the overhead or other tradeoffs.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.