BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Cruise
Ammonite | Level 13

Hi folks:

 

I'm trying to select the row with the min(diff) and if rows ties on diff like it occurs for the ID=3 patient then select the first row.

How to achieve it? my code below doesn't help when rows tied on the 'diff'.

 

Thanks in advance

 

data temp;
input id diff; 
cards;
1 2
1 3
2 1
2 2
3 2
3 2
;
proc sql;
select *
from temp
group by CTC_id
having DIFF eq min(DIFF)
;
quit;
1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

And the Datastep:

data want;
 do _n_=1 by 1 until(last.id);
  set temp;
  by id;
  if min>. and diff>=min then continue;
  min=diff;
  k=_n_;
 end;
 do _n_=1 to _n_;
  set temp;
  if _n_=k then output;
 end;
 drop k min;
run;

View solution in original post

9 REPLIES 9
Reeza
Super User

Use a data step and FIRST with a BY statement instead?

 

Or two proc sorts.

 

proc sort data=temp;
by id diff;
run;

data want1;
set temp;
by id;
if first.id;
run;


proc sort data=temp;
by id diff;
run;

proc sort data=temp out=want2 NODUPKEY;
by ID;
run;
novinosrin
Tourmaline | Level 20

I am gonna jump to ameeting, but wait for a datastep from somebody else as sql is not robust but until then some fun

 

proc sql;
create table want(drop=m) as
select *
from
(select *
from (select *,monotonic() as m from temp)
group by id
having DIFF eq min(DIFF))
group by id,diff
having min(m)=m;
;
quit;
Cruise
Ammonite | Level 13
I'm curious about the context that 'sql' is not robust. Do you mind elaborate on that plz?
novinosrin
Tourmaline | Level 20

Too tired and about to doze off. Will explain tomorrow without fail 

novinosrin
Tourmaline | Level 20

Hi @Cruise  SQL is indeed a great ready-meals and super convenient construct of clauses that provides fantastic utility for summary statistics aside the provision of pushing a query easily to a third party database and many others. In essence this kind of compatibility is pretty much unparalleled considering a SQL programmer minus SAS programmer would just like that use it to its full extent and be over with tasks. Therefore, Yes, your thought process is correct to question why would SQL not be robust should you consider all the advantages mentioned and I suppose I missed out many advantages too.

 

However, your case is kinda specific that mandates to address certain constraints.

1. Since your sample(assuming a representative sample of your real) seems a clean sorted dataset by ID and in an order presumably the observations within the ID are ordered in sequence as you indicated in your preference in picking the earliest(1st occurrence) in case of a tie.

 

2. 1 makes a case needing to compute a row_number to identify and choose which one is the earliest. This could be achieved by using MONOTONIC() , which is not documented, some argue it is error prone and at any rate is costly to have an extra pass of the dataset that you noticed in the sub-query. 

 

3.  Though seemingly simple and concise, there were 3 passes of the dataset, 1st to compute row_num, 2nd to determine the min(diff) and 3rd to identify and choose min of  row_num from min(diff) 

 

4. Also, it is always better to think through a datastep solution the very moment you are certain that your dataset is sorted and you know your data alignment. The GROUP BY clause performs a lot of actions i.e does an internal sort, then follows if there is a summary statistics involved and so forth. One caveat to keep in mind though is, in some operating environments, the SQL optimizer does some magic to beat Datastep solutions with a short cut plan in building an internal algorithm. This topic is complex and not needed to confuse us at this point

 

5. The Datastep offers better control for the user as it is actually easier and intuitive than SQL. One would argue against me here, but trust me it actually takes longer to get the hang of internal mechanisms of SQL as opposed to clearly defined sequential iteration of datastep execution i.e one by one. In SQL, you can go from row processing to column processing and back to row processing and beyond all in one SELECT CLAUSE. I deem this crazy but I believe I have gotten very thorough with this 🙂

 

6. So, instead of your data requiring a preliminary row_number partitioned by group, if it had a DATE variable and it it were to be an unsorted dataset and if your operating environment is highly conducive for the SQL optimizer to pick the shortest internal algorithm, my oh my! you might just win.

 

Hope this helps!

novinosrin
Tourmaline | Level 20

And the Datastep:

data want;
 do _n_=1 by 1 until(last.id);
  set temp;
  by id;
  if min>. and diff>=min then continue;
  min=diff;
  k=_n_;
 end;
 do _n_=1 to _n_;
  set temp;
  if _n_=k then output;
 end;
 drop k min;
run;
yabwon
Onyx | Level 15

@novinosrin , perfect use case of double-DoW-loop.

All the best

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



novinosrin
Tourmaline | Level 20

@yabwon  Dziękuję bardzo and the same to you in my native Tamil  மிக்க நன்றி. BTW, if you ever meet somebody that "somebody" by the name Mr. Marcin from Krakow(I believe) asked me to learn something called SAS in  catholic church, Wolverhampton over a glass of red wine, say my heartfelt thanks. I can't remember him nor I know him but he recommended this pubaholic forklift driver with no education whatsoever to learn SAS believing that I would do well in that one meeting instance. Haha :), seems partially true. God Bless!

 

 

yabwon
Onyx | Level 15

@novinosrin,

 

Wow! How cool it is that your "SAS roots" are Polish! 😉 If we ever have chance to meet face-2-face I have to hear that story from you in more details (maybe over a glass of wine too, maybe at SGF2020?)

 

All the best

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 5092 views
  • 7 likes
  • 4 in conversation