Solved: Select row with min value otherwise select the first row when rows tie...

Cruise · Posted 01-16-2020 12:53 PM

Hi folks:

I'm trying to select the row with the min(diff) and if rows ties on diff like it occurs for the ID=3 patient then select the first row.

How to achieve it? my code below doesn't help when rows tied on the 'diff'.

Thanks in advance

data temp;
input id diff; 
cards;
1 2
1 3
2 1
2 2
3 2
3 2
;
proc sql;
select *
from temp
group by CTC_id
having DIFF eq min(DIFF)
;
quit;

novinosrin · Posted 01-16-2020 01:27 PM

And the Datastep:

data want;
 do _n_=1 by 1 until(last.id);
  set temp;
  by id;
  if min>. and diff>=min then continue;
  min=diff;
  k=_n_;
 end;
 do _n_=1 to _n_;
  set temp;
  if _n_=k then output;
 end;
 drop k min;
run;

View solution in original post

Reeza · Posted 01-16-2020 12:57 PM

Use a data step and FIRST with a BY statement instead?

Or two proc sorts.

proc sort data=temp;
by id diff;
run;

data want1;
set temp;
by id;
if first.id;
run;


proc sort data=temp;
by id diff;
run;

proc sort data=temp out=want2 NODUPKEY;
by ID;
run;

novinosrin · Posted 01-16-2020 01:00 PM

I am gonna jump to ameeting, but wait for a datastep from somebody else as sql is not robust but until then some fun

proc sql;
create table want(drop=m) as
select *
from
(select *
from (select *,monotonic() as m from temp)
group by id
having DIFF eq min(DIFF))
group by id,diff
having min(m)=m;
;
quit;

Cruise · Posted 01-16-2020 08:05 PM

I'm curious about the context that 'sql' is not robust. Do you mind elaborate on that plz?

novinosrin · Posted 01-16-2020 08:56 PM

Too tired and about to doze off. Will explain tomorrow without fail

novinosrin · Posted 01-17-2020 10:56 AM

Hi @Cruise SQL is indeed a great ready-meals and super convenient construct of clauses that provides fantastic utility for summary statistics aside the provision of pushing a query easily to a third party database and many others. In essence this kind of compatibility is pretty much unparalleled considering a SQL programmer minus SAS programmer would just like that use it to its full extent and be over with tasks. Therefore, Yes, your thought process is correct to question why would SQL not be robust should you consider all the advantages mentioned and I suppose I missed out many advantages too.

However, your case is kinda specific that mandates to address certain constraints.

1. Since your sample(assuming a representative sample of your real) seems a clean sorted dataset by ID and in an order presumably the observations within the ID are ordered in sequence as you indicated in your preference in picking the earliest(1st occurrence) in case of a tie.

2. 1 makes a case needing to compute a row_number to identify and choose which one is the earliest. This could be achieved by using MONOTONIC() , which is not documented, some argue it is error prone and at any rate is costly to have an extra pass of the dataset that you noticed in the sub-query.

3. Though seemingly simple and concise, there were 3 passes of the dataset, 1st to compute row_num, 2nd to determine the min(diff) and 3rd to identify and choose min of row_num from min(diff)

4. Also, it is always better to think through a datastep solution the very moment you are certain that your dataset is sorted and you know your data alignment. The GROUP BY clause performs a lot of actions i.e does an internal sort, then follows if there is a summary statistics involved and so forth. One caveat to keep in mind though is, in some operating environments, the SQL optimizer does some magic to beat Datastep solutions with a short cut plan in building an internal algorithm. This topic is complex and not needed to confuse us at this point

5. The Datastep offers better control for the user as it is actually easier and intuitive than SQL. One would argue against me here, but trust me it actually takes longer to get the hang of internal mechanisms of SQL as opposed to clearly defined sequential iteration of datastep execution i.e one by one. In SQL, you can go from row processing to column processing and back to row processing and beyond all in one SELECT CLAUSE. I deem this crazy but I believe I have gotten very thorough with this 🙂

6. So, instead of your data requiring a preliminary row_number partitioned by group, if it had a DATE variable and it it were to be an unsorted dataset and if your operating environment is highly conducive for the SQL optimizer to pick the shortest internal algorithm, my oh my! you might just win.

Hope this helps!

novinosrin · Posted 01-16-2020 01:27 PM

And the Datastep:

data want;
 do _n_=1 by 1 until(last.id);
  set temp;
  by id;
  if min>. and diff>=min then continue;
  min=diff;
  k=_n_;
 end;
 do _n_=1 to _n_;
  set temp;
  if _n_=k then output;
 end;
 drop k min;
run;

yabwon · Posted 01-16-2020 03:02 PM

@novinosrin , perfect use case of double-DoW-loop.

All the best

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation

novinosrin · Posted 01-16-2020 03:14 PM

@yabwon Dziękuję bardzo and the same to you in my native Tamil மிக்க நன்றி. BTW, if you ever meet somebody that "somebody" by the name Mr. Marcin from Krakow(I believe) asked me to learn something called SAS in catholic church, Wolverhampton over a glass of red wine, say my heartfelt thanks. I can't remember him nor I know him but he recommended this pubaholic forklift driver with no education whatsoever to learn SAS believing that I would do well in that one meeting instance. Haha :), seems partially true. God Bless!

yabwon · Posted 01-17-2020 03:31 AM

@novinosrin,

Wow! How cool it is that your "SAS roots" are Polish! 😉 If we ever have chance to meet face-2-face I have to hear that story from you in more details (maybe over a glass of wine too, maybe at SGF2020?)

All the best

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation

Select row with min value otherwise select the first row when rows tied

Re: Select row with min value otherwise select the first row when rows tied

Re: Select row with min value otherwise select the first row when rows tied

Re: Select row with min value otherwise select the first row when rows tied

Re: Select row with min value otherwise select the first row when rows tied

Re: Select row with min value otherwise select the first row when rows tied

Re: Select row with min value otherwise select the first row when rows tied

Re: Select row with min value otherwise select the first row when rows tied

Re: Select row with min value otherwise select the first row when rows tied

Re: Select row with min value otherwise select the first row when rows tied

Re: Select row with min value otherwise select the first row when rows tied

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!