BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Cruise
Ammonite | Level 13

Hi All,

 

I'd like to:

- Select the row associated with the smallest diff_range

- Among the patients who took diagnosis codes 1 or 2 or 6

- Select the row associated with the distinct smallest diff_range 

 

The want is below and for example why ID=1 was selected was that: 

ID=1 took target diagnosis codes 1 and 2. But 1, 1, 2, row is selected because of minimum value for a diff_range.

 

My current codes produce more observations than I need because code and diff_range variables are not one-to-one with the ID and min_diff.  But I really don't want min_diff repeated in the final want data and also not willing to deduplicate data using extra step afterwards.

 

Any solutions? please. I'll greatly appreciate your help. 

 

data p.have;
input id code diff_range;
cards;
1 1 2
1 2 3
1 3 4
1 3 4
1 5 7
2 1 1
2 2 3
2 2 3
2 4 5
2 5 6
3 3 1
3 4 2
3 5 3
4 8 8
4 9 9
5 6 10
5 10 12
6 2 2
6 2 1
6 6 9
6 6 4
;

data want;
input id code diff_range;
cards;
1 1 2
2 1 1
5 6 10
6 2 1
;

proc sql;
create table p.want_min1 as
select distinct * , min(diff_range) as min_diff from p.have
where id in
(select id from p.have where code in (1,2,6))
group by id;
quit;

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

I think the my sql will do that, test and let us know. Also, if you want to make your life easy, of course sort and datastep is by far simplest.

 

The below makes sql my style with fancy  boolean although not much of big deal from the previous

 


data have;
input id code diff_range weight_evidence;
cards;
1 1 2 10 
1 2 2 8 
1 3 4 0
1 3 4 0
1 5 7 0
2 1 1 10
2 2 3 8
2 2 3 8
2 4 5 0
2 5 6 0
3 3 1 0
3 4 2 0
3 5 3 0
4 8 8 0
4 9 9 0
5 6 10 6
5 1 10 10 
5 10 12 0
6 1 2 10 
6 2 2 8
6 1 9 10
6 6 9 6 
;
proc sql;
create table want(drop=_:) as
select *
from 
(select *, max(code in (1,2,6)) as _c,min(diff_range) as _min from have group by id)
group by id,_min
having _c and max(weight_evidence)=weight_evidence and _min=diff_range;
quit;

 

View solution in original post

23 REPLIES 23
r_behata
Barite | Level 11
data have;
input id code diff_range;
cards;
1 1 2
1 2 3
1 3 4
1 3 4
1 5 7
2 1 1
2 2 3
2 2 3
2 4 5
2 5 6
3 3 1
3 4 2
3 5 3
4 8 8
4 9 9
5 6 10
5 10 12
6 2 2 
6 2 1
6 6 9
6 6 4
;
run;

proc sort data=have(where=(code in (1,2,6)));
	by id diff_range;
run;

data want;
	set have;
	by id diff_range;

	if first.id then output;
run;
novinosrin
Tourmaline | Level 20

That's happening because of remerging behaviour of proc sql

 

try this change highlighted

 

proc sql;
create table p.want_min1 as
select distinct * , min(diff_range) as min_diff from p.have
where id in 
(select id from p.have where code in (1,2,6))
group by id

having diff_range=min_diff;
quit;

novinosrin
Tourmaline | Level 20

One question you would have to let know is what if there is a  tie in min_diff?

 

And if you do not want min_diff in your output, the code can be as simple as

 

data have;
input id code diff_range;
cards;
1 1 2
1 2 3
1 3 4
1 3 4
1 5 7
2 1 1
2 2 3
2 2 3
2 4 5
2 5 6
3 3 1
3 4 2
3 5 3
4 8 8
4 9 9
5 6 10
5 10 12
6 2 2 
6 2 1
6 6 9
6 6 4
;

proc sql;
create table want_min2 as
select  *  from have
where id in 
(select id from have where code in (1,2,6))
group by id

having diff_range= min(diff_range);
quit;

proc print noobs;run;

Results:

SAS Output

The SAS System

id code diff_range
1 1 2
2 1 1
5 6 10
6 2 1

 

Cruise
Ammonite | Level 13

@novinosrin

so true, it doesn't work when min_diff ties up, just like below. When min_diff are the same across the different diagnosis codes that single patient took, I have to assign the weight to each diagnosis codes in terms of the strength of the evidence to prove of the disease. And the row associated wit the higher score for diagnosis code would be selected among the ties by min_diff. 

 

data p.have;
input id code diff_range;
cards;
1 1 2
1 2 3
1 3 4
1 3 4
1 5 7
2 1 1
2 2 3
2 2 3
2 4 5
2 5 6
3 3 1
3 4 2
3 5 3
4 8 8
4 9 9
5 6 10
5 10 12
6 1 2
6 2 2
6 1 9
6 6 9
;

proc sql;
create table p.want_min1 as
select distinct * , min(diff_range) as min_diff from p.have
where id in
(select id from p.have where code in (1,2,6))
group by id
having diff_range=min_diff;
quit;

 

novinosrin
Tourmaline | Level 20

Ok,so do you have weighting variable in place?

Cruise
Ammonite | Level 13

@novinosrin

Yes, please see below:

data p.have;
input id code diff_range weight_evidence;
cards;
1 1 2 10
1 2 2 8
1 3 4 0
1 3 4 0
1 5 7 0
2 1 1 10
2 2 3 8
2 2 3 8
2 4 5 0
2 5 6 0
3 3 1 0
3 4 2 0
3 5 3 0
4 8 8 0
4 9 9 0
5 6 10 6
5 1 10 10
5 10 12 0
6 1 2 10
6 2 2 8
6 1 9 10
6 6 9 6
;

novinosrin
Tourmaline | Level 20

data have;
input id code diff_range weight_evidence;
cards;
1 1 2 10 
1 2 2 8 
1 3 4 0
1 3 4 0
1 5 7 0
2 1 1 10
2 2 3 8
2 2 3 8
2 4 5 0
2 5 6 0
3 3 1 0
3 4 2 0
3 5 3 0
4 8 8 0
4 9 9 0
5 6 10 6
5 1 10 10 
5 10 12 0
6 1 2 10 
6 2 2 8
6 1 9 10
6 6 9 6 
;
proc sql;
create table want_min1 as
select *
from 
(select distinct * , min(diff_range)as min_diff from have
where id in 
(select id from have where code in (1,2,6))
group by id)
group by id,min_diff
having weight_evidence=max(weight_evidence);
quit;

PS I will simplify the above if i can

Reeza
Super User
Cruise, can you post the expected output as well, how the weights need to be handled is unclear. Is it just part of the selection or are you reweighting?
Reeza
Super User
Does the minimum code have to be with diagnosis 1, 2, or 6? Or just a patient with one of those diagnosis? That will help novinosrin simplify his code.
mkeintz
PROC Star

Then make a minor modification to the proc sort solution.  Assuming your weight variable is WGT:

 

proc sort data= have (where=(code in (1,2,6))) out=need;
  by Id diff_range descending wgt;
run;

data want;
  set need; 
  by id diff_range;
  if first.id;
run;

 

which will work perfectly unless you have an ID containing multiple records at the lowest diff_range having tied values at the highest wgt.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Reeza
Super User

Sort and use FIRST. 

 

Proc sort data= have;
By Id diff_range code;
Run;

Data want;
Set have; 
By id diff_range;
Where code in (1, 2, 6);

If first.id;

Run;

This will not keep duplicates. You could also sort by code to ensure you always get the smallest or largest. Though this may not be necessary, it’s easier to have a consistent logic so that you can replicate this if ever needed. 

Cruise
Ammonite | Level 13

hI

 

WEIGHT_EVIDENCE IS CREATED BASED ON THE CODE LIKE FOLLOWING: 

 

if code=1 then weight_evidence=10; else
if code=2 then weight_evidence=8; else
if code=6 then weight_evidence=6;

 

HOWEVER, MIN_DIFF CAN RANGE 1 THRU 30. 

 

In ideal world, where life goes our way, maximum weight of evidence would be corresponded with the minimum difference. However, there are discordant cases, which I thought of following as shown int he image. Please help me think of more discordant scenario if seem not considered here. 

The Rationale for Selection and the Selected Cases should answer @Reeza and @novinosrin to understand how weight should be handled in the final output data. 

 

SELECTION_CRITERIA.jpg

 

Anything apparently incorrect? please let me know. I needed to think of what discordant cases might be. My actual data is huge and I have to resolve the issue on these hypothetical small dataset first. Hope that makes sense. 

Reeza
Super User
Rather than try and think of all circumstances, what happens if you find all the cases that don't have issues, ie where min diff and max_weight and then see how many you have to deal with and consider them case by case?
novinosrin
Tourmaline | Level 20

I think the my sql will do that, test and let us know. Also, if you want to make your life easy, of course sort and datastep is by far simplest.

 

The below makes sql my style with fancy  boolean although not much of big deal from the previous

 


data have;
input id code diff_range weight_evidence;
cards;
1 1 2 10 
1 2 2 8 
1 3 4 0
1 3 4 0
1 5 7 0
2 1 1 10
2 2 3 8
2 2 3 8
2 4 5 0
2 5 6 0
3 3 1 0
3 4 2 0
3 5 3 0
4 8 8 0
4 9 9 0
5 6 10 6
5 1 10 10 
5 10 12 0
6 1 2 10 
6 2 2 8
6 1 9 10
6 6 9 6 
;
proc sql;
create table want(drop=_:) as
select *
from 
(select *, max(code in (1,2,6)) as _c,min(diff_range) as _min from have group by id)
group by id,_min
having _c and max(weight_evidence)=weight_evidence and _min=diff_range;
quit;

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 23 replies
  • 11632 views
  • 15 likes
  • 5 in conversation