BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
xiangpang
Quartz | Level 8

Hello,

 

I want to select the IDs with same visit and bmi difference is less than 10%. 

 

not proc sql. Anyone knows how to do that?

 

Thanks a lot 

data new;
input id visit bmi ;
cards;
1 day1 45
1 day12 66
1 day60 53
2 day1 45
2 day6 14
2 day12 36
2 day60 83
3 day1 745
3 day12 66
3 day60 50
;
run;

want;
1 day1 45
2 day1 45
1 day12 66
3 day12 66
1 day60 53
3 day60 50
1 ACCEPTED SOLUTION

Accepted Solutions
s_manoj
Quartz | Level 8

Hello,

Check this code, this could help your need.

 

proc sort;
by visit;
run;

data want (keep = id visit bmi);
set new (rename = (id = id1 visit = visit1 bmi = bmi1));
do i =1 to n;
	set new point = i nobs = n;
 	v = divide(abs(bmi1 - bmi), bmi1);
	if (visit = visit1) & (id ne id1) & v le 0.1 then output  ;
end;
run;

Regards

Manoj

View solution in original post

9 REPLIES 9
xiangpang
Quartz | Level 8

visit should add $  , sorry for that

PGStats
Opal | Level 21

So what if your data contains

 

1 day1 45

2 day1 49

3 day1 53

 

what obs should be selected?

PG
hashman
Ammonite | Level 13

@xiangpang:

 

Actually, it looks more like a job for SQL, and in all candor, the tool should fit the job, not the other way around. 

But if you don't want SQL, the job can be done with other means - it's SAS, after all.

 

However, as your specs are rather vague, I have to make assumptions, to wit:

 

- If there's only one record in a visit, you don't want the visit in the output; your sample output without DAY6 seems to suggest that, too.

- If there're no BMIs differing by less than 10% in the same visit (not in your sample input data), you don't want that visit in the output, either.       

- No visit has more than 9999 BMI measurements. Methinks it's a reasonable assumption ;).

 

With these assumptions:

data new ;                                                     
  input id visit $ bmi ;                                       
  cards ;                                                      
1 day1   45                                                    
1 day12  66                                                    
1 day60  53                                                    
2 day1   45                                                    
2 day6   14                                                    
2 day12  36                                                    
2 day60  83                                                    
3 day1  745                                                    
3 day12  66                                                    
3 day60  50                                                    
run ;                                                          
                                                               
proc sort data = new out = newsort ;                           
  by visit bmi ;                                               
run ;                                                          
                                                               
data want (keep = id visit bmi) ;                              
  array abmi [9999] _temporary_ ;                               
  array aid  [9999] _temporary_ ;                               
  do n = 1 by 1 until (last.visit) ;                           
    set newsort ;                                              
    by visit ;                                                 
    abmi [n] = bmi ;                                           
    aid  [n] = id ;                                            
  end ;                                                        
  if n > 1 then do i = 1 to n - 1 ;                            
    do j = i + 1 to n ;                                        
      if divide (abmi[j] - abmi[i], abmi[j]) > 0.1 then leave ;
      bmi = abmi [i] ; id = aid [i] ; output ;                 
      bmi = abmi [j] ; id = aid [j] ; output ;                 
    end ;                                                      
  end ;                                                        
run ;                                                          

Note that the sorting, though not strictly necessary, technically speaking, greatly simplifies the code. Also, it helps avoid comparing ALL BMI measurements within the same visit pairwise since in the sorted BMI sequence, the difference between item [i] and item [i+1] is always no greater than between item [i] and item [i+(>1)].

  

HTH

Paul D.

xiangpang
Quartz | Level 8
In this case, all three should be included. Since day1 is same, and 49-53 difference is less than 10% while 45-49 difference is less than 10%.
xiangpang
Quartz | Level 8
Thanks, I will try it later.
Kurt_Bremser
Super User

Given the brevity of a SQL solution:

 

proc sql;
create table want as
select distinct a.*
from new a, new b
where a.visit = b.visit and a.id ne b.id and abs(a.bmi - b.bmi) <= min(a.bmi,b.bmi) / 10
order by visit,id;
quit;

not using it is dumb, IMHO. Unless the performance of SQL because of dataset size is intolerable.

 

hashman
Ammonite | Level 13

@Kurt_Bremser:

 

I doubt your SQL will underperform against a much more sizable input since the optimizer plan for this query uses the SQXJHSH method and doesn't resort to any Cartesian expansions. But even if it did, the simplicity of coding (or, rather, nearly not needing to code) would more than make up for that. To put together a viable DATA step solution in this case, one needs to know infinitely more about procedural programming and think its details out than to write a simple SQL query hardly requiring any programming at all. Like I said, the tool should fit the job, not vice versa. I'm not quite an SQL head; but in this case, it's exactly what the doctor ordered (and my having offered a DATA step solution notwithstanding).

 

Best

Paul D.     

s_manoj
Quartz | Level 8

Hello,

Check this code, this could help your need.

 

proc sort;
by visit;
run;

data want (keep = id visit bmi);
set new (rename = (id = id1 visit = visit1 bmi = bmi1));
do i =1 to n;
	set new point = i nobs = n;
 	v = divide(abs(bmi1 - bmi), bmi1);
	if (visit = visit1) & (id ne id1) & v le 0.1 then output  ;
end;
run;

Regards

Manoj

hashman
Ammonite | Level 13

@s_manoj:

 

Your code runs a de facto Cartesian product reading every obs from one copy of NEW for every obs in another, and it doesn't even take advantage of the sorted order. Hence, it will scale extremely poorly as the input size grows. At nobs=100000, an efficient solution would run in under a second, yet your program would take good 30 minutes. And at nobs=1000000 (which is a very small data set by today's standards) it would take upward of 50 hours.

 

The point= approach could be viable (though still slower than anything else) if you take advantage of the sorted order BY VISIT BMI and (a) use point= to read only the observations between the endpoints of the current BY group and (b) leave the point= loop as soon as the 10% condition is false.

 

Paul D.   

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 1076 views
  • 7 likes
  • 5 in conversation