DATA Step, Macro, Functions and more

how to selecet some ID out?

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 79
Accepted Solution

how to selecet some ID out?

Hello,

 

I want to select the IDs with same visit and bmi difference is less than 10%. 

 

not proc sql. Anyone knows how to do that?

 

Thanks a lot 

data new;
input id visit bmi ;
cards;
1 day1 45
1 day12 66
1 day60 53
2 day1 45
2 day6 14
2 day12 36
2 day60 83
3 day1 745
3 day12 66
3 day60 50
;
run;

want;
1 day1 45
2 day1 45
1 day12 66
3 day12 66
1 day60 53
3 day60 50

Accepted Solutions
Solution
a week ago
Contributor
Posts: 50

Re: how to selecet some ID out?

Posted in reply to xiangpang

Hello,

Check this code, this could help your need.

 

proc sort;
by visit;
run;

data want (keep = id visit bmi);
set new (rename = (id = id1 visit = visit1 bmi = bmi1));
do i =1 to n;
	set new point = i nobs = n;
 	v = divide(abs(bmi1 - bmi), bmi1);
	if (visit = visit1) & (id ne id1) & v le 0.1 then output  ;
end;
run;

Regards

Manoj

View solution in original post


All Replies
Frequent Contributor
Posts: 79

Re: how to selecet some ID out?

Posted in reply to xiangpang

visit should add $  , sorry for that

Esteemed Advisor
Posts: 5,628

Re: how to selecet some ID out?

Posted in reply to xiangpang

So what if your data contains

 

1 day1 45

2 day1 49

3 day1 53

 

what obs should be selected?

PG
Frequent Contributor
Posts: 112

Re: how to selecet some ID out?

Posted in reply to xiangpang

@xiangpang:

 

Actually, it looks more like a job for SQL, and in all candor, the tool should fit the job, not the other way around. 

But if you don't want SQL, the job can be done with other means - it's SAS, after all.

 

However, as your specs are rather vague, I have to make assumptions, to wit:

 

- If there's only one record in a visit, you don't want the visit in the output; your sample output without DAY6 seems to suggest that, too.

- If there're no BMIs differing by less than 10% in the same visit (not in your sample input data), you don't want that visit in the output, either.       

- No visit has more than 9999 BMI measurements. Methinks it's a reasonable assumption Smiley Wink.

 

With these assumptions:

data new ;                                                     
  input id visit $ bmi ;                                       
  cards ;                                                      
1 day1   45                                                    
1 day12  66                                                    
1 day60  53                                                    
2 day1   45                                                    
2 day6   14                                                    
2 day12  36                                                    
2 day60  83                                                    
3 day1  745                                                    
3 day12  66                                                    
3 day60  50                                                    
run ;                                                          
                                                               
proc sort data = new out = newsort ;                           
  by visit bmi ;                                               
run ;                                                          
                                                               
data want (keep = id visit bmi) ;                              
  array abmi [9999] _temporary_ ;                               
  array aid  [9999] _temporary_ ;                               
  do n = 1 by 1 until (last.visit) ;                           
    set newsort ;                                              
    by visit ;                                                 
    abmi [n] = bmi ;                                           
    aid  [n] = id ;                                            
  end ;                                                        
  if n > 1 then do i = 1 to n - 1 ;                            
    do j = i + 1 to n ;                                        
      if divide (abmi[j] - abmi[i], abmi[j]) > 0.1 then leave ;
      bmi = abmi [i] ; id = aid [i] ; output ;                 
      bmi = abmi [j] ; id = aid [j] ; output ;                 
    end ;                                                      
  end ;                                                        
run ;                                                          

Note that the sorting, though not strictly necessary, technically speaking, greatly simplifies the code. Also, it helps avoid comparing ALL BMI measurements within the same visit pairwise since in the sorted BMI sequence, the difference between item [i] and item [i+1] is always no greater than between item [i] and item [i+(>1)].

  

HTH

Paul D.

Frequent Contributor
Posts: 79

Re: how to selecet some ID out?

Posted in reply to xiangpang
In this case, all three should be included. Since day1 is same, and 49-53 difference is less than 10% while 45-49 difference is less than 10%.
Frequent Contributor
Posts: 79

Re: how to selecet some ID out?

Posted in reply to xiangpang
Thanks, I will try it later.
Super User
Posts: 10,626

Re: how to selecet some ID out?

Posted in reply to xiangpang

Given the brevity of a SQL solution:

 

proc sql;
create table want as
select distinct a.*
from new a, new b
where a.visit = b.visit and a.id ne b.id and abs(a.bmi - b.bmi) <= min(a.bmi,b.bmi) / 10
order by visit,id;
quit;

not using it is dumb, IMHO. Unless the performance of SQL because of dataset size is intolerable.

 

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Frequent Contributor
Posts: 112

Re: how to selecet some ID out?

Posted in reply to KurtBremser

@KurtBremser:

 

I doubt your SQL will underperform against a much more sizable input since the optimizer plan for this query uses the SQXJHSH method and doesn't resort to any Cartesian expansions. But even if it did, the simplicity of coding (or, rather, nearly not needing to code) would more than make up for that. To put together a viable DATA step solution in this case, one needs to know infinitely more about procedural programming and think its details out than to write a simple SQL query hardly requiring any programming at all. Like I said, the tool should fit the job, not vice versa. I'm not quite an SQL head; but in this case, it's exactly what the doctor ordered (and my having offered a DATA step solution notwithstanding).

 

Best

Paul D.     

Solution
a week ago
Contributor
Posts: 50

Re: how to selecet some ID out?

Posted in reply to xiangpang

Hello,

Check this code, this could help your need.

 

proc sort;
by visit;
run;

data want (keep = id visit bmi);
set new (rename = (id = id1 visit = visit1 bmi = bmi1));
do i =1 to n;
	set new point = i nobs = n;
 	v = divide(abs(bmi1 - bmi), bmi1);
	if (visit = visit1) & (id ne id1) & v le 0.1 then output  ;
end;
run;

Regards

Manoj

Frequent Contributor
Posts: 112

Re: how to selecet some ID out?

@s_manoj:

 

Your code runs a de facto Cartesian product reading every obs from one copy of NEW for every obs in another, and it doesn't even take advantage of the sorted order. Hence, it will scale extremely poorly as the input size grows. At nobs=100000, an efficient solution would run in under a second, yet your program would take good 30 minutes. And at nobs=1000000 (which is a very small data set by today's standards) it would take upward of 50 hours.

 

The point= approach could be viable (though still slower than anything else) if you take advantage of the sorted order BY VISIT BMI and (a) use point= to read only the observations between the endpoints of the current BY group and (b) leave the point= loop as soon as the 10% condition is false.

 

Paul D.   

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 156 views
  • 7 likes
  • 5 in conversation