Hello,
I want to select the IDs with same visit and bmi difference is less than 10%.
not proc sql. Anyone knows how to do that?
Thanks a lot
data new;
input id visit bmi ;
cards;
1 day1 45
1 day12 66
1 day60 53
2 day1 45
2 day6 14
2 day12 36
2 day60 83
3 day1 745
3 day12 66
3 day60 50
;
run;
want;
1 day1 45
2 day1 45
1 day12 66
3 day12 66
1 day60 53
3 day60 50
Hello,
Check this code, this could help your need.
proc sort;
by visit;
run;
data want (keep = id visit bmi);
set new (rename = (id = id1 visit = visit1 bmi = bmi1));
do i =1 to n;
set new point = i nobs = n;
v = divide(abs(bmi1 - bmi), bmi1);
if (visit = visit1) & (id ne id1) & v le 0.1 then output ;
end;
run;
Regards
Manoj
visit should add $ , sorry for that
So what if your data contains
1 day1 45
2 day1 49
3 day1 53
what obs should be selected?
Actually, it looks more like a job for SQL, and in all candor, the tool should fit the job, not the other way around.
But if you don't want SQL, the job can be done with other means - it's SAS, after all.
However, as your specs are rather vague, I have to make assumptions, to wit:
- If there's only one record in a visit, you don't want the visit in the output; your sample output without DAY6 seems to suggest that, too.
- If there're no BMIs differing by less than 10% in the same visit (not in your sample input data), you don't want that visit in the output, either.
- No visit has more than 9999 BMI measurements. Methinks it's a reasonable assumption ;).
With these assumptions:
data new ;
input id visit $ bmi ;
cards ;
1 day1 45
1 day12 66
1 day60 53
2 day1 45
2 day6 14
2 day12 36
2 day60 83
3 day1 745
3 day12 66
3 day60 50
run ;
proc sort data = new out = newsort ;
by visit bmi ;
run ;
data want (keep = id visit bmi) ;
array abmi [9999] _temporary_ ;
array aid [9999] _temporary_ ;
do n = 1 by 1 until (last.visit) ;
set newsort ;
by visit ;
abmi [n] = bmi ;
aid [n] = id ;
end ;
if n > 1 then do i = 1 to n - 1 ;
do j = i + 1 to n ;
if divide (abmi[j] - abmi[i], abmi[j]) > 0.1 then leave ;
bmi = abmi [i] ; id = aid [i] ; output ;
bmi = abmi [j] ; id = aid [j] ; output ;
end ;
end ;
run ;
Note that the sorting, though not strictly necessary, technically speaking, greatly simplifies the code. Also, it helps avoid comparing ALL BMI measurements within the same visit pairwise since in the sorted BMI sequence, the difference between item [i] and item [i+1] is always no greater than between item [i] and item [i+(>1)].
HTH
Paul D.
Given the brevity of a SQL solution:
proc sql;
create table want as
select distinct a.*
from new a, new b
where a.visit = b.visit and a.id ne b.id and abs(a.bmi - b.bmi) <= min(a.bmi,b.bmi) / 10
order by visit,id;
quit;
not using it is dumb, IMHO. Unless the performance of SQL because of dataset size is intolerable.
I doubt your SQL will underperform against a much more sizable input since the optimizer plan for this query uses the SQXJHSH method and doesn't resort to any Cartesian expansions. But even if it did, the simplicity of coding (or, rather, nearly not needing to code) would more than make up for that. To put together a viable DATA step solution in this case, one needs to know infinitely more about procedural programming and think its details out than to write a simple SQL query hardly requiring any programming at all. Like I said, the tool should fit the job, not vice versa. I'm not quite an SQL head; but in this case, it's exactly what the doctor ordered (and my having offered a DATA step solution notwithstanding).
Best
Paul D.
Hello,
Check this code, this could help your need.
proc sort;
by visit;
run;
data want (keep = id visit bmi);
set new (rename = (id = id1 visit = visit1 bmi = bmi1));
do i =1 to n;
set new point = i nobs = n;
v = divide(abs(bmi1 - bmi), bmi1);
if (visit = visit1) & (id ne id1) & v le 0.1 then output ;
end;
run;
Regards
Manoj
Your code runs a de facto Cartesian product reading every obs from one copy of NEW for every obs in another, and it doesn't even take advantage of the sorted order. Hence, it will scale extremely poorly as the input size grows. At nobs=100000, an efficient solution would run in under a second, yet your program would take good 30 minutes. And at nobs=1000000 (which is a very small data set by today's standards) it would take upward of 50 hours.
The point= approach could be viable (though still slower than anything else) if you take advantage of the sorted order BY VISIT BMI and (a) use point= to read only the observations between the endpoints of the current BY group and (b) leave the point= loop as soon as the 10% condition is false.
Paul D.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.