Solved: Re: How do I select value closest to X?

pamplemouse22 · Posted 10-26-2016 06:19 PM

Hi,

I am working with a dataset that looks like this. Each participant was scored multiple times and age at exam was recorded.

id visit age score

1 1 80 30

1 2 85 28

1 3 86 25

2 1 84 25

2 2 87 24

3 1 90 28

3 2 95 28

I want to create a new variable that reports the score (score_new) closest to age 85.

Particularly, I want the score at age 85. If there is no value for this, then I want the score at an age +/- 2 years, using the score closest to age 85. If there is no value for this either, then I want the score recorded after age 85, again using the score closest to 85. If there is a tie (e.g., a score at 84 and 86), then I want to keep the younger score. Remaining can be missing.

So, I would like to end with a final data set like this:

id visit age score score_new

1 1 80 30 .

1 2 85 28 28

1 3 86 25 .

2 1 84 25 25

2 2 86 24 .

3 1 90 28 28

3 2 95 28 .

Any help would be appreciated. Happy to explain further. I am new to arrays, so replies with detail would be best.

Looking forward to your feedback and solving this!

Thanks.

PGStats · Posted 10-27-2016 12:35 AM

Ok. Try

proc sql;
create table want as
select a.*,
    case when coalesce(rank, -1)=min(rank) then score else . end as newScore
from 
    have as a left join 
    closenessRanks as b on a.age=b.age
group by id
order by id, visit;
select * from want;
quit;

That should fix it.

PG

View solution in original post

PGStats · Posted 10-26-2016 10:58 PM

Here is a simple way:

data closenessRanks;
input age rank;
datalines;
85 0
84 1
86 2
83 3
87 4
88 5
89 6
90 7
91 8
92 9
93 10
94 11
95 12
96 13
97 14
98 15
99 16
;

data have;
input id visit age score;
datalines;
1 1 80 30      
1 2 85 28
1 3 86 25
2 1 84 25 
2 2 87 24
3 1 90 28
3 2 95 28
;

proc sql;
create table want as
select a.*,
    case when rank=min(rank) then score else . end as newScore
from 
    have as a left join 
    closenessRanks as b on a.age=b.age
group by id
order by id, visit;
select * from want;
quit;

PG

pamplemouse22 · Posted 10-27-2016 12:11 AM

Thank you so much! This works, but there's one small issue to fix. For individuals where none of the scores match one of the ranked ages, the original score is being used as newscore. Instead, I would like these to be missing. How can I change this? Thank you.

PGStats · Posted 10-27-2016 12:35 AM

Ok. Try

proc sql;
create table want as
select a.*,
    case when coalesce(rank, -1)=min(rank) then score else . end as newScore
from 
    have as a left join 
    closenessRanks as b on a.age=b.age
group by id
order by id, visit;
select * from want;
quit;

That should fix it.

PG

pamplemouse22 · Posted 10-27-2016 12:39 AM

Yes, it does! Thank you.

pamplemouse22 · Posted 11-21-2016 02:27 PM

Hi PG,

I found one case when this doesn't work -- when the first visit is missing, and there are values for subsequent visits. Code ends up not pulling any value for new score, when it should. Can't figure out how to change this. Would appreciate a solution for this. Thank you in advance!

PGStats · Posted 11-21-2016 03:20 PM

The problem could be with the way you read your data. Compare the results with and without the infile statement in

data closenessRanks;
input age rank;
datalines;
85 0
84 1
86 2
83 3
87 4
88 5
89 6
90 7
91 8
92 9
93 10
94 11
95 12
96 13
97 14
98 15
99 16
;

data have;
infile datalines missover;
input id visit age score;
datalines;
1 1 80 30      
1 2 85 28
1 3 86 25
2 1 84 25 
2 2 87 24
3 1 90 28
3 2 95 28
4 1 79 88
4 2 80 99
5 1 80 
5 2 85 28
5 3 86 25
;

proc sql;
create table want as
select a.*,
    case when coalesce(rank, -1)=min(rank) then score else . end as newScore
from 
    have as a left join 
    closenessRanks as b on a.age=b.age
group by id
order by id, visit;
select * from want;
quit;

PG

Ksharp · Posted 10-26-2016 11:10 PM

Maybe I understood what you are talking about.

data have;
input id visit age score;
cards;
1 1 80 30      
1 2 85 28
1 3 86 25
2 1 84 25 
2 2 87 24
3 1 90 28
3 2 95 28
;
run;
proc sql;
create table temp1 as
 select *
  from have
   group by id
    having sum(age=85)=0 and age in (83 84 86 87)
     order by id,age;
create table temp2 as
 select *
  from have
   group by id
    having sum(age in (83 84 85 86 87))=0 and age gt 87
     order by id,age;
quit;
data key1;
 set temp1;
 by id;
 if first.id;
run;
data key2;
 set temp2;
 by id;
 if first.id;
run;
data key;
 set have(where=(age=85)) key1 key2;
 keep id age score;
 rename score=new_score;
run;

data want;
 merge have key;
 by id age;
run;

pamplemouse22 · Posted 10-27-2016 12:12 AM

Thank you very much!

The 2025 SAS Hackathon has begun!