Hi everyone,
I've been struggling with what SAS function to use--thank you so much for your help.
My dataset has multiple ages for the same ID and I transposed all the unique ages (Age1-Age3) and ran a check on the best age, where Check1 is the check for Age1, Check2 for Age2, etc. I want to pick one best age for each ID. The best age is the age with that corresponds to the highest value for Check by ID (eg. in the dataset below, 52 is the best age for C because Check2 has the highest sum and corresponds with Age2). I know how to sum up Check1-Check3 (using Proc means, sum function). But how do I correspond Age1 to Check 1, Age2 with Check2 etc to identify the correct age that corresponds to the highest summed check?
Here is my dataset:
ID | Age | Age1 | Age2 | Age3 | Check1 | Check2 | Check3 |
A | 65 | 65 | 66 | . | 1 | 0 | . |
A | 66 | 65 | 66 | . | 1 | 0 | . |
B | 63 | 63 | . | . | 1 | . | . |
B | 63 | 63 | . | . | 1 | . | . |
C | 68 | 68 | 52 | 54 | 0 | 1 | 1 |
C | 52 | 68 | 52 | 54 | 0 | 1 | 0 |
C | 54 | 68 | 52 | 54 | 0 | 1 | 0 |
Thank you so much,
Louise
Function whichn will only return the first match. To get all matches, use a loop :
data have;
input
ID $ Age Age1 Age2 Age3 Check1 Check2 Check3;
datalines;
A 65 65 66 . 1 0 .
A 66 65 66 . 0 1 .
B 63 63 . . 1 . .
B 63 63 . . 1 . .
C 68 68 52 54 0 1 1
C 52 68 52 54 0 1 0
C 54 68 52 54 0 1 0
;
proc means data=have(drop=age) noprint;
by ID age: notsorted;
output out=checks(drop=_:) sum(check:)=;
run;
data want;
set checks;
array age{*} age:;
array check{*} check:;
do i = 1 to dim(age);
if check{i} = max(of check{*}) then do;
bestAge = age{i};
output;
end;
end;
keep ID bestAge;
run;
proc print data=want noobs; run;
PG
Using two arrays and function whichn :
data have;
input
ID $ Age Age1 Age2 Age3 Check1 Check2 Check3;
datalines;
A 65 65 66 . 1 0 .
A 66 65 66 . 1 0 .
B 63 63 . . 1 . .
B 63 63 . . 1 . .
C 68 68 52 54 0 1 1
C 52 68 52 54 0 1 0
C 54 68 52 54 0 1 0
;
proc means data=have(drop=age) noprint;
by ID age: notsorted;
output out=checks(drop=_:) sum(check:)=;
run;
data want;
set checks;
array age{*} age:;
array check{*} check:;
bestAge = age{whichn(max(of check{*}), of check{*})};
keep ID bestAge;
run;
proc print data=want noobs; run;
PG
Hi PG,
Thank you so much for your help. I ran this, and it works. However, is it possible for the "whichn" function to account for any ties? Such as if the summed value for Check1 and Check 2 are equal-- is there a way to show any ties and the age1 and age2s (if there are ties)?
Thank you so much for this,
Louise
Function whichn will only return the first match. To get all matches, use a loop :
data have;
input
ID $ Age Age1 Age2 Age3 Check1 Check2 Check3;
datalines;
A 65 65 66 . 1 0 .
A 66 65 66 . 0 1 .
B 63 63 . . 1 . .
B 63 63 . . 1 . .
C 68 68 52 54 0 1 1
C 52 68 52 54 0 1 0
C 54 68 52 54 0 1 0
;
proc means data=have(drop=age) noprint;
by ID age: notsorted;
output out=checks(drop=_:) sum(check:)=;
run;
data want;
set checks;
array age{*} age:;
array check{*} check:;
do i = 1 to dim(age);
if check{i} = max(of check{*}) then do;
bestAge = age{i};
output;
end;
end;
keep ID bestAge;
run;
proc print data=want noobs; run;
PG
Hi PG,
Thank you so much. The last code was wonderful and very helpful.
I have one last question. For another part of my code, the whichn function is the best fit. Is the whichn function account for a record with all missing values? Is there a way to allow for missing values so that the code does not come out with an error using the whichn function? For example just changing your first code with an entry (ID B) to all missing values, I get an error " Array subscript out of range at ID=B Age1=. Age2=. Age3=. Check1=. Check2=. Check3=. bestAge=. _ERROR_=1 _N_=2". Is there anyway to work around all missing values for the function whichn? Otherwise, it does exactly what I want for another part of my code.
Thank you so much,
Louise
Code below with ID B changed to missing values:
data have;
input
ID $ Age Age1 Age2 Age3 Check1 Check2 Check3;
datalines;
A 65 65 66 . 1 0 .
A 66 65 66 . 1 0 .
B . . . . . . .
B . . . . . . .
C 68 68 52 54 0 1 1
C 52 68 52 54 0 1 1
C 54 68 52 54 0 1 1
;
proc means data=have(drop=age) noprint;
by ID age: notsorted;
output out=checks(drop=_:) sum(check:)=;
run;
data want;
set checks;
array age{*} age:;
array check{*} check:;
bestAge = age{whichn(max(of check{*}), of check{*})};
keep ID bestAge;
run;
You could treat the all-missing case as a special case, using function n() :
data have;
input
ID $ Age Age1 Age2 Age3 Check1 Check2 Check3;
datalines;
A 65 65 66 . 1 0 .
A 66 65 66 . 1 0 .
B 63 63 . . . . .
B 63 63 . . . . .
C 68 68 52 54 0 1 1
C 52 68 52 54 0 1 0
C 54 68 52 54 0 1 0
;
proc means data=have(drop=age) noprint;
by ID age: notsorted;
output out=checks(drop=_:) sum(check:)=;
run;
data want;
set checks;
array age{*} age:;
array check{*} check:;
if n(of check{*}) > 0 then
bestAge = age{whichn(max(of check{*}), of check{*})};
keep ID bestAge;
run;
proc print data=want noobs; run;
PG
Thank you so much, PG! It was exactly what I wanted.
Thank you so much for your help,
Louise
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.