Hello everyone, I have a question about how to define status in multiple obs by the same ID.
The dataset I have like this:
ID | Assessment date | Status |
1 | 2012-06-19 | No |
1 | 2012-06-20 | Yes |
1 | 2012-06-21 | No |
2 | 2012-06-22 | . |
2 | 2012-06-23 | No |
2 | 2012-06-24 | . |
2 | 2012-06-25 | No |
3 | 2012-06-19 | Unknown |
3 | 2012-06-20 | Yes |
3 | 2012-06-21 | No |
3 | 2012-06-22 | . |
3 | 2012-06-23 | No |
4 | 2012-06-01 | Unknown |
4 | 2012-06-03 | . |
4 | 2012-06-05 | No |
The dataset I would like to have is like this:
ID | Assessment date | Status | Status_new1 | Status_new2 |
1 | 2012-06-19 | No | Yes | Yes |
1 | 2012-06-20 | Yes | Yes | Yes |
1 | 2012-06-21 | No | Yes | Yes |
2 | 2012-06-22 | . | No | No |
2 | 2012-06-23 | No | No | No |
2 | 2012-06-24 | . | No | No |
2 | 2012-06-25 | No | No | No |
3 | 2012-06-19 | Unknown | Yes | Yes |
3 | 2012-06-20 | Yes | Yes | Yes |
3 | 2012-06-21 | No | Yes | Yes |
3 | 2012-06-22 | . | Yes | Yes |
3 | 2012-06-23 | No | Yes | Yes |
4 | 2012-06-01 | Unknown | No | Unknown |
4 | 2012-06-03 | . | No | Unknown |
4 | 2012-06-05 | No | No | Unknown |
If any obs in the same ID equal to "Yes", the new status variables (No matter is status_new1 or status_new2) should be equal to "Yes".
The only difference between status_new1 and status_new2 is demonstrated in ID 4. Considering ID 4's status are not very clear, I want to create status_new2 to do sensitivity analysis to see if there is any difference when I recode them as "No" and "Unknown".
The tricky part to me is how to recode the new variables when the status included two different statuses in the same patient.
I used "where status = "Yes" " to output all the patients who had status equal to "Yes", and created their status_new1 as "Yes". But I don't think I could use the same way to do for status = "No". Because "Yes" and "No" could be mixed.
Thank you for your help!
Why not just count how many YES, NO and UNKNOWN values you have? Then you can construct the new status versions from that information.
data have ;
input ID Date :yymmdd10. Status $;
format date yymmdd10.;
cards;
1 2012-06-19 No
1 2012-06-20 Yes
1 2012-06-21 No
2 2012-06-22 .
2 2012-06-23 No
2 2012-06-24 .
2 2012-06-25 No
3 2012-06-19 Unknown
3 2012-06-20 Yes
3 2012-06-21 No
3 2012-06-22 .
3 2012-06-23 No
4 2012-06-01 Unknown
4 2012-06-03 .
4 2012-06-05 No
;
proc sql ;
create table want as
select *
, sum(status='Yes') as yes
, sum(status='No') as no
, sum(status='Unknown') as unknown
, case when (calculated yes) then 'Yes' else 'No' end as Status1
, case when (calculated yes) then 'Yes'
when (calculated unknown) then 'Unknown'
when (calculated no) then 'No'
else ' ' end as status2
from have
group by id
order by id, date
;
quit;
Obs ID Date Status yes no unknown Status1 status2 1 1 2012-06-19 No 1 2 0 Yes Yes 2 1 2012-06-20 Yes 1 2 0 Yes Yes 3 1 2012-06-21 No 1 2 0 Yes Yes 4 2 2012-06-22 0 2 0 No No 5 2 2012-06-23 No 0 2 0 No No 6 2 2012-06-24 0 2 0 No No 7 2 2012-06-25 No 0 2 0 No No 8 3 2012-06-19 Unknown 1 2 1 Yes Yes 9 3 2012-06-20 Yes 1 2 1 Yes Yes 10 3 2012-06-21 No 1 2 1 Yes Yes 11 3 2012-06-22 1 2 1 Yes Yes 12 3 2012-06-23 No 1 2 1 Yes Yes 13 4 2012-06-01 Unknown 0 1 1 No Unknown 14 4 2012-06-03 0 1 1 No Unknown 15 4 2012-06-05 No 0 1 1 No Unknown
Why not just count how many YES, NO and UNKNOWN values you have? Then you can construct the new status versions from that information.
data have ;
input ID Date :yymmdd10. Status $;
format date yymmdd10.;
cards;
1 2012-06-19 No
1 2012-06-20 Yes
1 2012-06-21 No
2 2012-06-22 .
2 2012-06-23 No
2 2012-06-24 .
2 2012-06-25 No
3 2012-06-19 Unknown
3 2012-06-20 Yes
3 2012-06-21 No
3 2012-06-22 .
3 2012-06-23 No
4 2012-06-01 Unknown
4 2012-06-03 .
4 2012-06-05 No
;
proc sql ;
create table want as
select *
, sum(status='Yes') as yes
, sum(status='No') as no
, sum(status='Unknown') as unknown
, case when (calculated yes) then 'Yes' else 'No' end as Status1
, case when (calculated yes) then 'Yes'
when (calculated unknown) then 'Unknown'
when (calculated no) then 'No'
else ' ' end as status2
from have
group by id
order by id, date
;
quit;
Obs ID Date Status yes no unknown Status1 status2 1 1 2012-06-19 No 1 2 0 Yes Yes 2 1 2012-06-20 Yes 1 2 0 Yes Yes 3 1 2012-06-21 No 1 2 0 Yes Yes 4 2 2012-06-22 0 2 0 No No 5 2 2012-06-23 No 0 2 0 No No 6 2 2012-06-24 0 2 0 No No 7 2 2012-06-25 No 0 2 0 No No 8 3 2012-06-19 Unknown 1 2 1 Yes Yes 9 3 2012-06-20 Yes 1 2 1 Yes Yes 10 3 2012-06-21 No 1 2 1 Yes Yes 11 3 2012-06-22 1 2 1 Yes Yes 12 3 2012-06-23 No 1 2 1 Yes Yes 13 4 2012-06-01 Unknown 0 1 1 No Unknown 14 4 2012-06-03 0 1 1 No Unknown 15 4 2012-06-05 No 0 1 1 No Unknown
Hello Tom, thank you so much! I just tried your code, it worked perfectly!! Your way is absolutely easy to understand and follow! Thanks a lot!!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.