Hi 🙂
I want to create a conditional variable (outcome) based on accident_id and road_user_type:
- if anyone in an accident was a vulnerable road user > then outcome = 1;
- else if everyone in an accident was a MVO > then outcome = 2;
- else outcome = 3.
Please help 🙂
dataset have;
Accident_ID | Road_user_type |
1 | Vulnerable |
1 | MVO |
1 | . |
2 | . |
2 | MVO |
3 | MVO |
3 | MVO |
3 | MVO |
dataset want;
Accident_ID | Road_user_type | Outcome |
1 | Vulnerable | 1 |
1 | MVO | 1 |
1 | . | 1 |
2 | . | . |
2 | MVO | . |
3 | MVO | 2 |
3 | MVO | 2 |
3 | MVO | 2 |
Data Want;
Set Have;
If Road_user_type = "Vulnerable" then Outcome = 1;
If Road_user_type = "MVO" then Outcome = 2;
Else Outcome = 3;
Run;
Why are there three ones?
Hey Chris,
This is road crash data. The three one means the three people (rows) are each in the same accident (accident_id). Does that make sense?
Like this?
proc sql;
select ACCIDENT_ID
, ROAD_USER_TYPE
, case when sum(ROAD_USER_TYPE='Vulnerable') then 1
when sum(ROAD_USER_TYPE='MVO')=count(*) then 2
else 3 end as OUTCOME
from HAVE
group by ACCIDENT_ID;
ACCIDENT_ID | ROAD_USER_TYPE | OUTCOME |
---|---|---|
1 | 1 | |
1 | Vulnerable | 1 |
1 | MVO | 1 |
2 | MVO | 3 |
2 | 3 | |
3 | MVO | 2 |
3 | MVO | 2 |
3 | MVO | 2 |
data want;
set have;
by accident_id;
retain outcome;
if first.accident_id then do;
if road_user_type eq "Vulnerable" then outcome=1;
else if road_user_type eq "MVO" then outcome=2;
else if road_user_type eq "" then outcome=.;
else outcome=3;
end;
run;
Hello, it s my proposition.
I saw in the case accident_id = 2, road_user_type =. >>> outcome =. >>> then I add a case where outcome =.
So we have 4 cases: outcome =., 1,2,3
Have a nice day.
data have;
infile datalines;
input Accident_ID Road_user_type $10.;
datalines;
1 Vulnerable
1 MVO
1 .
2 .
2 MVO
3 MVO
3 MVO
3 MVO
3 MVO
; run;
data temp1;
set have;
if road_user_type = 'Vulnerable' then road_user_code=1; else
if road_user_type = 'MVO' then road_user_code=2; else
road_user_code=0;
run;
proc sort data=temp1; by Accident_ID road_user_code; run;
data temp2;
set temp1;
by Accident_ID ;
retain outcome;
if first.Accident_ID then outcome=.;
if road_user_code = 1 then outcome=1;
if outcome ne 1 then do;
if first.Accident_ID then do;
if road_user_code = 2
then outcome = 2;
else outcome = .;
end;
else do;
if road_user_code = 2 and
outcome = 2 then;
else outcome = .;
end;
end;
if last.accident_id then output;
run;
data want;
merge have temp2;
by Accident_ID ;
run;
Hi @e_______c
Here is another approach using a PROC SQL:
proc sql;
/* if anyone in an accident was a vulnerable road user > then outcome = 1 */
(select *, 1 as outcome
from have
group by Accident_ID
having sum(Road_user_type="Vulnerable") > 0)
union all
/* else if everyone in an accident was a MVO > then outcome = 2 */
(select *, 2 as outcome
from have
group by Accident_ID
having count(Accident_ID) = sum(Road_user_type="MVO"))
union all
/* else outcome = 3 */
(select *, 3 as outcome
from have
group by Accident_ID
having sum(Road_user_type="Vulnerable") = 0 and count(Accident_ID) > sum(Road_user_type="MVO"))
order by Accident_ID;
quit;
Per your rule, accident_id = 2 should have outcome 3.
This code achieves that:
data have;
input Accident_ID Road_user_type :$10.;
datalines;
1 Vulnerable
1 MVO
1 .
2 .
2 MVO
3 MVO
3 MVO
3 MVO
;
data acc_type;
set have;
by accident_id;
retain flag_mvo flag_vuln;
if first.accident_id
then do;
flag_mvo = 1;
flag_vuln = 0;
end;
if road_user_type ne 'MVO' then flag_mvo = 0;
if road_user_type = 'Vulnerable' than flag_vuln = 1;
if last.accident_id;
if flag_vuln then outcome = 1;
else if flag_mvo then outcome = 2;
else outcome = 3;
keep accident_id outcome;
run;
data want;
merge
have
acc_type
;
by accident_id;
run;
You have a nice compact assignment rule, whose data needs can be satisfied with a self-merge, where each argument (except the last) of the merge statement selects a needed data subset:
data have;
input Accident_ID Road_user_type :$10.;
datalines;
1 Vulnerable
1 MVO
1 .
2 .
2 MVO
3 MVO
3 MVO
3 MVO
;
data want;
merge
have (where=(road_user_type='Vulnerable') in=invul)
have (where=(road_user_type=' ') in=inmiss)
have (where=(road_user_type^='MVO') in=non_mvo)
have ;
by accident_id;
if invul=1 then outcome=1; else
if inmiss=1 then outcome=.; else
if non_mvo=0 then outcome=2; else
outcome=3;
run;
You might ordinarily be worried about mixing data from differing observations in the merge. But because the last argument of the merge is the complete set of records in HAVE, the individual values from those records will overwrite the value obtained from the matching subsets preceding it.
edit: Let me add some further notes on the MERGE statement with BY. Although the merged subsets for each ID can have differing numbers of records from each other and from the full dataset, the IN= dummies will be constant for the entire BY-group. I.e. if you have one record with road_user_type=' ' (and therefore inmiss=1) vs 3 records in the entire BY group, you'll have 3 iterations of the data step (i.e. 3 output records in this case), and for ALL of those iterations, the INMISS will equal 1.
Thanks mkeintz > that worked perfectly! Such a simple thing but I couldn't figure it out. Thanks again!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.