Re: Conditional first. & last.

e_______c · Posted 04-14-2020 10:55 PM

Hi 🙂

I want to create a conditional variable (outcome) based on accident_id and road_user_type:

- if anyone in an accident was a vulnerable road user > then outcome = 1;

- else if everyone in an accident was a MVO > then outcome = 2;

- else outcome = 3.

Please help 🙂

dataset have;

Accident_ID	Road_user_type
1	Vulnerable
1	MVO
1	.
2	.
2	MVO
3	MVO
3	MVO
3	MVO

dataset want;

Accident_ID	Road_user_type	Outcome
1	Vulnerable	1
1	MVO	1
1	.	1
2	.	.
2	MVO	.
3	MVO	2
3	MVO	2
3	MVO	2

sdhilip · Posted 04-15-2020 01:12 AM

Data Want;
Set Have;
If Road_user_type = "Vulnerable" then Outcome = 1;
If Road_user_type = "MVO" then Outcome = 2;
Else Outcome = 3;
Run;

ChrisNZ · Posted 04-15-2020 02:26 AM

Why are there three ones?

High-Performance SAS Coding - Third Edition

e_______c · Posted 04-15-2020 02:31 AM

Hey Chris,

This is road crash data. The three one means the three people (rows) are each in the same accident (accident_id). Does that make sense?

ChrisNZ · Posted 04-15-2020 08:06 AM

Like this?

proc sql;
  select ACCIDENT_ID
       , ROAD_USER_TYPE 
       , case when sum(ROAD_USER_TYPE='Vulnerable')   then 1
              when sum(ROAD_USER_TYPE='MVO')=count(*) then 2
              else                                         3 end as OUTCOME
  from HAVE
  group by ACCIDENT_ID;


ACCIDENT_ID	ROAD_USER_TYPE	OUTCOME
1		1
1	Vulnerable	1
1	MVO	1
2	MVO	3
2		3
3	MVO	2
3	MVO	2
3	MVO	2

High-Performance SAS Coding - Third Edition

kelxxx · Posted 04-15-2020 03:22 AM

data want;
set have;
by accident_id;
retain outcome;
if first.accident_id then do;
if road_user_type eq "Vulnerable" then outcome=1;
else if road_user_type eq "MVO" then outcome=2;
else if road_user_type eq "" then outcome=.;
else outcome=3;
end;
run;

Hello, it s my proposition.

I saw in the case accident_id = 2, road_user_type =. >>> outcome =. >>> then I add a case where outcome =.
So we have 4 cases: outcome =., 1,2,3

Have a nice day.

Shmuel · Posted 04-15-2020 03:50 AM

data have;
 infile datalines;
 input Accident_ID Road_user_type $10.;
datalines;
1 Vulnerable
1 MVO
1 .
2 .
2 MVO
3 MVO
3 MVO
3 MVO
3 MVO
; run;
 
 data temp1;
 set have;
       if road_user_type = 'Vulnerable' then road_user_code=1; else
       if road_user_type = 'MVO' then road_user_code=2; else
         road_user_code=0;
run;

proc sort data=temp1; by Accident_ID road_user_code; run;

data temp2;
 set temp1;
    by Accident_ID ;
       retain outcome;
       if first.Accident_ID then outcome=.;
       if road_user_code = 1 then outcome=1;
       if outcome ne 1 then do;
          if first.Accident_ID then do;
             if road_user_code = 2
                then outcome = 2;
                else outcome = .;
          end; 
          else do;
             if road_user_code = 2 and
                outcome = 2 then;
             else outcome = .;
          end;
      end;
      if last.accident_id then output;
run;

data want;
 merge have temp2;
  by Accident_ID ;
run;

ed_sas_member · Posted 04-15-2020 03:51 AM

Hi @e_______c

Here is another approach using a PROC SQL:

proc sql;

	/* if anyone in an accident was a vulnerable road user > then outcome = 1 */
	(select *, 1 as outcome
	from have
	group by Accident_ID
	having sum(Road_user_type="Vulnerable") > 0)
	
	union all
	
	/* else if everyone in an accident was a MVO > then outcome = 2 */
	(select *, 2 as outcome
	from have
	group by Accident_ID
	having count(Accident_ID) = sum(Road_user_type="MVO"))

	union all
	
	/* else outcome = 3 */
	(select *, 3 as outcome
	from have
	group by Accident_ID
	having sum(Road_user_type="Vulnerable") = 0 and count(Accident_ID) > sum(Road_user_type="MVO"))
	
	order by Accident_ID;
quit;

Kurt_Bremser · Posted 04-15-2020 03:52 AM

Per your rule, accident_id = 2 should have outcome 3.

This code achieves that:

data have;
input Accident_ID Road_user_type :$10.;
datalines;
1 Vulnerable
1 MVO
1 .
2 .
2 MVO
3 MVO
3 MVO
3 MVO
;

data acc_type;
set have;
by accident_id;
retain flag_mvo flag_vuln;
if first.accident_id
then do;
  flag_mvo = 1;
  flag_vuln = 0;
end;
if road_user_type ne 'MVO' then flag_mvo = 0;
if road_user_type = 'Vulnerable' than flag_vuln = 1;
if last.accident_id;
if flag_vuln then outcome = 1;
else if flag_mvo then outcome = 2;
else outcome = 3;
keep accident_id outcome;
run;

data want;
merge
  have
  acc_type
;
by accident_id;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

mkeintz · Posted 04-15-2020 01:11 PM

You have a nice compact assignment rule, whose data needs can be satisfied with a self-merge, where each argument (except the last) of the merge statement selects a needed data subset:

data have;
input Accident_ID Road_user_type :$10.;
datalines;
1 Vulnerable
1 MVO
1 .
2 .
2 MVO
3 MVO
3 MVO
3 MVO
;

data want;
  merge 
      have (where=(road_user_type='Vulnerable') in=invul)
      have (where=(road_user_type=' ') in=inmiss)
      have (where=(road_user_type^='MVO') in=non_mvo)
      have ;
  by accident_id;
  if invul=1   then outcome=1; else
  if inmiss=1  then outcome=.; else
  if non_mvo=0 then outcome=2; else
  outcome=3;
run;

You might ordinarily be worried about mixing data from differing observations in the merge. But because the last argument of the merge is the complete set of records in HAVE, the individual values from those records will overwrite the value obtained from the matching subsets preceding it.

edit: Let me add some further notes on the MERGE statement with BY. Although the merged subsets for each ID can have differing numbers of records from each other and from the full dataset, the IN= dummies will be constant for the entire BY-group. I.e. if you have one record with road_user_type=' ' (and therefore inmiss=1) vs 3 records in the entire BY group, you'll have 3 iterations of the data step (i.e. 3 output records in this case), and for ALL of those iterations, the INMISS will equal 1.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

e_______c · Posted 04-15-2020 09:41 PM

Thanks mkeintz > that worked perfectly! Such a simple thing but I couldn't figure it out. Thanks again!

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away