Hi;
I attached a sample of MyData which include three data sets, first is the crash data includes single obesrvation of each crashe info., second is the vehicle data includes vehicels info. that involved in a crash, third is the participant info. who involved in a crash, as following:
data acc;
infile datalines dlm = " ";
input Kroki Key_ID Severity $ InjurMajor Deaths InjMinor BranchId;
datalines ;
8505804 36412 Damage 0 0 0 31
8581794 36412 Deaths 0 2 0 32
8530886 36412 Damage 0 0 0 34
8536144 38802 Damage 0 0 0 32
8528435 38802 Damage 0 0 0 31
9324143 38802 Damage 0 0 0 34
8525667 38802 Deaths 0 2 0 33
9344281 40400 Damage 0 0 0 34
9357307 40400 Deaths 0 2 0 31
8503111 40400 Damage 0 0 0 33
8538023 32675 Deaths 1 34
8595057 32675 Damage 0 0 0 32
8733084 33281 Deaths 3 1 0 31
8643633 33281 Damage 0 0 0 34
8449397 33281 Damage 0 0 0 32
8743224 38323 Damage 0 0 0 32
8517348 38323 Damage 0 0 0 33
9345729 38323 Damage 0 0 0 34
8544720 38323 Deaths 0 1 0 31
9325908 40503 Damage 0 0 0 31
9336147 40503 Damage 0 0 0 34
9352576 40503 Serious injuries 1 0 0 33
9352537 40663 Damage 0 0 0 33
9359019 40663 Damage 0 0 0 34
9353437 40663 Minor injuries 0 0 1 31
;run;
data veh;
infile datalines dlm = " ";
input Kroki Key_ID Percentage BranchId;
datalines;
8505804 36412 100 31
8505804 36412 0 31
8530886 36412 100 34
8530886 36412 0 34
8581794 36412 100 32
8525667 38802 100 33
8528435 38802 0 31
8528435 38802 100 31
8536144 38802 0 32
8536144 38802 0 32
9324143 38802 100 34
9324143 38802 0 34
8503111 40400 100 33
8503111 40400 0 33
9344281 40400 100 34
9344281 40400 0 34
9357307 40400 0 31
9357307 40400 100 31
8538023 32675 0 34
8538023 32675 100 34
8595057 32675 0 32
8595057 32675 100 32
8449397 33281 0 32
8449397 33281 100 32
8643633 33281 100 34
8643633 33281 0 34
8733084 33281 100 31
8517348 38323 100 33
8517348 38323 0 33
8544720 38323 100 31
8743224 38323 0 32
9345729 38323 0 34
9345729 38323 100 34
9325908 40503 0 31
9325908 40503 0 31
9325908 40503 100 31
9336147 40503 100 34
9336147 40503 0 34
9352576 40503 75 33
9352576 40503 25 33
9352537 40663 100 33
9352537 40663 0 33
9353437 40663 0 31
9353437 40663 100 31
9359019 40663 100 34
9359019 40663 0 34
;run;
data par;
infile datalines dlm = " ";
input Kroki Key_ID Age PartyType $ HealthStatus $ BranchId;
datalines;
8505804 36412 0 Driver intact 31
8505804 36412 0 Driver intact 31
8530886 36412 0 Driver intact 34
8581794 36412 0 Driver intact 32
8581794 36412 0 Passenger Death in hospital 32
8581794 36412 0 Passenger Death in hospital 32
8525667 38802 0 Driver Death in hospital 33
8528435 38802 0 Driver intact 31
8528435 38802 0 Driver intact 31
8536144 38802 0 Driver intact 32
8536144 38802 0 Driver intact 32
9324143 38802 0 Driver intact 34
9324143 38802 0 Driver intact 34
8525667 38802 0 Passenger Death in hospital 33
8503111 40400 0 Driver intact 33
8503111 40400 0 Driver intact 33
9344281 40400 0 Driver intact 34
9344281 40400 0 Driver intact 34
9357307 40400 24 Driver Death at the site 31
9357307 40400 39 Driver intact 31
9357307 40400 16 Passenger Death at the site 31
8538023 32675 0 Driver Other 34
8595057 32675 0 Driver intact 32
8595057 32675 0 Driver intact 32
8538023 32675 0 Runaway driver Death in hospital 34
8449397 33281 0 Driver intact 32
8449397 33281 0 Driver intact 32
8643633 33281 0 Driver intact 34
8643633 33281 0 Driver intact 34
8733084 33281 35 Driver Death at the site 31
8733084 33281 30 Passenger Taken to hospital 31
8733084 33281 0 Passenger Taken to hospital 31
8733084 33281 0 Passenger Taken to hospital 31
8517348 38323 0 Driver intact 33
8517348 38323 0 Driver intact 33
8544720 38323 54 Driver Death at the site 31
8544720 38323 0 Driver Other 31
8743224 38323 0 Driver intact 32
9345729 38323 0 Driver intact 34
9325908 40503 0 Driver intact 31
9325908 40503 0 Driver intact 31
9325908 40503 0 Driver intact 31
9336147 40503 0 Driver intact 34
9336147 40503 0 Driver intact 34
9352576 40503 0 Driver Taken to hospital 33
9352576 40503 0 Driver intact 33
9352537 40663 0 Driver intact 33
9352537 40663 0 Driver intact 33
9353437 40663 0 Driver Treated on-site 31
9353437 40663 0 Driver intact 31
9359019 40663 0 Driver intact 34
9359019 40663 0 Driver intact 34
;run;
what I want is to have one observation for each crash includes:
Kroki Key_ID BranchId Severity InjNo DeathsNo VehNo ;
where all (Acc) variables will be included
(Veh) only who has (100 of Percentage)
(Par) only main driver
VehNo = total number of vehicles involved in a crash (1: single, 2:multiple, 3+: morethan2)
InjNo = total number of injuries
DeathsNo = total number of deaths
I appreciate any help or suggestion and thanks in advance.
Here's another solution, just so we are clear, that is not in your desired dataset. We've spent more time going back and forth because you don't know how to ask a question and explain what it is you want:
proc sql;
create table acc_prep as
select *,sum(InjMinor,InjurMajor) as InjNo
from acc
order by key_id,kroki;
proc sort data=veh;by key_id kroki;
data veh_count;
set veh;
by key_id kroki;
count+1;
Vehcount+1;
if first.kroki then do;
vehcount = 1;
count = 1;
end;
run;
proc sql;
create table veh_prep as
select *,max(vehcount) as VehNo
from veh_count
group by kroki
order by key_id,kroki;
proc sort data=par;by key_id kroki;
data par_count;
set par;
by key_id kroki;
count+1;
Parcount+1;
if first.kroki then do;
count = 1;
parcount = 1;
end;
run;
proc sql;
create table par_prep as
select *,max(parcount) as ParNo
from par_count
group by kroki
order by key_id,kroki;
data final_prep;
merge veh_prep (in=a where=(percentage = 100))
par_prep (in=b);
by key_id kroki count;
if a and b;
run;
data final_merge;
merge acc_prep (in=a)
final_prep(in=b);
by key_id kroki;
if a and b;
run;
proc sql;
create table want as
select distinct kroki,key_id,accidenttime,acctype,accpoint,wlight,severity,publicdamage,accreasons,accidentangle,injurmajor,deaths,injminor,
branchid,percentage,vehiclemake,vehiclemodel,vehiclecolor,accidentplace,age,partytype,healthstatus,nationality,InjNo,vehno,parno
from final_merge
order by kroki,key_id;
Could you please show what the output would look like for one or two values of key_id such as 36412?
Your PAR data set code isn't quite correct. With space delimited none of the values for BRANCHID are correct when the HealthStatus is more than one word. There is also the same issue with what looks like "Runaway driver" for Partytype. And the character variables will need to have a declared length long enough to hold the full text.
here is an example of (36412),
Kroki Key_ID BranchId Severity InjurMajor Deaths InjMinor Percentage Age PartyType HealthStatus InjNo DeathsNo VehNo
8505804 36412 31 Damage 0 0 0 100 0 Driver intact 0 0 2
8581794 36412 32 Deaths 0 2 0 100 0 Driver intact 0 2 1
8530886 36412 34 Damage 0 0 0 100 0 Driver intact 0 0 2
as you can see that there are (Acc) of (36412) for each there is diff. Kroki and BranchId for (veh) and (prt) info. e.g. second one is (Deaths) the driver is (intact) and there are 2 deaths which is single vehicle crash.
Minor changes for the solution:
data acc;
infile datalines dlm = " ";
input Kroki Key_ID Severity $ InjurMajor Deaths InjMinor BranchId;
datalines ;
8505804 36412 Damage 0 0 0 31
8581794 36412 Deaths 0 2 0 32
8530886 36412 Damage 0 0 0 34
8536144 38802 Damage 0 0 0 32
8528435 38802 Damage 0 0 0 31
9324143 38802 Damage 0 0 0 34
8525667 38802 Deaths 0 2 0 33
9344281 40400 Damage 0 0 0 34
9357307 40400 Deaths 0 2 0 31
8503111 40400 Damage 0 0 0 33
8538023 32675 Deaths 1 34
8595057 32675 Damage 0 0 0 32
8733084 33281 Deaths 3 1 0 31
8643633 33281 Damage 0 0 0 34
8449397 33281 Damage 0 0 0 32
8743224 38323 Damage 0 0 0 32
8517348 38323 Damage 0 0 0 33
9345729 38323 Damage 0 0 0 34
8544720 38323 Deaths 0 1 0 31
9325908 40503 Damage 0 0 0 31
9336147 40503 Damage 0 0 0 34
9352576 40503 Serious injuries 1 0 0 33
9352537 40663 Damage 0 0 0 33
9359019 40663 Damage 0 0 0 34
9353437 40663 Minor injuries 0 0 1 31
;run;
proc sort data=acc;by key_id kroki;
data acc_prep;
set acc;
by key_id kroki;
DeathsNo+deaths;
if first.key_id then DeathsNo = deaths;
InjNo=sum(injurmajor,injminor);
run;
data veh;
infile datalines dlm = " ";
input Kroki Key_ID Percentage BranchId;
datalines;
8505804 36412 100 31
8505804 36412 0 31
8530886 36412 100 34
8530886 36412 0 34
8581794 36412 100 32
8525667 38802 100 33
8528435 38802 0 31
8528435 38802 100 31
8536144 38802 0 32
8536144 38802 0 32
9324143 38802 100 34
9324143 38802 0 34
8503111 40400 100 33
8503111 40400 0 33
9344281 40400 100 34
9344281 40400 0 34
9357307 40400 0 31
9357307 40400 100 31
8538023 32675 0 34
8538023 32675 100 34
8595057 32675 0 32
8595057 32675 100 32
8449397 33281 0 32
8449397 33281 100 32
8643633 33281 100 34
8643633 33281 0 34
8733084 33281 100 31
8517348 38323 100 33
8517348 38323 0 33
8544720 38323 100 31
8743224 38323 0 32
9345729 38323 0 34
9345729 38323 100 34
9325908 40503 0 31
9325908 40503 0 31
9325908 40503 100 31
9336147 40503 100 34
9336147 40503 0 34
9352576 40503 75 33
9352576 40503 25 33
9352537 40663 100 33
9352537 40663 0 33
9353437 40663 0 31
9353437 40663 100 31
9359019 40663 100 34
9359019 40663 0 34
;run;
proc sort data=veh;by key_id kroki;
data veh_prep;
set veh;
by key_id kroki;
VehNo+1;
if first.kroki then vehno = 1;
run;
proc sql;
create table max_veh as
select kroki,key_id,Percentage,max(VehNo) as VehNo
from veh_prep
group by kroki
order by key_id,kroki;
data par;
infile datalines dlm = " ";
input Kroki Key_ID Age PartyType $ HealthStatus $ BranchId;
datalines;
8505804 36412 0 Driver intact 31
8505804 36412 0 Driver intact 31
8530886 36412 0 Driver intact 34
8581794 36412 0 Driver intact 32
8581794 36412 0 Passenger Death in hospital 32
8581794 36412 0 Passenger Death in hospital 32
8525667 38802 0 Driver Death in hospital 33
8528435 38802 0 Driver intact 31
8528435 38802 0 Driver intact 31
8536144 38802 0 Driver intact 32
8536144 38802 0 Driver intact 32
9324143 38802 0 Driver intact 34
9324143 38802 0 Driver intact 34
8525667 38802 0 Passenger Death in hospital 33
8503111 40400 0 Driver intact 33
8503111 40400 0 Driver intact 33
9344281 40400 0 Driver intact 34
9344281 40400 0 Driver intact 34
9357307 40400 24 Driver Death at the site 31
9357307 40400 39 Driver intact 31
9357307 40400 16 Passenger Death at the site 31
8538023 32675 0 Driver Other 34
8595057 32675 0 Driver intact 32
8595057 32675 0 Driver intact 32
8538023 32675 0 Runaway driver Death in hospital 34
8449397 33281 0 Driver intact 32
8449397 33281 0 Driver intact 32
8643633 33281 0 Driver intact 34
8643633 33281 0 Driver intact 34
8733084 33281 35 Driver Death at the site 31
8733084 33281 30 Passenger Taken to hospital 31
8733084 33281 0 Passenger Taken to hospital 31
8733084 33281 0 Passenger Taken to hospital 31
8517348 38323 0 Driver intact 33
8517348 38323 0 Driver intact 33
8544720 38323 54 Driver Death at the site 31
8544720 38323 0 Driver Other 31
8743224 38323 0 Driver intact 32
9345729 38323 0 Driver intact 34
9325908 40503 0 Driver intact 31
9325908 40503 0 Driver intact 31
9325908 40503 0 Driver intact 31
9336147 40503 0 Driver intact 34
9336147 40503 0 Driver intact 34
9352576 40503 0 Driver Taken to hospital 33
9352576 40503 0 Driver intact 33
9352537 40663 0 Driver intact 33
9352537 40663 0 Driver intact 33
9353437 40663 0 Driver Treated on-site 31
9353437 40663 0 Driver intact 31
9359019 40663 0 Driver intact 34
9359019 40663 0 Driver intact 34
;run;
proc sort data=par;by key_id kroki;
data want(keep=Kroki Key_ID BranchId Severity InjNo DeathsNo VehNo) ;
merge acc_prep(in=a)
max_veh(in=b where=(percentage=100))
par(in=c where=(PartyType='Driver'));
by key_id kroki;
if a;
run;
proc sql;
create table test as
select distinct *
from want
where key_id = 36412;
dear @Steelers_In_DC
thanks for your valuable solution but the (want) data set has more obsevations than (acc) data set.
I was picking up duplicate records where branchid is missing in the PAR dataset. Changing the last two steps will fix the issue:
data want(keep=Kroki Key_ID BranchId Severity InjNo DeathsNo VehNo) ;
merge acc_prep(in=a)
max_veh(in=b where=(percentage=100))
par(in=c drop= BranchId where=(PartyType='Driver'));
by key_id kroki;
if a;
run;
proc sql;
create table final as
select distinct *
from want
order by key_id,kroki;*
where key_id = 36412;
dear @Steelers_In_DC
also, it seems wrong when i want to add (total number of par) as (vehno) code, as following:
data par_prep;
set par;
by key_id kroki;
ParNo+1;
if first.kroki then par = 1;
run;
proc sql;
create table max_par as
select kroki,key_id,PartyType,max(parno) as Parno
from par_prep
group by kroki
order by key_id,kroki;
then add it to merge as following :
data want(keep=Kroki Key_ID BranchId Severity InjNo DeathsNo VehNo) ;
merge acc_prep(in=a)
max_veh(in=b where=(percentage=100))
max_par(in=b where=(PartyType='Driver'))
par(in=c drop= BranchId where=(PartyType='Driver'));
by key_id kroki;
if a;
run;
proc sql;
create table final as
select distinct *
from want
order by key_id,kroki;
Give me an example of a record that this produces incorrectly and tell me what the desired output is. Otherwise I have no idea what the problem is.
dear @Steelers_In_DC
please find an example attached;
data acc;
infile cards dlm=' ';
input Kroki Key_ID AccidentTime $ AccType $ AccPoint $ WLight $ Severity $ PublicDamage $ AccReasons $
AccidentAngle $ InjurMajor Deaths InjMinor BranchId;
cards;
8578421 34969 07/06/201301:58PM Bumpedamovingvehicle - - Damage - preoccupationonDriving - 0 0 0 34
8799176 34969 09/05/201308:21AM Bumpedamovingvehicle - - Damage - distance - 0 0 0 31
8732911 34969 05/04/201304:51AM Other Straight Clear Deaths Nothing speeding FacetoFace 1 1 0 32
;run;
data veh;
infile cards dlm=' ';
input Kroki Key_ID Percentage VehicleMake $ VehicleModel VehicleColor $ AccidentPlace $ BranchId;
cards;
8578421 34969 0 - 0 - Front 34
8578421 34969 100 - 0 - Other 34
8732911 34969 0 to 0 - Other 32
8732911 34969 100 mr 1992 - Other 32
8799176 34969 0 cr 2003 - Front 31
8799176 34969 100 - 0 - Front 31
;run;
data par;
infile cards dlm=' ';
input Kroki Key_ID Age PartyType $ HealthStatus $ Nationality $ BranchId;
cards;
8578421 34969 0 Driver intact Ym 34
8578421 34969 0 Driver Other SD 34
8732911 34969 0 Driver Takentohospital Ym 32
8732911 34969 0 Driver intact Pk 32
8732911 34969 0 Passenger Deathinhospital Pk 32
8799176 34969 0 Driver intact Other 31
8799176 34969 0 Driver intact SD 31
;run;
data want;
infile cards dlm=' ';
input Kroki Key_ID AccidentTime $ AccType $ AccPoint $ WLight $ Severity $ PublicDamage $ AccReasons $
AccidentAngle $ InjurMajor Deaths InjMinor BranchId Percentage VehicleMake $ VehicleModel VehicleColor $
AccidentPlace $ Age PartyType $ HealthStatus $ Nationality $;
cards;
8578421 34969 07/06/201301:58PM Bumpedamovingvehicle - - Damage - preoccupationonDriving - 0 0 0 34 100 - 0 - Other 0 Driver Other SD
8799176 34969 09/05/201308:21AM Bumpedamovingvehicle - - Damage - distance - 0 0 0 31 100 - 0 - Front 0 Driver intact SD
8732911 34969 05/04/201304:51AM Other Straight Clear Deaths Nothing speeding FacetoFace 1 1 0 32 100 mr 1992 - Other 0 Driver intact Pk
run;
In your original post you state:
what I want is to have one observation for each crash includes:
Kroki Key_ID BranchId Severity InjNo DeathsNo VehNo ;
What you have here is completely different. You have to re-explain what it is you are trying to achieve.
dear @Steelers_In_DC
you are right and i am sorry i did not iclude the VehNo, InjNo and ParNo;
Here I just include all other variables. from orignal post i got how to count (veh) and (Par) based on your code. but it was hard to merge with all variables:
what I want ( is to have one observation for each crash includes) all vaiables in additon to the total number of (InjNo), (veh) and (Par).
the one observation should includes all variables of (acc) in addition to (InjNo) which is the total of number of (InjurMajor+InjMinor) ,
all variables of (Veh) where percentage=100 in addition to (VehNo) which is the count of number of Veh in each crash,
all variables of (Par) where PartyType='Driver' in addition to (Par) which is the count of number Par in each crash.
so the output should look like this:
data want;
infile cards dlm=' ';
input Kroki Key_ID AccidentTime $ AccType $ AccPoint $ WLight $ Severity $ PublicDamage $ AccReasons $
AccidentAngle $ InjurMajor Deaths InjMinor BranchId Percentage VehicleMake $ VehicleModel VehicleColor $
AccidentPlace $ Age PartyType $ HealthStatus $ Nationality $ InjNo VehNo ParNo;
cards;
8578421 34969 07/06/201301:58PM Bumpedamovingvehicle - - Damage - preoccupationonDriving - 0 0 0 34 100 - 0 - Other 0 Driver Other SD 0 2 2
8799176 34969 09/05/201308:21AM Bumpedamovingvehicle - - Damage - distance - 0 0 0 31 100 - 0 - Front 0 Driver intact SD 0 2 2
8732911 34969 05/04/201304:51AM Other Straight Clear Deaths Nothing speeding FacetoFace 1 1 0 32 100 mr 1992 - Other 0 Driver intact Pk 1 2 3
run;
How are you selecting what record comes from the 'par' dataset. You only have one nationality per kroki in your output, there are 2-3 in this dataset. What is the criteria?
daer @Steelers_In_DC
that is very good question:
the criteria is only (PartyType)in Par dataset.
let me give an example:
for the 3rd obs of Acc there are two observtions (the 3rd and the 4th) from Veh (meaning VehNo=2) where the (4th obs shows the percentage =100 which i am interested in) to find the info. from Par dataset, there are three observations (meaning ParNo=3) (3rd, 4th and 5th) the (3rd obs is belong to the 3rd obs from Veh) so the (4th and 5th is belong to the 4th obs from Veh so i will choose the 4th from Par becaese the PartyType ='Driver').
Thanks again
Kroki 8799176 has two entries in the Par dataset, both of them are driver. How do you select between the two?
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.