DATA Step, Macro, Functions and more

count and merge

Accepted Solution Solved
Reply
Contributor
Posts: 53
Accepted Solution

count and merge

Hi;

 

I attached a sample of MyData which include three data sets, first is the crash data includes single obesrvation of each crashe info., second is the vehicle data includes vehicels info. that involved in a crash, third is the participant info.  who involved in a crash, as following:

 

data acc;
infile datalines dlm = " ";
input Kroki Key_ID Severity $ InjurMajor Deaths InjMinor BranchId;
datalines ;
8505804 36412 Damage 0 0 0 31
8581794 36412 Deaths 0 2 0 32
8530886 36412 Damage 0 0 0 34
8536144 38802 Damage 0 0 0 32
8528435 38802 Damage 0 0 0 31
9324143 38802 Damage 0 0 0 34
8525667 38802 Deaths 0 2 0 33
9344281 40400 Damage 0 0 0 34
9357307 40400 Deaths 0 2 0 31
8503111 40400 Damage 0 0 0 33
8538023 32675 Deaths 1 34
8595057 32675 Damage 0 0 0 32
8733084 33281 Deaths 3 1 0 31
8643633 33281 Damage 0 0 0 34
8449397 33281 Damage 0 0 0 32
8743224 38323 Damage 0 0 0 32
8517348 38323 Damage 0 0 0 33
9345729 38323 Damage 0 0 0 34
8544720 38323 Deaths 0 1 0 31
9325908 40503 Damage 0 0 0 31
9336147 40503 Damage 0 0 0 34
9352576 40503 Serious injuries 1 0 0 33
9352537 40663 Damage 0 0 0 33
9359019 40663 Damage 0 0 0 34
9353437 40663 Minor injuries 0 0 1 31
;run;

 

data veh;
infile datalines dlm = " ";
input Kroki Key_ID Percentage BranchId;
datalines;
8505804 36412 100 31
8505804 36412 0 31
8530886 36412 100 34
8530886 36412 0 34
8581794 36412 100 32
8525667 38802 100 33
8528435 38802 0 31
8528435 38802 100 31
8536144 38802 0 32
8536144 38802 0 32
9324143 38802 100 34
9324143 38802 0 34
8503111 40400 100 33
8503111 40400 0 33
9344281 40400 100 34
9344281 40400 0 34
9357307 40400 0 31
9357307 40400 100 31
8538023 32675 0 34
8538023 32675 100 34
8595057 32675 0 32
8595057 32675 100 32
8449397 33281 0 32
8449397 33281 100 32
8643633 33281 100 34
8643633 33281 0 34
8733084 33281 100 31
8517348 38323 100 33
8517348 38323 0 33
8544720 38323 100 31
8743224 38323 0 32
9345729 38323 0 34
9345729 38323 100 34
9325908 40503 0 31
9325908 40503 0 31
9325908 40503 100 31
9336147 40503 100 34
9336147 40503 0 34
9352576 40503 75 33
9352576 40503 25 33
9352537 40663 100 33
9352537 40663 0 33
9353437 40663 0 31
9353437 40663 100 31
9359019 40663 100 34
9359019 40663 0 34
;run;

 

data par;
infile datalines dlm = " ";
input Kroki Key_ID Age PartyType $ HealthStatus $ BranchId;
datalines;
8505804 36412 0 Driver intact 31
8505804 36412 0 Driver intact 31
8530886 36412 0 Driver intact 34
8581794 36412 0 Driver intact 32
8581794 36412 0 Passenger Death in hospital 32
8581794 36412 0 Passenger Death in hospital 32
8525667 38802 0 Driver Death in hospital 33
8528435 38802 0 Driver intact 31
8528435 38802 0 Driver intact 31
8536144 38802 0 Driver intact 32
8536144 38802 0 Driver intact 32
9324143 38802 0 Driver intact 34
9324143 38802 0 Driver intact 34
8525667 38802 0 Passenger Death in hospital 33
8503111 40400 0 Driver intact 33
8503111 40400 0 Driver intact 33
9344281 40400 0 Driver intact 34
9344281 40400 0 Driver intact 34
9357307 40400 24 Driver Death at the site 31
9357307 40400 39 Driver intact 31
9357307 40400 16 Passenger Death at the site 31
8538023 32675 0 Driver Other 34
8595057 32675 0 Driver intact 32
8595057 32675 0 Driver intact 32
8538023 32675 0 Runaway driver Death in hospital 34
8449397 33281 0 Driver intact 32
8449397 33281 0 Driver intact 32
8643633 33281 0 Driver intact 34
8643633 33281 0 Driver intact 34
8733084 33281 35 Driver Death at the site 31
8733084 33281 30 Passenger Taken to hospital 31
8733084 33281 0 Passenger Taken to hospital 31
8733084 33281 0 Passenger Taken to hospital 31
8517348 38323 0 Driver intact 33
8517348 38323 0 Driver intact 33
8544720 38323 54 Driver Death at the site 31
8544720 38323 0 Driver Other 31
8743224 38323 0 Driver intact 32
9345729 38323 0 Driver intact 34
9325908 40503 0 Driver intact 31
9325908 40503 0 Driver intact 31
9325908 40503 0 Driver intact 31
9336147 40503 0 Driver intact 34
9336147 40503 0 Driver intact 34
9352576 40503 0 Driver Taken to hospital 33
9352576 40503 0 Driver intact 33
9352537 40663 0 Driver intact 33
9352537 40663 0 Driver intact 33
9353437 40663 0 Driver Treated on-site 31
9353437 40663 0 Driver intact 31
9359019 40663 0 Driver intact 34
9359019 40663 0 Driver intact 34
;run;

 

what I want is to have one observation for each crash includes:

 

Kroki Key_ID BranchId Severity InjNo DeathsNo VehNo ;

 

where all (Acc) variables will be included
               (Veh) only who has (100 of Percentage)
               (Par) only main driver
VehNo = total number of vehicles involved in a crash (1: single, 2:multiple, 3+: morethan2)
InjNo = total number of injuries
DeathsNo = total number of deaths

 

I appreciate any help or suggestion and thanks in advance.


Accepted Solutions
Solution
‎02-01-2016 06:20 PM
Valued Guide
Posts: 860

Re: count and merge

Here's another solution, just so we are clear, that is not in your desired dataset.  We've spent more time going back and forth because you don't know how to ask a question and explain what it is you want:

 

proc sql;
create table acc_prep as
select *,sum(InjMinor,InjurMajor) as InjNo
from acc
order by key_id,kroki;

proc sort data=veh;by key_id kroki;

data veh_count;
set veh;
by key_id kroki;
count+1;
Vehcount+1;
if first.kroki then do;
    vehcount = 1;
    count = 1;
end;
run;

proc sql;
create table veh_prep as
select *,max(vehcount) as VehNo
from veh_count
group by kroki
order by key_id,kroki;

proc sort data=par;by key_id kroki;

data par_count;
set par;
by key_id kroki;
count+1;
Parcount+1;
if first.kroki then do;
count = 1;
parcount = 1;
end;
run;

proc sql;
create table par_prep as
select *,max(parcount) as ParNo
from par_count
group by kroki
order by key_id,kroki;

data final_prep;
merge veh_prep (in=a where=(percentage = 100))
      par_prep (in=b);
by key_id kroki count;
if a and b;
run;

data final_merge;
merge acc_prep  (in=a)
      final_prep(in=b);
by key_id kroki;
if a and b;
run;

proc sql;
create table want as
select distinct kroki,key_id,accidenttime,acctype,accpoint,wlight,severity,publicdamage,accreasons,accidentangle,injurmajor,deaths,injminor,
branchid,percentage,vehiclemake,vehiclemodel,vehiclecolor,accidentplace,age,partytype,healthstatus,nationality,InjNo,vehno,parno
from final_merge
order by kroki,key_id;

View solution in original post


All Replies
Super User
Posts: 11,343

Re: count and merge

Could you please show what the output would look like for one or two values of  key_id such as 36412?

 

Your PAR data set code isn't quite correct. With space delimited none of the values for BRANCHID are correct when the HealthStatus is more than one word. There is also the same issue with what looks like "Runaway driver" for Partytype. And the character variables will need to have a declared length long enough to hold the full text.

Contributor
Posts: 53

Re: count and merge

 

here is an example of (36412),

 

Kroki        Key_ID   BranchId Severity InjurMajor Deaths InjMinor Percentage Age PartyType HealthStatus InjNo DeathsNo VehNo
8505804  36412     31           Damage      0             0           0            100          0        Driver            intact        0         0            2

8581794  36412     32           Deaths        0             2           0            100          0        Driver            intact        0         2            1
8530886  36412     34           Damage      0             0           0            100          0        Driver            intact        0         0            2

 

as you can see that there are (Acc) of (36412) for each there is diff. Kroki and BranchId for (veh) and (prt) info. e.g. second one is (Deaths) the driver is (intact) and there are 2 deaths which is single vehicle crash.

 

 

 

Valued Guide
Posts: 860

Re: count and merge

Minor changes for the solution:

 


data acc;
infile datalines dlm = " ";
input Kroki Key_ID Severity $ InjurMajor Deaths InjMinor BranchId;
datalines ;
8505804 36412 Damage 0 0 0 31
8581794 36412 Deaths 0 2 0 32
8530886 36412 Damage 0 0 0 34
8536144 38802 Damage 0 0 0 32
8528435 38802 Damage 0 0 0 31
9324143 38802 Damage 0 0 0 34
8525667 38802 Deaths 0 2 0 33
9344281 40400 Damage 0 0 0 34
9357307 40400 Deaths 0 2 0 31
8503111 40400 Damage 0 0 0 33
8538023 32675 Deaths 1 34
8595057 32675 Damage 0 0 0 32
8733084 33281 Deaths 3 1 0 31
8643633 33281 Damage 0 0 0 34
8449397 33281 Damage 0 0 0 32
8743224 38323 Damage 0 0 0 32
8517348 38323 Damage 0 0 0 33
9345729 38323 Damage 0 0 0 34
8544720 38323 Deaths 0 1 0 31
9325908 40503 Damage 0 0 0 31
9336147 40503 Damage 0 0 0 34
9352576 40503 Serious injuries 1 0 0 33
9352537 40663 Damage 0 0 0 33
9359019 40663 Damage 0 0 0 34
9353437 40663 Minor injuries 0 0 1 31
;run;

proc sort data=acc;by key_id kroki;

data acc_prep;
set acc;
by key_id kroki;
DeathsNo+deaths;
if first.key_id then DeathsNo = deaths;
InjNo=sum(injurmajor,injminor);
run;

data veh;
infile datalines dlm = " ";
input Kroki Key_ID Percentage BranchId;
datalines;
8505804 36412 100 31
8505804 36412 0 31
8530886 36412 100 34
8530886 36412 0 34
8581794 36412 100 32
8525667 38802 100 33
8528435 38802 0 31
8528435 38802 100 31
8536144 38802 0 32
8536144 38802 0 32
9324143 38802 100 34
9324143 38802 0 34
8503111 40400 100 33
8503111 40400 0 33
9344281 40400 100 34
9344281 40400 0 34
9357307 40400 0 31
9357307 40400 100 31
8538023 32675 0 34
8538023 32675 100 34
8595057 32675 0 32
8595057 32675 100 32
8449397 33281 0 32
8449397 33281 100 32
8643633 33281 100 34
8643633 33281 0 34
8733084 33281 100 31
8517348 38323 100 33
8517348 38323 0 33
8544720 38323 100 31
8743224 38323 0 32
9345729 38323 0 34
9345729 38323 100 34
9325908 40503 0 31
9325908 40503 0 31
9325908 40503 100 31
9336147 40503 100 34
9336147 40503 0 34
9352576 40503 75 33
9352576 40503 25 33
9352537 40663 100 33
9352537 40663 0 33
9353437 40663 0 31
9353437 40663 100 31
9359019 40663 100 34
9359019 40663 0 34
;run;

proc sort data=veh;by key_id kroki;

data veh_prep;
set veh;
by key_id kroki;
VehNo+1;
if first.kroki then vehno = 1;
run;

proc sql;
create table max_veh as
select kroki,key_id,Percentage,max(VehNo) as VehNo
from veh_prep
group by kroki
order by key_id,kroki;

data par;
infile datalines dlm = " ";
input Kroki Key_ID Age PartyType $ HealthStatus $ BranchId;
datalines;
8505804 36412 0 Driver intact 31
8505804 36412 0 Driver intact 31
8530886 36412 0 Driver intact 34
8581794 36412 0 Driver intact 32
8581794 36412 0 Passenger Death in hospital 32
8581794 36412 0 Passenger Death in hospital 32
8525667 38802 0 Driver Death in hospital 33
8528435 38802 0 Driver intact 31
8528435 38802 0 Driver intact 31
8536144 38802 0 Driver intact 32
8536144 38802 0 Driver intact 32
9324143 38802 0 Driver intact 34
9324143 38802 0 Driver intact 34
8525667 38802 0 Passenger Death in hospital 33
8503111 40400 0 Driver intact 33
8503111 40400 0 Driver intact 33
9344281 40400 0 Driver intact 34
9344281 40400 0 Driver intact 34
9357307 40400 24 Driver Death at the site 31
9357307 40400 39 Driver intact 31
9357307 40400 16 Passenger Death at the site 31
8538023 32675 0 Driver Other 34
8595057 32675 0 Driver intact 32
8595057 32675 0 Driver intact 32
8538023 32675 0 Runaway driver Death in hospital 34
8449397 33281 0 Driver intact 32
8449397 33281 0 Driver intact 32
8643633 33281 0 Driver intact 34
8643633 33281 0 Driver intact 34
8733084 33281 35 Driver Death at the site 31
8733084 33281 30 Passenger Taken to hospital 31
8733084 33281 0 Passenger Taken to hospital 31
8733084 33281 0 Passenger Taken to hospital 31
8517348 38323 0 Driver intact 33
8517348 38323 0 Driver intact 33
8544720 38323 54 Driver Death at the site 31
8544720 38323 0 Driver Other 31
8743224 38323 0 Driver intact 32
9345729 38323 0 Driver intact 34
9325908 40503 0 Driver intact 31
9325908 40503 0 Driver intact 31
9325908 40503 0 Driver intact 31
9336147 40503 0 Driver intact 34
9336147 40503 0 Driver intact 34
9352576 40503 0 Driver Taken to hospital 33
9352576 40503 0 Driver intact 33
9352537 40663 0 Driver intact 33
9352537 40663 0 Driver intact 33
9353437 40663 0 Driver Treated on-site 31
9353437 40663 0 Driver intact 31
9359019 40663 0 Driver intact 34
9359019 40663 0 Driver intact 34
;run;

proc sort data=par;by key_id kroki;

data want(keep=Kroki Key_ID BranchId Severity InjNo DeathsNo VehNo) ;
merge acc_prep(in=a)
max_veh(in=b where=(percentage=100))
par(in=c where=(PartyType='Driver'));
by key_id kroki;
if a;
run;

proc sql;
create table test as
select distinct *
from want
where key_id = 36412;

Contributor
Posts: 53

Re: count and merge

Posted in reply to Steelers_In_DC

dear @Steelers_In_DC

 

thanks for your valuable solution but the (want) data set has more obsevations than (acc) data set.

Valued Guide
Posts: 860

Re: count and merge

I was picking up duplicate records where branchid is missing in the PAR dataset.  Changing the last two steps will fix the issue:

 

data want(keep=Kroki Key_ID BranchId Severity InjNo DeathsNo VehNo) ;
merge acc_prep(in=a)
max_veh(in=b where=(percentage=100))
par(in=c drop= BranchId where=(PartyType='Driver'));
by key_id kroki;
if a;
run;

proc sql;
create table final as
select distinct *
from want
order by key_id,kroki;*
where key_id = 36412;

 

 

Contributor
Posts: 53

Re: count and merge

Posted in reply to Steelers_In_DC

dear @Steelers_In_DC

 

Why you used (keep), when i removed it, it gave diffeent result.

Contributor
Posts: 53

Re: count and merge

dear @Steelers_In_DC

 

also, it seems wrong when i want to add (total number of par) as (vehno) code, as following:


data par_prep;
set par;
by key_id kroki;
ParNo+1;
if first.kroki then par = 1;
run;
proc sql;
create table max_par as
select kroki,key_id,PartyType,max(parno) as Parno
from par_prep
group by kroki
order by key_id,kroki;

 

then add it to merge as following :

 

data want(keep=Kroki Key_ID BranchId Severity InjNo DeathsNo VehNo) ;
merge acc_prep(in=a)
max_veh(in=b where=(percentage=100))
max_par(in=b where=(PartyType='Driver'))
par(in=c drop= BranchId where=(PartyType='Driver'));
by key_id kroki;
if a;
run;
proc sql;
create table final as
select distinct *
from want
order by key_id,kroki;

 

 

Valued Guide
Posts: 860

Re: count and merge

Give me an example of a record that this produces incorrectly and tell me what the desired output is.  Otherwise I have no idea what the problem is.

Contributor
Posts: 53

Re: count and merge

Posted in reply to Steelers_In_DC

dear @Steelers_In_DC

 

please find an example attached;

 

data acc;
infile cards dlm=' ';
input Kroki Key_ID AccidentTime $ AccType $ AccPoint $ WLight $ Severity $ PublicDamage $ AccReasons $
AccidentAngle $ InjurMajor Deaths InjMinor BranchId;
cards;
8578421 34969 07/06/201301:58PM Bumpedamovingvehicle - - Damage - preoccupationonDriving - 0 0 0 34
8799176 34969 09/05/201308:21AM Bumpedamovingvehicle - - Damage - distance - 0 0 0 31
8732911 34969 05/04/201304:51AM Other Straight Clear Deaths Nothing speeding FacetoFace 1 1 0 32
;run;

data veh;
infile cards dlm=' ';
input Kroki Key_ID Percentage VehicleMake $ VehicleModel VehicleColor $ AccidentPlace $ BranchId;
cards;
8578421 34969 0 - 0 - Front 34
8578421 34969 100 - 0 - Other 34
8732911 34969 0 to 0 - Other 32
8732911 34969 100 mr 1992 - Other 32
8799176 34969 0 cr 2003 - Front 31
8799176 34969 100 - 0 - Front 31
;run;

data par;
infile cards dlm=' ';
input Kroki Key_ID Age PartyType $ HealthStatus $ Nationality $ BranchId;
cards;
8578421 34969 0 Driver intact Ym 34
8578421 34969 0 Driver Other SD 34
8732911 34969 0 Driver Takentohospital Ym 32
8732911 34969 0 Driver intact Pk 32
8732911 34969 0 Passenger Deathinhospital Pk 32
8799176 34969 0 Driver intact Other 31
8799176 34969 0 Driver intact SD 31
;run;

data want;
infile cards dlm=' ';
input Kroki Key_ID AccidentTime $ AccType $ AccPoint $ WLight $ Severity $ PublicDamage $ AccReasons $
AccidentAngle $ InjurMajor Deaths InjMinor BranchId Percentage VehicleMake $ VehicleModel VehicleColor $
AccidentPlace $ Age PartyType $ HealthStatus $ Nationality $;
cards;
8578421 34969 07/06/201301:58PM Bumpedamovingvehicle - - Damage - preoccupationonDriving - 0 0 0 34 100 - 0 - Other 0 Driver Other SD
8799176 34969 09/05/201308:21AM Bumpedamovingvehicle - - Damage - distance - 0 0 0 31 100 - 0 - Front 0 Driver intact SD
8732911 34969 05/04/201304:51AM Other Straight Clear Deaths Nothing speeding FacetoFace 1 1 0 32 100 mr 1992 - Other 0 Driver intact Pk
run;

Valued Guide
Posts: 860

Re: count and merge

In your original post you state:

 

what I want is to have one observation for each crash includes:

 

Kroki Key_ID BranchId Severity InjNo DeathsNo VehNo ;

 

What you have here is completely different.  You have to re-explain what it is you are trying to achieve.

Contributor
Posts: 53

Re: count and merge

Posted in reply to Steelers_In_DC

dear @Steelers_In_DC

 

you are right and i am sorry i did not iclude the VehNo, InjNo and ParNo;

Here I just include all other variables. from orignal post i got how to count (veh) and (Par) based on your code. but it was hard to merge with all variables:

 

what I want ( is to have one observation for each crash includes) all vaiables in additon to the total number of (InjNo), (veh) and (Par).

 

the one observation should includes all variables of (acc) in addition to (InjNo) which is the total of number of (InjurMajor+InjMinor) ,

all variables of (Veh) where percentage=100     in addition to (VehNo) which is the count of number of Veh in each crash,

all variables of (Par)  where  PartyType='Driver' in addition to (Par) which is the count of number Par in each crash.

 

so the output should look like this:

 

data want;
infile cards dlm=' ';
input Kroki Key_ID AccidentTime $ AccType $ AccPoint $ WLight $ Severity $ PublicDamage $ AccReasons $
AccidentAngle $ InjurMajor Deaths InjMinor BranchId Percentage VehicleMake $ VehicleModel VehicleColor $
AccidentPlace $ Age PartyType $ HealthStatus $ Nationality $ InjNo VehNo ParNo;
cards;
8578421 34969 07/06/201301:58PM Bumpedamovingvehicle - - Damage - preoccupationonDriving - 0 0 0 34 100 - 0 - Other 0 Driver Other SD 0 2 2
8799176 34969 09/05/201308:21AM Bumpedamovingvehicle - - Damage - distance - 0 0 0 31 100 - 0 - Front 0 Driver intact SD 0 2 2
8732911 34969 05/04/201304:51AM Other Straight Clear Deaths Nothing speeding FacetoFace 1 1 0 32 100 mr 1992 - Other 0 Driver intact Pk 1 2 3
run;

 

 

 

Valued Guide
Posts: 860

Re: count and merge

How are you selecting what record comes from the 'par' dataset.  You only have one nationality per kroki in your output, there are 2-3 in this dataset.  What is the criteria?

Contributor
Posts: 53

Re: count and merge

Posted in reply to Steelers_In_DC

daer @Steelers_In_DC

 

that is very good question:

 

the criteria is only (PartyType)in Par dataset.

 

let me give an example:

 

for the 3rd obs of Acc there are two observtions (the 3rd and the 4th) from Veh (meaning VehNo=2) where the (4th obs shows the percentage =100 which i am interested in) to find the info. from Par dataset, there are three observations (meaning ParNo=3) (3rd, 4th and 5th) the (3rd obs is belong to the 3rd obs from Veh) so the (4th and 5th is belong to the 4th obs from Veh so i will choose the 4th from Par becaese the PartyType ='Driver').

 

Thanks again

Valued Guide
Posts: 860

Re: count and merge

Kroki 8799176 has two entries in the Par dataset, both of them are driver.  How do you select between the two?

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 27 replies
  • 389 views
  • 1 like
  • 3 in conversation