Hello Everyone,
I have crash data in which I have information on what type of person is involved in the crash: Driver (D), Occupant (O), and Passenger (P). I also have the report number per crash and person Id per person which means if there are 5 people involved in a crash then all 5 will have the same report number but different person IDs. I also have an age variable in the data. I am trying to create sample data where drivers should be 18 years or younger and passengers should be more than one and less than 20 years.
Here are my codes:
/*D- Driver
P- Pedestrian
O- Occupant*/
proc freq data = young.person21;
table PERSON_TYPE;
run;
proc contents data= young.person21;
run;
/*Finding number of drivers >= 18 years and more than one Passengers under age 20 involved in a crash
257981 observations and 6 variables*/
data person21;
retain INJ_SEVER_CODE REPORT_NO VEHICLE_ID PERSON_ID age;
length INJ_SEVER_CODE $5. Report_no $10. age 8.;
format INJ_SEVER_CODE $5. Report_no $10. age 8.;
informat INJ_SEVER_CODE $5. Report_no $10. age 8.;
set young.person21;
keep INJ_SEVER_CODE REPORT_NO VEHICLE_ID PERSON_ID AGE person_type ACC_DATE LIC_NUM;
run;
data young_crash21;
length type $15.;
set person21;
if age <= 18 and person_type = 'D' then Type= "Young Driver";
if age <= 20 and person_type = 'O' then Type= "Passenger";
if Type in ('Young Driver' ,'Passenger');
run;
/*30241 observations and 7 variable*/
data young_crash21_1;
set person21;
if age <= 18 and person_type = 'D' then Type1= "Young Driver";
if age <= 20 and person_type = 'O' then Type2= "Passenger";
if Type1= 'Young Driver' or Type2= 'Passenger';
run;
proc freq data = young_crash21_1;
table Lic_num;
run;
/*First method*/
proc sort data = young_crash21;
by report_no;
run;
data young_crash21_count;
set young_crash21;
count+1;
by report_no;
if first.report_no then count=1;
run;
/*Second method*/
proc sort data= young_crash21 out= young_crash21_count1;
by type report_no;
run;
data young_crash21_count2;;
set young_crash21_count1;
count + 1;
by type report_no;
if first.type or first.report_no then count = 1;
run;
I am stuck here. I don't know how to capture only that crash where the driver is less than and equal to 18 years. That particular crash has more than one passenger who is less than 20 years old. In the final data by using my codes I do see a few cases where the driver is older than 18 years and I don't know how to get rid of it.
Please help
The variables names don't match and the coding won't match for comparisons, eg type = "D" vs person_type="D" and/or person_type = "Driver" all require the appropriate data values and variable names. So it does matter.
Based on the last input data this works as expected:
data test;
input Report_no$ Vehicle_id Person_id age Person_type$;
datalines;
123456 101075 1541 22 D
123456 101076 1542 40 D
123456 101076 1543 16 O
123456 101076 1544 54 O
A05656 101077 1545 25 O
A07651 100918 1546 24 D
A07651 100919 1547 18 O
A07651 100919 1548 17 O
A0074 8840 1336 18 D
A0085 8853 1337 21 D
A0085 8853 1338 3 O
A075 1009 528 17 D
A075 1009 529 19 O
A075 1009 530 18 O
;
run;
data young_crash21_new;
set test;
if person_type='D' and age<=21 then flag1=1; else flag1=0;
if person_type='O' and age<=20 then flag2=1; else flag2=0;
run;
proc sql;
create table want as select * from young_crash21_new
group by report_no
having max(flag1)=1 and max(flag2)=1;
quit;
I am trying to create sample data where drivers should be 18 years or younger and passengers should be more than one and less than 20 years.
UNTESTED CODE (because no data has been provided)
data young_crash21_new;
set young_crash21;
if type='D' and age<=21 then flag1=1; else flag1=0;
if type='O' and age<=20 then flag2=1; else flag2=0;
run;
proc sql;
create table want as select * from young_crash21_new
group by report_no
having max(flag1)=1 and max(flag2)=1;
quit;
Thank you for the response. Let me try these codes thank you
I don't see an error. 🤷♂️
It is not an error but a note and it is giving 0 observation in the final data set (want)
NOTE: The query requires remerging summary statistics back with the original data.
NOTE: Table WORK.WANT created, with 0 rows and 11 columns.
Since you haven't shared any data, maybe that's the correct answer.
If you can share a portion of your actual data, please do so. If you can't share it, please make up some data.
Data MUST be presented as working SAS data step code, which you can type in yourself or by following these instructions. Do not provide data other ways, I will not be able to use it.
Will this data work?
No, it does not work. Repeating my earlier request:
Data MUST be presented as working SAS data step code, which you can type in yourself or by following these instructions. Do not provide data other ways, I will not be able to use it.
This code should be included in your reply by copying as text and then pasting it into the window that appears when you click on the "little running man" icon.
data test;
input Report_no$ Vehicle_id Person_id age Person_type$;
datalines;
123456 101075 1541 22 Driver
123456 101076 1542 40 Driver
123456 101076 1543 16 Passenger
123456 101076 1544 54 Passenger
A05656 101077 1545 25 Passenger
A07651 100918 1546 24 Driver
A07651 100919 1547 18 Passenger
A07651 100919 1548 17 Passenger
A0074 8840 1336 18 Driver
A0085 8853 1337 21 Driver
A0085 8853 1338 3 Passenger
A075 1009 528 17 Driver
A075 1009 529 19 Passenger
A075 1009 530 18 Passenger
;
run;
I want you to do the programming that you can do, and I'll do the part that you are having trouble with. We have code where the variable PERSON_TYPE has values D or O, but that's not the data you provided. Please straighten that out, either by code or by adjusting the data.
D is Driver and O is Passenger. Does it matter?
data test;
input Report_no$ Vehicle_id Person_id age Person_type$;
datalines;
123456 101075 1541 22 D
123456 101076 1542 40 D
123456 101076 1543 16 O
123456 101076 1544 54 O
A05656 101077 1545 25 O
A07651 100918 1546 24 D
A07651 100919 1547 18 O
A07651 100919 1548 17 O
A0074 8840 1336 18 D
A0085 8853 1337 21 D
A0085 8853 1338 3 O
A075 1009 528 17 D
A075 1009 529 19 O
A075 1009 530 18 O
;
run;
The variables names don't match and the coding won't match for comparisons, eg type = "D" vs person_type="D" and/or person_type = "Driver" all require the appropriate data values and variable names. So it does matter.
Based on the last input data this works as expected:
data test;
input Report_no$ Vehicle_id Person_id age Person_type$;
datalines;
123456 101075 1541 22 D
123456 101076 1542 40 D
123456 101076 1543 16 O
123456 101076 1544 54 O
A05656 101077 1545 25 O
A07651 100918 1546 24 D
A07651 100919 1547 18 O
A07651 100919 1548 17 O
A0074 8840 1336 18 D
A0085 8853 1337 21 D
A0085 8853 1338 3 O
A075 1009 528 17 D
A075 1009 529 19 O
A075 1009 530 18 O
;
run;
data young_crash21_new;
set test;
if person_type='D' and age<=21 then flag1=1; else flag1=0;
if person_type='O' and age<=20 then flag2=1; else flag2=0;
run;
proc sql;
create table want as select * from young_crash21_new
group by report_no
having max(flag1)=1 and max(flag2)=1;
quit;
Thank you so much. This actually worked!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.