Uddin
Fluorite | Level 6

Hi,

I have a data set regarding road fatalities. Here is an example of my data set:

crash_id, bus_involvement
1, yes
2, no
3, -9
4, no
5, no
6, yes

Could you please tell me what '-9' means for crash_id 3? Is it unknown or missing data? If so, how should I fix it before running any calculations?

Looking for your kind response.

PaigeMiller
Diamond | Level 26

I would imagine you should ask the person who created the data. We don't know why -9 shows up there.

--
Paige Miller
Kurt_Bremser
Super User

Since you did not post your data as recommended (data step with datalines), we have no idea what it really contains.

It might even be that bus_involvement is numeric, with a format that displays 0 as no and 1 as yes. Since -9 is not covered by the format, it is displayed as is.
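To illustrate that possibility, here is a minimal sketch. The format name yesno, the data set name demo, and the 0/1 coding are all assumptions made for illustration; nothing in the thread confirms how the real data are stored.

```sas
/* Hypothetical format: only 0 and 1 are covered, -9 is not */
proc format;
  value yesno 0 = 'no'
              1 = 'yes';
run;

/* Hypothetical numeric version of the sample data */
data demo;
  input crash_id bus_involvement;
  format bus_involvement yesno.;
  datalines;
1 1
2 0
3 -9
;
run;

/* -9 has no label in the format, so it prints as the raw number */
proc print data=demo noobs;
run;

/* Shows whether the variable is numeric or character,
   and which format (if any) is attached to it */
proc contents data=demo;
run;
```

Running PROC CONTENTS on the real data set would show whether bus_involvement is numeric with a format attached or a plain character variable.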

 

Anyway, the meaning of -9 should be defined in the documentation for the data you received. You got none? Request it.

Uddin
Fluorite | Level 6
Hi,
Thanks for your kind response. Here is the data step:
data road_crash;
  input crash_id $ bus_involvement $;
  datalines;
20205093 Yes
20205008 No
20205050 -9
20201127 No
20202001 No
20207010 Yes
;
run;
According to the data set, bus_involvement is a character (string) variable.

Looking forward to your feedback.
mkeintz
PROC Star

It appears the entry of a -9 for bus involvement was a decision made by those who provided the data to you.  We could guess what it means, but we can't read their minds.  Ask them.

 

As a technical note, you are reading the -9 as a string value, producing bus_involvement="-9".  A SAS procedure like proc freq will treat "-9" as a valid value.  For character variables, only a blank is treated as missing.  If you learn from your data source that the -9 is intended to represent a missing value, you could reassign "-9" to a blank.  But before doing so, I would check to see if there are any records in which bus_involvement is already blank.  If so, and if blank really means something different than -9, then you might want to leave bus_involvement unchanged and make a new variable bus_involvement2 from it, which you can modify without losing the original information.
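A minimal sketch of that approach, using the road_crash data set posted above. The output data set name road_crash2 and the length of 8 are assumptions, and the recode only makes sense if the data provider confirms that -9 means missing.

```sas
/* Count every value, including blanks, before changing anything.
   The MISSING option makes PROC FREQ list blank values as a category. */
proc freq data=road_crash;
  tables bus_involvement / missing;
run;

/* Keep the original variable and recode a copy of it.
   Assumption: "-9" has been confirmed to mean missing. */
data road_crash2;
  set road_crash;
  length bus_involvement2 $ 8;
  bus_involvement2 = bus_involvement;
  if bus_involvement = '-9' then bus_involvement2 = ' ';
run;

/* Compare the original and the recoded variable */
proc freq data=road_crash2;
  tables bus_involvement * bus_involvement2 / missing list;
run;
```

Keeping both variables means the original coding is still available if -9 turns out to mean something other than missing.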

Uddin
Fluorite | Level 6
Hi,
Thank you so much for your kind response. I checked the whole data set and couldn't find any blank values in the bus_involvement variable. In that case, do you think I can set -9 to missing (blank)? Looking forward to your response. Kind regards
PaigeMiller
Diamond | Level 26

What is the benefit of converting -9 to a blank? I see none.

 

But you've missed the point. Until you know what -9 means, converting it to something else may not be a good thing to do. Please go find out what -9 means.

--
Paige Miller