Uddin
Fluorite | Level 6

Hi,

I have a data set regarding road fatalities. Here is an example of my data set:

crash_id, bus_involvement
1, yes
2, no
3, -9
4, no
5, no
6, yes

Could you please tell me what '-9' means for crash_id 3? Is it unknown or missing data? If so, how should I handle it before doing any calculations?

Looking for your kind response.

6 REPLIES
PaigeMiller
Diamond | Level 26

I would imagine you should ask the person who created the data. We don't know why -9 shows up there.

--
Paige Miller
Kurt_Bremser
Super User

Since you did not post your data as recommended (data step with datalines), we have no idea what it really contains.

It might even be that bus_involvement is numeric, with a format that displays 0 as no and 1 as yes. Since -9 is not covered by the format, it is displayed as is.

Anyway, the meaning of -9 should be defined in the documentation for the data you received. You got none? Request it.
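To illustrate the numeric-with-format possibility described above, here is a minimal sketch. The format name yesno., the 0/1 coding, and the data set name demo are assumptions made up for this example; nothing in the thread confirms how the real data is stored.

```sas
/* Hypothetical format: the name yesno. and the 0/1 coding are assumptions */
proc format;
  value yesno
    0 = 'no'
    1 = 'yes';
run;

/* Small demo data set with a numeric bus_involvement */
data demo;
  input crash_id bus_involvement;
  format bus_involvement yesno.;
  datalines;
1 1
2 0
3 -9
4 0
;
run;

/* 0 and 1 print with their labels; -9 has no label in the format */
proc print data=demo noobs;
run;

/* PROC CONTENTS reports the variable type and any attached format */
proc contents data=demo;
run;
```

Running PROC CONTENTS on the real data set is a quick way to tell whether bus_involvement is numeric with a format attached or a plain character variable.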

Uddin
Fluorite | Level 6
Hi,
Thanks for your kind response. Here is the data step:

data road_crash;
  input crash_id $ bus_involvement $;
  datalines;
20205093 Yes
20205008 No
20205050 -9
20201127 No
20202001 No
20207010 Yes
;
run;

According to the data set, bus_involvement is a character (string) variable.

Looking forward to your feedback.
mkeintz
PROC Star

It appears the entry of a -9 for bus involvement was a decision made by those who provided the data to you. We could guess what it means, but we can't read their minds. Ask them.

As a technical note, you are reading the -9 as a string value, producing bus_involvement="-9". A SAS procedure like proc freq will treat "-9" as a valid value. For character variables, only a blank is treated as missing. If you learn from your data source that the -9 is intended to represent a missing value, you could reassign "-9" to a blank.

But before doing so, I would check to see if there are any records in which bus_involvement is already blank. If so, and if blank really means something different than -9, then you might want to leave bus_involvement unchanged and make a new variable bus_involvement2 from it, which you can modify without losing the original information.
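The check and the recode described above could look roughly like this. It is only a sketch: it assumes the road_crash data set from the data step posted earlier, the output name road_crash2 is made up for the example, and treating "-9" as missing is an assumption that still has to be confirmed with the data provider.

```sas
/* Step 1: list every distinct value; the MISSING option makes
   blank values show up as their own level in the table */
proc freq data=road_crash;
  tables bus_involvement / missing;
run;

/* Step 2: leave the original variable untouched and recode into a new one.
   Treating "-9" as missing is an assumption, not a documented fact. */
data road_crash2;
  set road_crash;
  length bus_involvement2 $8;
  if strip(bus_involvement) = '-9' then bus_involvement2 = ' ';
  else bus_involvement2 = bus_involvement;
run;

/* Step 3: cross-check the original values against the recoded ones */
proc freq data=road_crash2;
  tables bus_involvement*bus_involvement2 / missing list;
run;
```

Keeping bus_involvement alongside bus_involvement2 means the recode can be undone or changed later if the provider says -9 means something other than missing.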

Uddin
Fluorite | Level 6
Hi,
Thank you so much for your kind response. I checked the whole data set and couldn't find any blank values in the bus_involvement variable. In that case, do you think I can set -9 to missing (blank)? Looking forward to your response. Kind regards
PaigeMiller
Diamond | Level 26

What is the benefit of converting -9 to a blank? I see none.

But you've missed the point. Until you know what -9 means, converting it to something else may not be a good thing to do. Please go find out what -9 means.

--
Paige Miller
