eunbi
Obsidian | Level 7

Hi all,

I am new to SAS and am currently working on data pre-processing.

During this step, I ran into some questions.

For character attributes, I simply replaced each NA (noisy) value with the most frequent value.

Example: I replaced NA with "management".
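The replacement described above (filling NA with the most frequent value) can be sketched as follows. This is a minimal, language-neutral illustration in Python rather than SAS code; the function name `impute_with_mode`, the `"NA"` token, and the sample job values are assumptions made for the example, not taken from the actual data set.

```python
from collections import Counter

def impute_with_mode(values, na_token="NA"):
    """Replace na_token with the most frequent non-NA value.

    Returns the filled list and the mode that was used
    (None if every value was NA, in which case nothing is replaced).
    """
    observed = [v for v in values if v != na_token]
    if not observed:
        return list(values), None
    mode, _count = Counter(observed).most_common(1)[0]
    return [mode if v == na_token else v for v in values], mode

# Hypothetical sample column; the real data is only shown in the screenshot.
jobs = ["management", "technician", "NA", "management", "services", "NA"]
filled, mode = impute_with_mode(jobs)
print(mode)
print(filled)
```

Note that this approach inflates the count of the most frequent category, which matters more as the share of NA values grows.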

sas question.png

However, when I checked another attribute, 'Outcome', the number of NA values was so large that I am afraid to replace them all with "failure".

Is there any method to replace or otherwise handle this noisy data? (I cannot remove the records.)

Thank you in advance!

1.png

RW9
Diamond | Level 26

There are many theories on how to populate missing data, and papers have been written about it. It is not, however, a good fit for a Q&A board. You will need to go away, analyse the data, discuss with the data's consumers what they want it for and how it is to be used, and come up with a plan to handle the missing data.

eunbi
Obsidian | Level 7

Hi Sir,

Thank you for your reply 🙂 I fully understand your point. However, my data set and pre-processing are not for a commercial purpose. (It is personal self-study, so I do not have a business objective or an analysis target.)

For now, I am exploring each attribute, along with its outliers and missing values, and considering how to handle them.

Sorry if my question is not appropriate for this board.

Thank you anyway!

ballardw
Super User

I would start by going to the data source and getting clarification on exactly what NA means for each variable and how it gets assigned.

In a very generic "outcome" sense I can see NA being something similar to "the inputs were not valid or out of the intended range so outcome would not be valid/useable/sensible". If that were the case it may be that for other processing those NA should actually be missing.
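One way to act on this is to convert the NA text to a true missing value and then look at how much of each variable is missing before deciding on any replacement. The sketch below shows the logic in Python rather than SAS; the function names, the `"NA"` token, and the sample rows are hypothetical.

```python
def na_to_missing(rows, na_token="NA"):
    """Return copies of the rows with na_token replaced by None (missing)."""
    return [
        {key: (None if value == na_token else value) for key, value in row.items()}
        for row in rows
    ]

def missing_share(rows):
    """Return the fraction of missing (None) values per variable."""
    if not rows:
        return {}
    counts = {}
    for row in rows:
        for key, value in row.items():
            counts[key] = counts.get(key, 0) + (1 if value is None else 0)
    return {key: count / len(rows) for key, count in counts.items()}

# Hypothetical rows; variable names and values are illustrative only.
rows = [
    {"job": "management", "outcome": "NA"},
    {"job": "NA", "outcome": "failure"},
    {"job": "services", "outcome": "NA"},
    {"job": "management", "outcome": "NA"},
]
cleaned = na_to_missing(rows)
shares = missing_share(cleaned)
print(shares)
```

A variable where most values are missing, like the hypothetical `outcome` here, is a poor candidate for filling with a single value, which is the concern raised in the original question.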

eunbi
Obsidian | Level 7

Hi Sir,

Thank you for sharing your good idea! I think I need to compare and review the relationships between the attributes that hold NA values.

ErikLund_Jensen
Rhodochrosite | Level 12

N.A. can be an abbreviation of "not available", which should be treated as missing, but it can also mean "not applicable", as often used in surveys where, for example, questions related to children are coded n.a. if a respondent does not have any children. Statistical software does not normally work with more than one missing value, but if a special "not applicable" code exists in the data, it is very useful in a filter to get the relevant population for a given analysis.
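The distinction can be illustrated with a small sketch, written in Python rather than SAS. It assumes a hypothetical survey in which "not applicable" is stored as an explicit code and "not available" as a true missing value; all field names and values are invented for the example.

```python
NOT_APPLICABLE = "not applicable"  # hypothetical explicit code in the data
NOT_AVAILABLE = None               # true missing value

# Hypothetical survey records.
respondents = [
    {"id": 1, "children": 2, "childcare_cost": 300},
    {"id": 2, "children": 0, "childcare_cost": NOT_APPLICABLE},
    {"id": 3, "children": 1, "childcare_cost": NOT_AVAILABLE},
    {"id": 4, "children": 3, "childcare_cost": 500},
]

# Filter 1: the relevant population excludes "not applicable" respondents.
relevant = [r for r in respondents if r["childcare_cost"] != NOT_APPLICABLE]

# Filter 2: within that population, keep only those who actually answered.
answered = [r for r in relevant if r["childcare_cost"] is not None]

mean_cost = sum(r["childcare_cost"] for r in answered) / len(answered)
print([r["id"] for r in relevant])
print([r["id"] for r in answered])
print(mean_cost)
```

Keeping the two codes separate means the response rate is computed over respondents for whom the question applies, not over the whole sample.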

By the way, do you know "The Little SAS Book: A Primer"? If you can get hold of a copy, it gives you everything from a basic introduction to SAS principles and coding techniques up to sophisticated macro coding and analysis. It is now in its fifth edition and not so little anymore, and it certainly comes at a price, but it is highly recommended!

