BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
eunbi
Obsidian | Level 7

Hi everyone, 

I am beginner and a person who are newly learning SAS.

I am trying to data pre-processing and have question. I can calculate mean for numeric type of attribute in SAS well.

Here have question! In case of char type of attribute, how can we get mean?

(Logically thinking, chart type attribute cannot calculate mean I think. In this case, to handle missing value, Can i use frequency result?)

 

For example, below is frequency result for job. I think NA is missing value here and want to replace to handle it. I can find the most high column of result : management. And I thought, I can replace the NA result to management. Is it okay? Or Is there any solution can you advice me?? Appreciate so much in advancedsas question.png

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

Missing values are shown as an addendum to the table:

data class;
set sashelp.class;
if name = 'Alfred' then call missing(sex);
run;

proc freq data=class;
tables sex;
run;

Result:

                                 Kumuliert     Kumuliert
Sex    Häufigkeit    Prozent     Häufigkeit      Prozent
--------------------------------------------------------
F              9      50.00             9        50.00  
M              9      50.00            18       100.00  

                 Frequency Missing = 1

So it is obvious that 'NA' is a non-missing value in your dataset.

 

Would you want to add the count for 'NA' to the highest frequency result?

Create an intermediary table, and manipulate that:

data class;
set sashelp.class;
if name = 'Alfred' then sex = 'X';
run;

proc freq data=class noprint;
tables sex/out=result;
run;

data want;
keepmax = 0;
do until (eof1); * retrieve the value for 'X' and determine the max observation;
  set result end=eof1;
  if sex = 'X' then add = count;
  if count > keepmax
  then do;
    keepmax = count;
    keepval = sex;
  end;
end;
do until (eof2); * modify the table;
  set result end=eof2;
  if sex ne 'X' then do; * this prevents X from going to the output;
    if sex = keepval then count + add;
    output;
  end;
end;
drop keepmax keepval add;
run;
run;

View solution in original post

11 REPLIES 11
Kurt_Bremser
Super User

Missing values are shown as an addendum to the table:

data class;
set sashelp.class;
if name = 'Alfred' then call missing(sex);
run;

proc freq data=class;
tables sex;
run;

Result:

                                 Kumuliert     Kumuliert
Sex    Häufigkeit    Prozent     Häufigkeit      Prozent
--------------------------------------------------------
F              9      50.00             9        50.00  
M              9      50.00            18       100.00  

                 Frequency Missing = 1

So it is obvious that 'NA' is a non-missing value in your dataset.

 

Would you want to add the count for 'NA' to the highest frequency result?

Create an intermediary table, and manipulate that:

data class;
set sashelp.class;
if name = 'Alfred' then sex = 'X';
run;

proc freq data=class noprint;
tables sex/out=result;
run;

data want;
keepmax = 0;
do until (eof1); * retrieve the value for 'X' and determine the max observation;
  set result end=eof1;
  if sex = 'X' then add = count;
  if count > keepmax
  then do;
    keepmax = count;
    keepval = sex;
  end;
end;
do until (eof2); * modify the table;
  set result end=eof2;
  if sex ne 'X' then do; * this prevents X from going to the output;
    if sex = keepval then count + add;
    output;
  end;
end;
drop keepmax keepval add;
run;
run;
eunbi
Obsidian | Level 7

Hi Sir!

Thank you for the reply so much. So you mean that NA is not missing value?

You guided me "Missing values are shown as an addendum to the table" here what exactly mean "addendum" ?

Unfortunately i couldn't really get your guidance. ㅠㅠ sorry

Reeza
Super User
NA is typically Not Applicable. Whether that's missing in your context is not something we can possibly know.
Kurt_Bremser
Super User

@eunbi wrote:

Hi Sir!

Thank you for the reply so much. So you mean that NA is not missing value?

You guided me "Missing values are shown as an addendum to the table" here what exactly mean "addendum" ?

Unfortunately i couldn't really get your guidance. ㅠㅠ sorry


In my post, I artificially created a missing value, and then ran proc freq over the resulting dataset. That proc freq created this line in the output:

Frequency Missing = 1

Since you do not have such a line in your output, it is clear that the 'NA' are actual values (and not just created by a special format for missing), and SAS will not treat them as missing. If NA stands for "not applicable", you should convert these values to missing (empty string) when importing the data into SAS. SAS usually excludes missing values automatically from calculations where such makes sense.

eunbi
Obsidian | Level 7

Oh I finally understand what you explained to me. Thank you so much Sir!

I think below article would be helpful who have similar curiosity too.

http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a002316433.htm

 

By the way can i modify the table with below codes?

 

proc iml;
edit WORK.CARINSURANCE_TRAIN;
read all var {job} where (job = "NA");
job = "management";
replace all var {job} where (job = "NA");
close WORK.CARINSURANCE_TRAIN;
eunbi
Obsidian | Level 7

Thank you for sharing your knowledge with me!

Reeza
Super User

I think you’re trying to impute missing? And replacing it with management is not a good way to do that. 

 

First, why are they empty, or is NA a valid value?

Second, how will this variable be used?

Third, if it is missing, 19 is a smalll number, does it make sense to exclude these obs?

Lastly, if not,  the you can look at imputation methods. Or basically try to predict what that value would be. 

 


@eunbi wrote:

Hi everyone, 

I am beginner and a person who are newly learning SAS.

I am trying to data pre-processing and have question. I can calculate mean for numeric type of attribute in SAS well.

Here have question! In case of char type of attribute, how can we get mean?

(Logically thinking, chart type attribute cannot calculate mean I think. In this case, to handle missing value, Can i use frequency result?)

 

For example, below is frequency result for job. I think NA is missing value here and want to replace to handle it. I can find the most high column of result : management. And I thought, I can replace the NA result to management. Is it okay? Or Is there any solution can you advice me?? Appreciate so much in advancedsas question.png


 

eunbi
Obsidian | Level 7

Hi Sir!

 

Thank you for reply! Below is my response for your questions.

 

First, why are they empty, or is NA a valid value?

> NA is not valid value. I am trying to learn data pre-processing with given data set now.

Second, how will this variable be used?

> Currently i have two data sets. One is train and the other is test. Once i done data pre-processing. I will try to estimate car insurance marking result based on the train data set. 

Third, if it is missing, 19 is a smalll number, does it make sense to exclude these obs?

> Can consider of course as one of method of data cleaning. However i hope to correct inconsistencies in data.

Lastly, if not,  the you can look at imputation methods. Or basically try to predict what that value would be. 

> Based on the train data set i will predict marking result with test data set.

Reeza
Super User
Imputation can also be a prediction step. So first predict the category and then predict the outcome. In fact, you can try several different methods and see how they affect your output, though I suspect with 19 you're not going to see much differences.
eunbi
Obsidian | Level 7

Thank you for reply again!

Actually I cannot use method which is removing outliers  or invalid value from my data set.

Instead of this method, I have to use others method for data-reprocessing, thats why i try ting to replace the value at the begin.

Since i cannot remove the value, can you give me any good idea to handle this?

Thank you !

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 11 replies
  • 2007 views
  • 5 likes
  • 3 in conversation