Solved: Re: How can i get mean for char type attribute? (During data pre-proce...

eunbi · Posted 02-07-2019 02:20 AM

Hi everyone,

I am beginner and a person who are newly learning SAS.

I am trying to data pre-processing and have question. I can calculate mean for numeric type of attribute in SAS well.

Here have question! In case of char type of attribute, how can we get mean?

(Logically thinking, chart type attribute cannot calculate mean I think. In this case, to handle missing value, Can i use frequency result?)

For example, below is frequency result for job. I think NA is missing value here and want to replace to handle it. I can find the most high column of result : management. And I thought, I can replace the NA result to management. Is it okay? Or Is there any solution can you advice me?? Appreciate so much in advanced

Kurt_Bremser · Posted 02-07-2019 02:53 AM

Missing values are shown as an addendum to the table:

data class;
set sashelp.class;
if name = 'Alfred' then call missing(sex);
run;

proc freq data=class;
tables sex;
run;

Result:

                                 Kumuliert     Kumuliert
Sex    Häufigkeit    Prozent     Häufigkeit      Prozent
--------------------------------------------------------
F              9      50.00             9        50.00  
M              9      50.00            18       100.00  

                 Frequency Missing = 1

So it is obvious that 'NA' is a non-missing value in your dataset.

Would you want to add the count for 'NA' to the highest frequency result?

Create an intermediary table, and manipulate that:

data class;
set sashelp.class;
if name = 'Alfred' then sex = 'X';
run;

proc freq data=class noprint;
tables sex/out=result;
run;

data want;
keepmax = 0;
do until (eof1); * retrieve the value for 'X' and determine the max observation;
  set result end=eof1;
  if sex = 'X' then add = count;
  if count > keepmax
  then do;
    keepmax = count;
    keepval = sex;
  end;
end;
do until (eof2); * modify the table;
  set result end=eof2;
  if sex ne 'X' then do; * this prevents X from going to the output;
    if sex = keepval then count + add;
    output;
  end;
end;
drop keepmax keepval add;
run;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

View solution in original post

Kurt_Bremser · Posted 02-07-2019 02:53 AM

Missing values are shown as an addendum to the table:

data class;
set sashelp.class;
if name = 'Alfred' then call missing(sex);
run;

proc freq data=class;
tables sex;
run;

Result:

                                 Kumuliert     Kumuliert
Sex    Häufigkeit    Prozent     Häufigkeit      Prozent
--------------------------------------------------------
F              9      50.00             9        50.00  
M              9      50.00            18       100.00  

                 Frequency Missing = 1

So it is obvious that 'NA' is a non-missing value in your dataset.

Would you want to add the count for 'NA' to the highest frequency result?

Create an intermediary table, and manipulate that:

data class;
set sashelp.class;
if name = 'Alfred' then sex = 'X';
run;

proc freq data=class noprint;
tables sex/out=result;
run;

data want;
keepmax = 0;
do until (eof1); * retrieve the value for 'X' and determine the max observation;
  set result end=eof1;
  if sex = 'X' then add = count;
  if count > keepmax
  then do;
    keepmax = count;
    keepval = sex;
  end;
end;
do until (eof2); * modify the table;
  set result end=eof2;
  if sex ne 'X' then do; * this prevents X from going to the output;
    if sex = keepval then count + add;
    output;
  end;
end;
drop keepmax keepval add;
run;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

eunbi · Posted 02-07-2019 11:55 PM

Hi Sir!

Thank you for the reply so much. So you mean that NA is not missing value?

You guided me "Missing values are shown as an addendum to the table" here what exactly mean "addendum" ?

Unfortunately i couldn't really get your guidance. ㅠㅠ sorry

Reeza · Posted 02-08-2019 12:02 AM

NA is typically Not Applicable. Whether that's missing in your context is not something we can possibly know.

Kurt_Bremser · Posted 02-08-2019 01:46 AM

@eunbi wrote:

Hi Sir!

Thank you for the reply so much. So you mean that NA is not missing value?

You guided me "Missing values are shown as an addendum to the table" here what exactly mean "addendum" ?

Unfortunately i couldn't really get your guidance. ㅠㅠ sorry

In my post, I artificially created a missing value, and then ran proc freq over the resulting dataset. That proc freq created this line in the output:

Frequency Missing = 1

Since you do not have such a line in your output, it is clear that the 'NA' are actual values (and not just created by a special format for missing), and SAS will not treat them as missing. If NA stands for "not applicable", you should convert these values to missing (empty string) when importing the data into SAS. SAS usually excludes missing values automatically from calculations where such makes sense.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

eunbi · Posted 02-08-2019 02:28 AM

Oh I finally understand what you explained to me. Thank you so much Sir!

I think below article would be helpful who have similar curiosity too.

http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a002316433.htm

By the way can i modify the table with below codes?

proc iml;
edit WORK.CARINSURANCE_TRAIN;
read all var {job} where (job = "NA");
job = "management";
replace all var {job} where (job = "NA");
close WORK.CARINSURANCE_TRAIN;

Kurt_Bremser · Posted 02-08-2019 02:37 AM

Do such a replace in a data step:

data work.carinsurance_train;
set work.carinsurance_train;
if job = 'NA' then job = 'management';
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

eunbi · Posted 02-08-2019 02:40 AM

Thank you for sharing your knowledge with me!

Reeza · Posted 02-07-2019 01:50 PM

I think you’re trying to impute missing? And replacing it with management is not a good way to do that.

First, why are they empty, or is NA a valid value?

Second, how will this variable be used?

Third, if it is missing, 19 is a smalll number, does it make sense to exclude these obs?

Lastly, if not, the you can look at imputation methods. Or basically try to predict what that value would be.

@eunbi wrote:

Hi everyone,

I am beginner and a person who are newly learning SAS.

I am trying to data pre-processing and have question. I can calculate mean for numeric type of attribute in SAS well.

Here have question! In case of char type of attribute, how can we get mean?

(Logically thinking, chart type attribute cannot calculate mean I think. In this case, to handle missing value, Can i use frequency result?)

For example, below is frequency result for job. I think NA is missing value here and want to replace to handle it. I can find the most high column of result : management. And I thought, I can replace the NA result to management. Is it okay? Or Is there any solution can you advice me?? Appreciate so much in advanced

eunbi · Posted 02-08-2019 12:01 AM

Hi Sir!

Thank you for reply! Below is my response for your questions.

First, why are they empty, or is NA a valid value?

> NA is not valid value. I am trying to learn data pre-processing with given data set now.

Second, how will this variable be used?

> Currently i have two data sets. One is train and the other is test. Once i done data pre-processing. I will try to estimate car insurance marking result based on the train data set.

Third, if it is missing, 19 is a smalll number, does it make sense to exclude these obs?

> Can consider of course as one of method of data cleaning. However i hope to correct inconsistencies in data.

Lastly, if not, the you can look at imputation methods. Or basically try to predict what that value would be.

> Based on the train data set i will predict marking result with test data set.

Reeza · Posted 02-08-2019 12:04 AM

Imputation can also be a prediction step. So first predict the category and then predict the outcome. In fact, you can try several different methods and see how they affect your output, though I suspect with 19 you're not going to see much differences.

eunbi · Posted 02-08-2019 12:50 AM

Thank you for reply again!

Actually I cannot use method which is removing outliers or invalid value from my data set.

Instead of this method, I have to use others method for data-reprocessing, thats why i try ting to replace the value at the begin.

Since i cannot remove the value, can you give me any good idea to handle this?

Thank you !

How can i get mean for char type attribute? (During data pre-processing)

Re: How can i get mean for char type attribute? (During data pre-processing)

Re: How can i get mean for char type attribute? (During data pre-processing)

Re: How can i get mean for char type attribute? (During data pre-processing)

Re: How can i get mean for char type attribute? (During data pre-processing)

Re: How can i get mean for char type attribute? (During data pre-processing)

Re: How can i get mean for char type attribute? (During data pre-processing)

Re: How can i get mean for char type attribute? (During data pre-processing)

Re: How can i get mean for char type attribute? (During data pre-processing)

Re: How can i get mean for char type attribute? (During data pre-processing)

Re: How can i get mean for char type attribute? (During data pre-processing)

Re: How can i get mean for char type attribute? (During data pre-processing)

Re: How can i get mean for char type attribute? (During data pre-processing)

Registration is open