Hi everyone,
I am beginner and a person who are newly learning SAS.
I am trying to data pre-processing and have question. I can calculate mean for numeric type of attribute in SAS well.
Here have question! In case of char type of attribute, how can we get mean?
(Logically thinking, chart type attribute cannot calculate mean I think. In this case, to handle missing value, Can i use frequency result?)
For example, below is frequency result for job. I think NA is missing value here and want to replace to handle it. I can find the most high column of result : management. And I thought, I can replace the NA result to management. Is it okay? Or Is there any solution can you advice me?? Appreciate so much in advanced
Missing values are shown as an addendum to the table:
data class;
set sashelp.class;
if name = 'Alfred' then call missing(sex);
run;
proc freq data=class;
tables sex;
run;
Result:
Kumuliert Kumuliert Sex Häufigkeit Prozent Häufigkeit Prozent -------------------------------------------------------- F 9 50.00 9 50.00 M 9 50.00 18 100.00 Frequency Missing = 1
So it is obvious that 'NA' is a non-missing value in your dataset.
Would you want to add the count for 'NA' to the highest frequency result?
Create an intermediary table, and manipulate that:
data class;
set sashelp.class;
if name = 'Alfred' then sex = 'X';
run;
proc freq data=class noprint;
tables sex/out=result;
run;
data want;
keepmax = 0;
do until (eof1); * retrieve the value for 'X' and determine the max observation;
set result end=eof1;
if sex = 'X' then add = count;
if count > keepmax
then do;
keepmax = count;
keepval = sex;
end;
end;
do until (eof2); * modify the table;
set result end=eof2;
if sex ne 'X' then do; * this prevents X from going to the output;
if sex = keepval then count + add;
output;
end;
end;
drop keepmax keepval add;
run;
run;
Missing values are shown as an addendum to the table:
data class;
set sashelp.class;
if name = 'Alfred' then call missing(sex);
run;
proc freq data=class;
tables sex;
run;
Result:
Kumuliert Kumuliert Sex Häufigkeit Prozent Häufigkeit Prozent -------------------------------------------------------- F 9 50.00 9 50.00 M 9 50.00 18 100.00 Frequency Missing = 1
So it is obvious that 'NA' is a non-missing value in your dataset.
Would you want to add the count for 'NA' to the highest frequency result?
Create an intermediary table, and manipulate that:
data class;
set sashelp.class;
if name = 'Alfred' then sex = 'X';
run;
proc freq data=class noprint;
tables sex/out=result;
run;
data want;
keepmax = 0;
do until (eof1); * retrieve the value for 'X' and determine the max observation;
set result end=eof1;
if sex = 'X' then add = count;
if count > keepmax
then do;
keepmax = count;
keepval = sex;
end;
end;
do until (eof2); * modify the table;
set result end=eof2;
if sex ne 'X' then do; * this prevents X from going to the output;
if sex = keepval then count + add;
output;
end;
end;
drop keepmax keepval add;
run;
run;
Hi Sir!
Thank you for the reply so much. So you mean that NA is not missing value?
You guided me "Missing values are shown as an addendum to the table" here what exactly mean "addendum" ?
Unfortunately i couldn't really get your guidance. ㅠㅠ sorry
@eunbi wrote:
Hi Sir!
Thank you for the reply so much. So you mean that NA is not missing value?
You guided me "Missing values are shown as an addendum to the table" here what exactly mean "addendum" ?
Unfortunately i couldn't really get your guidance. ㅠㅠ sorry
In my post, I artificially created a missing value, and then ran proc freq over the resulting dataset. That proc freq created this line in the output:
Frequency Missing = 1
Since you do not have such a line in your output, it is clear that the 'NA' are actual values (and not just created by a special format for missing), and SAS will not treat them as missing. If NA stands for "not applicable", you should convert these values to missing (empty string) when importing the data into SAS. SAS usually excludes missing values automatically from calculations where such makes sense.
Oh I finally understand what you explained to me. Thank you so much Sir!
I think below article would be helpful who have similar curiosity too.
http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a002316433.htm
By the way can i modify the table with below codes?
proc iml; edit WORK.CARINSURANCE_TRAIN; read all var {job} where (job = "NA"); job = "management"; replace all var {job} where (job = "NA"); close WORK.CARINSURANCE_TRAIN;
Do such a replace in a data step:
data work.carinsurance_train;
set work.carinsurance_train;
if job = 'NA' then job = 'management';
run;
Thank you for sharing your knowledge with me!
I think you’re trying to impute missing? And replacing it with management is not a good way to do that.
First, why are they empty, or is NA a valid value?
Second, how will this variable be used?
Third, if it is missing, 19 is a smalll number, does it make sense to exclude these obs?
Lastly, if not, the you can look at imputation methods. Or basically try to predict what that value would be.
@eunbi wrote:
Hi everyone,
I am beginner and a person who are newly learning SAS.
I am trying to data pre-processing and have question. I can calculate mean for numeric type of attribute in SAS well.
Here have question! In case of char type of attribute, how can we get mean?
(Logically thinking, chart type attribute cannot calculate mean I think. In this case, to handle missing value, Can i use frequency result?)
For example, below is frequency result for job. I think NA is missing value here and want to replace to handle it. I can find the most high column of result : management. And I thought, I can replace the NA result to management. Is it okay? Or Is there any solution can you advice me?? Appreciate so much in advanced
Hi Sir!
Thank you for reply! Below is my response for your questions.
First, why are they empty, or is NA a valid value?
> NA is not valid value. I am trying to learn data pre-processing with given data set now.
Second, how will this variable be used?
> Currently i have two data sets. One is train and the other is test. Once i done data pre-processing. I will try to estimate car insurance marking result based on the train data set.
Third, if it is missing, 19 is a smalll number, does it make sense to exclude these obs?
> Can consider of course as one of method of data cleaning. However i hope to correct inconsistencies in data.
Lastly, if not, the you can look at imputation methods. Or basically try to predict what that value would be.
> Based on the train data set i will predict marking result with test data set.
Thank you for reply again!
Actually I cannot use method which is removing outliers or invalid value from my data set.
Instead of this method, I have to use others method for data-reprocessing, thats why i try ting to replace the value at the begin.
Since i cannot remove the value, can you give me any good idea to handle this?
Thank you !
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.