I am recoding numeric values to character values. I am wanting the values of 77 and 99 to be coded to missing so they are not included in the frequency tables.
value C5R
. = " "
1 = "MOTHER (STEP, FOSTER, ADOPTIVE) OR FEMALE GUARDIAN"
2 = "FATHER (STEP, FOSTER, ADOPTIVE) OR MALE GUARDIAN"
3 = "GRANDPARENT"
4 = "OTHER FAMILY MEMBER/FRIEND"
77 = " "
99 = " "
I thought the above would do what I was wanting, but when I ran a proc freq that missing appeared in the table but the value was simply blank. I am not sure if this is possible to do.
any help is appreciated.
RELATIONSHIP OF RESPONDENT TO TEEN (RECODE)
Cumulative Cumulative
C5R Frequency Percent Frequency Percent
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
MOTHER (STEP, FOSTER, ADOPTIVE) OR FEMALE GUARDIAN 15859477 77.13 15859477 77.13
FATHER (STEP, FOSTER, ADOPTIVE) OR MALE GUARDIAN 3500044 17.02 19359521 94.15
GRANDPARENT 821907.6 4.00 20181429 98.15
OTHER FAMILY MEMBER/FRIEND 376138.2 1.83 20557567 99.98
4608.209 0.02 20562175 100.00 ******I want this value to appear as missing below and not be included in the frequency count or percent calculations.
You will probably get better results by recoding the actual data values. You can use special missing values to distinguish amoung 77,99 and actual missing values.
Then those values will always be considered missing by PROC FREQ (and other analysis procedures).
Here is one way to convert using an INFORMAT and two different FORMATs. One that displays a decode for all of the values and other that maps the special values to blank.
proc format ;
value C5R /* Normal reporting value decodes */
1 = "MOTHER (STEP, FOSTER, ADOPTIVE) OR FEMALE GUARDIAN"
2 = "FATHER (STEP, FOSTER, ADOPTIVE) OR MALE GUARDIAN"
3 = "GRANDPARENT"
4 = "OTHER FAMILY MEMBER/FRIEND"
77,.A = "Not Applicable"
99,.B = "Refused to answer"
. = "Not specified"
;
value C5M /* Collapse all missing values */
.,.A,.B,77,99 = ' '
other = [C5R50.]
;
invalue c5x /* Recoding Informat */
77=.A
99=.B
;
run;
data want ;
do c5=1 to 4,77,99,.;
c5x = input(put(c5,2.),c5x.);
output;
end;
run;
Try using PROC FREQ will all combinations of the original variable (C5) and the recoded variable (C5X), missing option, and the C5R and C5M formats.
is this helpful?
proc format;
value C5R
. = " "
1 = "MOTHER (STEP, FOSTER, ADOPTIVE) OR FEMALE GUARDIAN"
2 = "FATHER (STEP, FOSTER, ADOPTIVE) OR MALE GUARDIAN"
3 = "GRANDPARENT"
4 = "OTHER FAMILY MEMBER/FRIEND"
77 = " "
99 = " ";
run;
data have;
input v;
format v c5r.;
cards;
1
2
2
3
4
77
99
.
;
proc freq data=have;
tables v;
run;
You are almost certainly applying the format in the FREQ step. In FREQ the missing values are eliminated first, then the format is applied. Using the format in an earlier step (Linlin uses it in the DATA step) causes this timing to change.
hi ... the only way you get those missing values in your table is by adding the MISSING option in PROC FREQ (no matter where you use the FORMAT statement ... data step or PROC) ... using Linlin's data ...
proc freq data=have;
table v / missing;
format v c5r.;
run;
did you use the MISSING option .. .if so, that's how the missing values end up in you table
if you did, what I don't understand is how the missing values ended up listed LAST in your table ... if MISSING data are present in a table in PROC FREQ they are always listed first (for either numeric or character data) no matter what you do (use a format, use an ORDER= option, use a WEIGHT option which you must have to get the non-integer frequencies in your table, whatever)
so, the table in your posting is a "mystery"
also, a suggestion ... you can group the values that are assigned to missing with the format ...
proc format;
value C5R
. , 77, 99 = " "
1 = "MOTHER (STEP, FOSTER, ADOPTIVE) OR FEMALE GUARDIAN"
2 = "FATHER (STEP, FOSTER, ADOPTIVE) OR MALE GUARDIAN"
3 = "GRANDPARENT"
4 = "OTHER FAMILY MEMBER/FRIEND"
;
run;
Here's a simple example to get what the OP has. It doesn't remove the missing either...not sure why.
On a Windows XP, SAS 9.2.3
Regardless of where the format is applied the results are the same.
data class;
set sashelp.class;
run;
proc format;
value age
11='Pre Teen'
12='Pre Teen'
13='Teen'
14='Teen'
15='Teen'
16=" ";
run;
proc freq data=class;
table age;
format age age.;
run;
Cumulative Cumulative
Age Frequency Percent Frequency Percent
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
Pre Teen 7 36.84 7 36.84
Teen 11 57.89 18 94.74
1 5.26 19 100.00
Hi Reeza,
In order to get the missing counts, there must be at least one missing value in the dataset.
data class;
set sashelp.class;
if name='Alice' then age=.;
run;
proc format;
value age
11='Pre Teen'
12='Pre Teen'
13='Teen'
14='Teen'
15='Teen'
.,16=' ';
run;
proc freq data=class;
table age;
format age age.;
run;
Cumulative Cumulative
Age Frequency Percent Frequency Percent
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
Pre Teen 7 41.18 7 41.18
Teen 10 58.82 17 100.00
Frequency Missing = 2
Linlin
Hi ... thanks, that explains the OP table ... must not have any real missing values.
As for your "not sure why" ... it doesn't get rid of missing since there are no missing values. The table is based on the original values of the variables, not formatted values. The OP must not have had any real missing values ... only the 77 and 99 that had a blank format label. Since 77 and 99 are greater than 1,2,3,4 ... they ended up last.
However, if there are no actual missing values, you cannot create missing data with a format and all observations end up in the table. That BLANK value in the table using SASHELP.CLASS is just a mask for 16.
If there actually were missing values in the OP data, they'd end up grouped with 77 and 99 and they'd all be considered as MISSING and would not be in the table
That's fine, there's ways around it, just not the behaviour I would have expected by default. I do understand that SAS uses the underlying value rather than the actual formatted value.
If it was a space in a character value then it would be considered a missing observation.
If it is a space in the format of a numeric variable it IS considered a missing observation if there is other missing numeric observations, but if there are no missing observations it doesn't work.
I'd expect it to be treated the same whether or not there were other missing values. So in LinLin's example the table should have 1 missing and 1 blank not 2 missing.
Fareeza,
I've never confronted this before, but totally agree that it is not what one would or should expect. The value either is or isn't missing. I, personally, would consider this a bug that ought to be corrected. I'd be interested in hearing how anyone might consider this a feature.
hi ... when a missing value is grouped with non-mssing values via a format, they all end up as missing in PROC FREQ
proc format;
value x
1-3 = '<4'
other = '4+'
;
run;
data x;
input x @@;
datalines;
1 2 3 . 5 6 7 8
;
run;
proc freq data=x;
table x;
format x x.;
run;
Cumulative Cumulative
x Frequency Percent Frequency Percent
<4 3 100.00 3 100.00
Frequency Missing = 5
the notes I use for my intro SAS class ... http://www.albany.edu/~msz03/epi514/notes/fp125_140.pdf ... discuss this on pages 138 and 139
I've always called it the "bad apple rule" ... one missing value spoils all the other non-missing in the group
ps Art Carpenter has a nice write up on this ... http://www.sascommunity.org/wiki/Missing_Values_in_Classification_Variables
with a link ... http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#procstat_freq_sec...
that states "When you use PROC FORMAT to create a user-written format that combines missing and nonmissing values into one category, PROC FREQ treats the entire category of formatted values as missing."
Mike,
Again, while I wasn't familiar with the effect or the presentations (and they were nice indeed), I still agree with Fareeza that the effect SHOULD NOT be there!
That is what the documentation says, but it does that ONLY if there are missing values.
If you have a format that assumes this, but have no actual missing values then the results end up in your table, not in the missing which isn't what I'd expect according to the documentation.
You will probably get better results by recoding the actual data values. You can use special missing values to distinguish amoung 77,99 and actual missing values.
Then those values will always be considered missing by PROC FREQ (and other analysis procedures).
Here is one way to convert using an INFORMAT and two different FORMATs. One that displays a decode for all of the values and other that maps the special values to blank.
proc format ;
value C5R /* Normal reporting value decodes */
1 = "MOTHER (STEP, FOSTER, ADOPTIVE) OR FEMALE GUARDIAN"
2 = "FATHER (STEP, FOSTER, ADOPTIVE) OR MALE GUARDIAN"
3 = "GRANDPARENT"
4 = "OTHER FAMILY MEMBER/FRIEND"
77,.A = "Not Applicable"
99,.B = "Refused to answer"
. = "Not specified"
;
value C5M /* Collapse all missing values */
.,.A,.B,77,99 = ' '
other = [C5R50.]
;
invalue c5x /* Recoding Informat */
77=.A
99=.B
;
run;
data want ;
do c5=1 to 4,77,99,.;
c5x = input(put(c5,2.),c5x.);
output;
end;
run;
Try using PROC FREQ will all combinations of the original variable (C5) and the recoded variable (C5X), missing option, and the C5R and C5M formats.
I apologize for such a delayed response. I went out of office for a time and am now back.
As I was reading through the response to my original post, I believe that according to what was written, if the variable I am trying to recode has no original missing or blank data, I cannot recode other values such as 77 or 99 to missing using a proc format?
thank you all for your responses, but if I am understanding what has been posted, there is not simple solution of the variable I am recoding has not orignial missing values.
If I am misinterpreting, please let me know. Thank you again.
HyunJee
Hi HyunJee,
What you interpreted is correct. In order to count 77, 99 as missing you have to have real missing values (.) in your dataset.
or you can pre-process you dataset:
data have ;
set have;
if v1 in (77,99) then v1 =.;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.