Help using Base SAS procedures

Using Proc format to code multiple values to missing

Accepted Solution Solved
Reply
Contributor
Posts: 71
Accepted Solution

Using Proc format to code multiple values to missing

I am recoding numeric values to character values. I am wanting the values of 77 and 99 to be coded to missing so they are not included in the frequency tables.

value C5R

. = " "

1 = "MOTHER (STEP, FOSTER, ADOPTIVE) OR FEMALE GUARDIAN"

2 = "FATHER (STEP, FOSTER, ADOPTIVE) OR MALE GUARDIAN"

3 = "GRANDPARENT"

4 = "OTHER FAMILY MEMBER/FRIEND"

77 = " "

99 = " "

I thought the above would do what I was wanting, but when I ran a proc freq that missing appeared in the table but the value was simply blank. I am not sure if this is possible to do.

any help is appreciated.

                RELATIONSHIP OF RESPONDENT TO TEEN (RECODE)

                                                                         Cumulative  Cumulative

                                               C5R                                                    Frequency   Percent   Frequency    Percent

ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

MOTHER (STEP, FOSTER, ADOPTIVE) OR FEMALE GUARDIAN        15859477     77.13    15859477      77.13

FATHER (STEP, FOSTER, ADOPTIVE) OR MALE GUARDIAN               3500044     17.02    19359521      94.15

GRANDPARENT                                                                                 821907.6      4.00    20181429      98.15

OTHER FAMILY MEMBER/FRIEND                                                     376138.2      1.83    20557567      99.98

                                                                                                           4608.209      0.02    20562175     100.00   ******I want this value to appear as missing below and not be included in the frequency count or percent calculations.


Accepted Solutions
Solution
‎01-06-2012 08:22 AM
Super User
Super User
Posts: 7,041

Re: Using Proc format to code multiple values to missing

You will probably get better results by recoding the actual data values.  You can use special missing values to distinguish amoung 77,99 and actual missing values.

Then those values will always be considered missing by PROC FREQ (and other analysis procedures).

Here is one way to convert using an INFORMAT and two different FORMATs. One that displays a decode for all of the values and other that maps the special values to blank.

proc format ;

value C5R  /* Normal reporting value decodes */

1 = "MOTHER (STEP, FOSTER, ADOPTIVE) OR FEMALE GUARDIAN"

2 = "FATHER (STEP, FOSTER, ADOPTIVE) OR MALE GUARDIAN"

3 = "GRANDPARENT"

4 = "OTHER FAMILY MEMBER/FRIEND"

77,.A = "Not Applicable"

99,.B = "Refused to answer"

.  = "Not specified"

;

value C5M /* Collapse all missing values */

.,.A,.B,77,99 = ' '

other = [C5R50.]

;

invalue c5x /* Recoding Informat */

  77=.A

  99=.B

;

run;

data want ;

  do c5=1 to 4,77,99,.;

    c5x = input(put(c5,2.),c5x.);

    output;

  end;

run;

Try using PROC FREQ will all combinations of the original variable (C5) and the recoded variable (C5X), missing option, and the C5R and C5M formats.

View solution in original post


All Replies
Super Contributor
Posts: 1,636

Using Proc format to code multiple values to missing

is this helpful?

proc format;

value C5R

. = " "

1 = "MOTHER (STEP, FOSTER, ADOPTIVE) OR FEMALE GUARDIAN"

2 = "FATHER (STEP, FOSTER, ADOPTIVE) OR MALE GUARDIAN"

3 = "GRANDPARENT"

4 = "OTHER FAMILY MEMBER/FRIEND"

77 = " "

99 = " ";

run;

data have;

input v;

format v c5r.;

cards;

1

2

2

3

4

77

99

.

;

proc freq data=have;

  tables v;

  run;

Valued Guide
Posts: 634

Using Proc format to code multiple values to missing

You are almost certainly applying the format in the FREQ step.  In FREQ the missing values are eliminated first, then the format is applied.  Using the format in an earlier step (Linlin uses it in the DATA step) causes this timing to change. 

Valued Guide
Posts: 765

Re: Using Proc format to code multiple values to missing

hi ... the only way you get those missing values in your table is by adding the MISSING option in PROC FREQ (no matter where you use the FORMAT statement ... data step or PROC) ... using Linlin's data ...

proc freq data=have;

table v / missing;

format v c5r.;

run;

did you use the MISSING option .. .if so, that's how the missing values end up in you table

if you did, what I don't understand is how the missing values ended up listed LAST in your table ... if MISSING data are present in a table in PROC FREQ they are always listed first (for either numeric or character data) no matter what you do (use a format, use an ORDER= option, use a WEIGHT option which you must have to get the non-integer frequencies in your table, whatever)

so, the table in your posting is a "mystery"

also, a suggestion ... you can group the values that are assigned to missing with the format ...

proc format;

value C5R

. , 77, 99  = " "

1 = "MOTHER (STEP, FOSTER, ADOPTIVE) OR FEMALE GUARDIAN"

2 = "FATHER (STEP, FOSTER, ADOPTIVE) OR MALE GUARDIAN"

3 = "GRANDPARENT"

4 = "OTHER FAMILY MEMBER/FRIEND"

;

run;

Super User
Posts: 19,789

Re: Using Proc format to code multiple values to missing

Here's a simple example to get what the OP has. It doesn't remove the missing either...not sure why.

On a Windows XP, SAS 9.2.3

Regardless of where the format is applied the results are the same.

data class;

    set sashelp.class;

run;

proc format;

    value age

    11='Pre Teen'

    12='Pre Teen'

    13='Teen'

    14='Teen'

    15='Teen'

    16=" ";

run;

proc freq data=class;

    table age;

    format age age.;

run;

                                                                Cumulative    Cumulative

                                Age    Frequency     Percent     Frequency      Percent

                           ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

                           Pre Teen           7       36.84             7        36.84

                           Teen              11       57.89            18        94.74

                                              1        5.26            19       100.00

Super Contributor
Posts: 1,636

Re: Using Proc format to code multiple values to missing

Hi Reeza,

In order to get the missing counts, there must be at least one missing value in the dataset.

data class;

       set sashelp.class;

         if name='Alice' then age=.;

run;

proc format;

    value age

    11='Pre Teen'

    12='Pre Teen'

    13='Teen'

    14='Teen'

    15='Teen'

    .,16=' ';

    

run;

proc freq data=class;

    table age;

      format age age.;

run;

                                     Cumulative    Cumulative

                       Age    Frequency     Percent     Frequency      Percent

                  ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

                  Pre Teen           7       41.18             7        41.18

                  Teen              10       58.82            17       100.00

                                      Frequency Missing = 2

Linlin

Valued Guide
Posts: 765

Using Proc format to code multiple values to missing

Hi ... thanks, that explains the OP table ... must not have any real missing values.

As for your "not sure why" ... it doesn't get rid of missing since there are no missing values.  The table is based on the original values of the variables, not formatted values.  The OP must not have had any real missing values ... only the 77 and 99 that had a blank format label.  Since 77 and 99 are greater than 1,2,3,4 ... they ended up last.

However, if there are no actual missing values, you cannot create missing data with a format and all observations end up in the table.  That BLANK value in the table using SASHELP.CLASS is just a mask for 16.

If there actually were missing values in the OP data, they'd end up grouped with 77 and 99 and they'd all be considered as MISSING and would not be in the table

Super User
Posts: 19,789

Using Proc format to code multiple values to missing

That's fine, there's ways around it, just not the behaviour I would have expected by default.  I do understand that SAS uses the underlying value rather than the actual formatted value.

If it was a space in a character value then it would be considered a missing observation.

If it is a space in the format of a numeric variable it IS considered a missing observation if there is other missing numeric observations, but if there are no missing observations it doesn't work.

I'd expect it to be treated the same whether or not there were other missing values. So in LinLin's example the table should have 1 missing and 1 blank not 2 missing.

PROC Star
Posts: 7,471

Using Proc format to code multiple values to missing

Fareeza,

I've never confronted this before, but totally agree that it is not what one would or should expect.  The value either is or isn't missing.  I, personally, would consider this a bug that ought to be corrected.  I'd be interested in hearing how anyone might consider this a feature.

Valued Guide
Posts: 765

Re: Using Proc format to code multiple values to missing

hi ... when a missing value is grouped with non-mssing values via a format, they all end up as missing in PROC FREQ

proc format;

value x

1-3 = '<4'

other = '4+'

;

run;

data x;

input x @@;

datalines;

1 2 3 . 5 6 7 8

;

run;

proc freq data=x;

table x;

format x x.;

run;

                               Cumulative    Cumulative

x    Frequency     Percent     Frequency      Percent

<4           3      100.00             3       100.00

Frequency Missing = 5

the notes I use for my intro SAS class ... http://www.albany.edu/~msz03/epi514/notes/fp125_140.pdf ... discuss this on pages 138 and 139

I've always called it the "bad apple rule" ... one missing value spoils all the other non-missing in the group

ps  Art Carpenter has a nice write up on this ...  http://www.sascommunity.org/wiki/Missing_Values_in_Classification_Variables

with a link ... http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#procstat_freq_sec...

that states "When you use PROC FORMAT to create a user-written format that combines missing and nonmissing values into one category, PROC FREQ treats the entire category of formatted values as missing."

PROC Star
Posts: 7,471

Re: Using Proc format to code multiple values to missing

Mike,

Again, while I wasn't familiar with the effect or the presentations (and they were nice indeed), I still agree with Fareeza that the effect SHOULD NOT be there!


Super User
Posts: 19,789

Re: Using Proc format to code multiple values to missing

That is what the documentation says, but it does that ONLY if there are missing values.

If you have a format that assumes this, but have no actual missing values then the results end up in your table, not in the missing which isn't what I'd expect according to the documentation.

Solution
‎01-06-2012 08:22 AM
Super User
Super User
Posts: 7,041

Re: Using Proc format to code multiple values to missing

You will probably get better results by recoding the actual data values.  You can use special missing values to distinguish amoung 77,99 and actual missing values.

Then those values will always be considered missing by PROC FREQ (and other analysis procedures).

Here is one way to convert using an INFORMAT and two different FORMATs. One that displays a decode for all of the values and other that maps the special values to blank.

proc format ;

value C5R  /* Normal reporting value decodes */

1 = "MOTHER (STEP, FOSTER, ADOPTIVE) OR FEMALE GUARDIAN"

2 = "FATHER (STEP, FOSTER, ADOPTIVE) OR MALE GUARDIAN"

3 = "GRANDPARENT"

4 = "OTHER FAMILY MEMBER/FRIEND"

77,.A = "Not Applicable"

99,.B = "Refused to answer"

.  = "Not specified"

;

value C5M /* Collapse all missing values */

.,.A,.B,77,99 = ' '

other = [C5R50.]

;

invalue c5x /* Recoding Informat */

  77=.A

  99=.B

;

run;

data want ;

  do c5=1 to 4,77,99,.;

    c5x = input(put(c5,2.),c5x.);

    output;

  end;

run;

Try using PROC FREQ will all combinations of the original variable (C5) and the recoded variable (C5X), missing option, and the C5R and C5M formats.

Contributor
Posts: 71

Using Proc format to code multiple values to missing

I apologize for such a delayed response. I went out of office for a time and am now back.

As I was reading through the response to my original post, I believe that according to what was written, if the variable I am trying to recode has no original missing or blank data, I cannot recode other values such as 77 or 99 to missing using a proc format?

thank you all for your responses, but if I am understanding what has been posted, there is not simple solution of the variable I am recoding has not orignial missing values.

If I am misinterpreting, please let me know. Thank you again.

HyunJee

Super Contributor
Posts: 1,636

Re: Using Proc format to code multiple values to missing

Hi HyunJee,

What you interpreted is correct. In order to count 77, 99 as missing you have to have real missing values (.) in your dataset.

or you can pre-process you dataset:

data have ;

  set have;

  if v1 in (77,99) then v1 =.;

run;

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 16 replies
  • 501 views
  • 7 likes
  • 7 in conversation