BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
HyunJee
Fluorite | Level 6

I am recoding numeric values to character values. I am wanting the values of 77 and 99 to be coded to missing so they are not included in the frequency tables.

value C5R

. = " "

1 = "MOTHER (STEP, FOSTER, ADOPTIVE) OR FEMALE GUARDIAN"

2 = "FATHER (STEP, FOSTER, ADOPTIVE) OR MALE GUARDIAN"

3 = "GRANDPARENT"

4 = "OTHER FAMILY MEMBER/FRIEND"

77 = " "

99 = " "

I thought the above would do what I was wanting, but when I ran a proc freq that missing appeared in the table but the value was simply blank. I am not sure if this is possible to do.

any help is appreciated.

                RELATIONSHIP OF RESPONDENT TO TEEN (RECODE)

                                                                         Cumulative  Cumulative

                                               C5R                                                    Frequency   Percent   Frequency    Percent

ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

MOTHER (STEP, FOSTER, ADOPTIVE) OR FEMALE GUARDIAN        15859477     77.13    15859477      77.13

FATHER (STEP, FOSTER, ADOPTIVE) OR MALE GUARDIAN               3500044     17.02    19359521      94.15

GRANDPARENT                                                                                 821907.6      4.00    20181429      98.15

OTHER FAMILY MEMBER/FRIEND                                                     376138.2      1.83    20557567      99.98

                                                                                                           4608.209      0.02    20562175     100.00   ******I want this value to appear as missing below and not be included in the frequency count or percent calculations.

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

You will probably get better results by recoding the actual data values.  You can use special missing values to distinguish amoung 77,99 and actual missing values.

Then those values will always be considered missing by PROC FREQ (and other analysis procedures).

Here is one way to convert using an INFORMAT and two different FORMATs. One that displays a decode for all of the values and other that maps the special values to blank.

proc format ;

value C5R  /* Normal reporting value decodes */

1 = "MOTHER (STEP, FOSTER, ADOPTIVE) OR FEMALE GUARDIAN"

2 = "FATHER (STEP, FOSTER, ADOPTIVE) OR MALE GUARDIAN"

3 = "GRANDPARENT"

4 = "OTHER FAMILY MEMBER/FRIEND"

77,.A = "Not Applicable"

99,.B = "Refused to answer"

.  = "Not specified"

;

value C5M /* Collapse all missing values */

.,.A,.B,77,99 = ' '

other = [C5R50.]

;

invalue c5x /* Recoding Informat */

  77=.A

  99=.B

;

run;

data want ;

  do c5=1 to 4,77,99,.;

    c5x = input(put(c5,2.),c5x.);

    output;

  end;

run;

Try using PROC FREQ will all combinations of the original variable (C5) and the recoded variable (C5X), missing option, and the C5R and C5M formats.

View solution in original post

16 REPLIES 16
Linlin
Lapis Lazuli | Level 10

is this helpful?

proc format;

value C5R

. = " "

1 = "MOTHER (STEP, FOSTER, ADOPTIVE) OR FEMALE GUARDIAN"

2 = "FATHER (STEP, FOSTER, ADOPTIVE) OR MALE GUARDIAN"

3 = "GRANDPARENT"

4 = "OTHER FAMILY MEMBER/FRIEND"

77 = " "

99 = " ";

run;

data have;

input v;

format v c5r.;

cards;

1

2

2

3

4

77

99

.

;

proc freq data=have;

  tables v;

  run;

ArtC
Rhodochrosite | Level 12

You are almost certainly applying the format in the FREQ step.  In FREQ the missing values are eliminated first, then the format is applied.  Using the format in an earlier step (Linlin uses it in the DATA step) causes this timing to change. 

MikeZdeb
Rhodochrosite | Level 12

hi ... the only way you get those missing values in your table is by adding the MISSING option in PROC FREQ (no matter where you use the FORMAT statement ... data step or PROC) ... using Linlin's data ...

proc freq data=have;

table v / missing;

format v c5r.;

run;

did you use the MISSING option .. .if so, that's how the missing values end up in you table

if you did, what I don't understand is how the missing values ended up listed LAST in your table ... if MISSING data are present in a table in PROC FREQ they are always listed first (for either numeric or character data) no matter what you do (use a format, use an ORDER= option, use a WEIGHT option which you must have to get the non-integer frequencies in your table, whatever)

so, the table in your posting is a "mystery"

also, a suggestion ... you can group the values that are assigned to missing with the format ...

proc format;

value C5R

. , 77, 99  = " "

1 = "MOTHER (STEP, FOSTER, ADOPTIVE) OR FEMALE GUARDIAN"

2 = "FATHER (STEP, FOSTER, ADOPTIVE) OR MALE GUARDIAN"

3 = "GRANDPARENT"

4 = "OTHER FAMILY MEMBER/FRIEND"

;

run;

Reeza
Super User

Here's a simple example to get what the OP has. It doesn't remove the missing either...not sure why.

On a Windows XP, SAS 9.2.3

Regardless of where the format is applied the results are the same.

data class;

    set sashelp.class;

run;

proc format;

    value age

    11='Pre Teen'

    12='Pre Teen'

    13='Teen'

    14='Teen'

    15='Teen'

    16=" ";

run;

proc freq data=class;

    table age;

    format age age.;

run;

                                                                Cumulative    Cumulative

                                Age    Frequency     Percent     Frequency      Percent

                           ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

                           Pre Teen           7       36.84             7        36.84

                           Teen              11       57.89            18        94.74

                                              1        5.26            19       100.00

Linlin
Lapis Lazuli | Level 10

Hi Reeza,

In order to get the missing counts, there must be at least one missing value in the dataset.

data class;

       set sashelp.class;

         if name='Alice' then age=.;

run;

proc format;

    value age

    11='Pre Teen'

    12='Pre Teen'

    13='Teen'

    14='Teen'

    15='Teen'

    .,16=' ';

    

run;

proc freq data=class;

    table age;

      format age age.;

run;

                                     Cumulative    Cumulative

                       Age    Frequency     Percent     Frequency      Percent

                  ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

                  Pre Teen           7       41.18             7        41.18

                  Teen              10       58.82            17       100.00

                                      Frequency Missing = 2

Linlin

MikeZdeb
Rhodochrosite | Level 12

Hi ... thanks, that explains the OP table ... must not have any real missing values.

As for your "not sure why" ... it doesn't get rid of missing since there are no missing values.  The table is based on the original values of the variables, not formatted values.  The OP must not have had any real missing values ... only the 77 and 99 that had a blank format label.  Since 77 and 99 are greater than 1,2,3,4 ... they ended up last.

However, if there are no actual missing values, you cannot create missing data with a format and all observations end up in the table.  That BLANK value in the table using SASHELP.CLASS is just a mask for 16.

If there actually were missing values in the OP data, they'd end up grouped with 77 and 99 and they'd all be considered as MISSING and would not be in the table

Reeza
Super User

That's fine, there's ways around it, just not the behaviour I would have expected by default.  I do understand that SAS uses the underlying value rather than the actual formatted value.

If it was a space in a character value then it would be considered a missing observation.

If it is a space in the format of a numeric variable it IS considered a missing observation if there is other missing numeric observations, but if there are no missing observations it doesn't work.

I'd expect it to be treated the same whether or not there were other missing values. So in LinLin's example the table should have 1 missing and 1 blank not 2 missing.

art297
Opal | Level 21

Fareeza,

I've never confronted this before, but totally agree that it is not what one would or should expect.  The value either is or isn't missing.  I, personally, would consider this a bug that ought to be corrected.  I'd be interested in hearing how anyone might consider this a feature.

MikeZdeb
Rhodochrosite | Level 12

hi ... when a missing value is grouped with non-mssing values via a format, they all end up as missing in PROC FREQ

proc format;

value x

1-3 = '<4'

other = '4+'

;

run;

data x;

input x @@;

datalines;

1 2 3 . 5 6 7 8

;

run;

proc freq data=x;

table x;

format x x.;

run;

                               Cumulative    Cumulative

x    Frequency     Percent     Frequency      Percent

<4           3      100.00             3       100.00

Frequency Missing = 5

the notes I use for my intro SAS class ... http://www.albany.edu/~msz03/epi514/notes/fp125_140.pdf ... discuss this on pages 138 and 139

I've always called it the "bad apple rule" ... one missing value spoils all the other non-missing in the group

ps  Art Carpenter has a nice write up on this ...  http://www.sascommunity.org/wiki/Missing_Values_in_Classification_Variables

with a link ... http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#procstat_freq_sec...

that states "When you use PROC FORMAT to create a user-written format that combines missing and nonmissing values into one category, PROC FREQ treats the entire category of formatted values as missing."

art297
Opal | Level 21

Mike,

Again, while I wasn't familiar with the effect or the presentations (and they were nice indeed), I still agree with Fareeza that the effect SHOULD NOT be there!


Reeza
Super User

That is what the documentation says, but it does that ONLY if there are missing values.

If you have a format that assumes this, but have no actual missing values then the results end up in your table, not in the missing which isn't what I'd expect according to the documentation.

Tom
Super User Tom
Super User

You will probably get better results by recoding the actual data values.  You can use special missing values to distinguish amoung 77,99 and actual missing values.

Then those values will always be considered missing by PROC FREQ (and other analysis procedures).

Here is one way to convert using an INFORMAT and two different FORMATs. One that displays a decode for all of the values and other that maps the special values to blank.

proc format ;

value C5R  /* Normal reporting value decodes */

1 = "MOTHER (STEP, FOSTER, ADOPTIVE) OR FEMALE GUARDIAN"

2 = "FATHER (STEP, FOSTER, ADOPTIVE) OR MALE GUARDIAN"

3 = "GRANDPARENT"

4 = "OTHER FAMILY MEMBER/FRIEND"

77,.A = "Not Applicable"

99,.B = "Refused to answer"

.  = "Not specified"

;

value C5M /* Collapse all missing values */

.,.A,.B,77,99 = ' '

other = [C5R50.]

;

invalue c5x /* Recoding Informat */

  77=.A

  99=.B

;

run;

data want ;

  do c5=1 to 4,77,99,.;

    c5x = input(put(c5,2.),c5x.);

    output;

  end;

run;

Try using PROC FREQ will all combinations of the original variable (C5) and the recoded variable (C5X), missing option, and the C5R and C5M formats.

HyunJee
Fluorite | Level 6

I apologize for such a delayed response. I went out of office for a time and am now back.

As I was reading through the response to my original post, I believe that according to what was written, if the variable I am trying to recode has no original missing or blank data, I cannot recode other values such as 77 or 99 to missing using a proc format?

thank you all for your responses, but if I am understanding what has been posted, there is not simple solution of the variable I am recoding has not orignial missing values.

If I am misinterpreting, please let me know. Thank you again.

HyunJee

Linlin
Lapis Lazuli | Level 10

Hi HyunJee,

What you interpreted is correct. In order to count 77, 99 as missing you have to have real missing values (.) in your dataset.

or you can pre-process you dataset:

data have ;

  set have;

  if v1 in (77,99) then v1 =.;

run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 16 replies
  • 5193 views
  • 7 likes
  • 7 in conversation