Help using Base SAS procedures

use of proc format

Reply
Contributor
Posts: 30

use of proc format

I have realised some strange problems with a command which should be quite simple, proc format.

The problem is as follows.

My data is like

A$  B 

.     3

      4    

e    .

I have as seen above both missing data for the character variable, A$, shich is given by "." and " "

I want to use proc format to change the format so the data looks like this

A$  B 

.     other

      other  

other    .

It is because I then can make a proc freq which gives me

.other

Somebody who can help me?

Best regards, Silas, Aarhus University

PROC Star
Posts: 7,363

Re: use of proc format

Not sure why you would want to do that, but one way you could do it is by using something like:

proc format;

  value $cmiss

   'b.'='.'

   ' '=' '

   other='other'

  ;

  value nmiss

   .=.

   other='other'

  ;

run;

options missing=' ';

data have;

  infile cards truncover;

  input @;

  _infile_=tranwrd(_infile_,'.','b.');

  format a $cmiss. b nmiss.;

  input a $2. b;

  cards;

. 3

  4

;

Contributor
Posts: 30

Re: use of proc format

Hey Arthur.

Thanks for the fast answer, i see it works for  with your code, but unfortunate it still not works with my original.

I have a data set with the variables YEAR and AKASSE. AKASSE does have codes " ", "." and other numbers.

When I only use the proc format of yours I only get " " and "other" in my proc freq results and  "." are included in "other".

Do you understand?

So how can I use the proc format properly on the original data set?

Again thanks! Smiley Happy

PROC Star
Posts: 7,363

Re: use of proc format

I don't understand what your data look like.  You will probably have to provide an example in the form of a datastep.

If you are saying that you have both blank and . values for both numeric and character variables, then you might be able to achieve what you want with something like:

proc format;

  value $cmiss

   '.b'='.'

   ' '=' '

   other='other'

;

  value nmiss

   .b=' '

   .=.

   other='other'

;

run;

options missing=' ';

data have;

  infile cards truncover;

  input @;

  _infile_=tranwrd(_infile_,'.','.b');

  format a $cmiss. b nmiss.;

  input a $2. b;

  cards;

. 3

  4

f .

;

Contributor
Posts: 30

Re: use of proc format

I have a dataset as shown in the my first question, just with a lot more observations and with a variable YEAR which shall not be affected by the format.

then I have tried this code inspired by you:

proc format;

  value $cmiss

   'b.'='.'

   ' '=' '

   other='other'

  ;

  value nmiss

   .=.

   other='other'

  ;

run;

proc freq data=XXX;

     tables year*A / nocol nopercent missing;

     format _numeric_ nmiss. _character_ $cmiss. year;

run;

but this just gives me that A only consist of " " and "other", the "." is included in "other".

Do you understand? I would like to know how to continue after the proc format step given that I have a data set with the name XXXX and the three variables YEAR, A, B?

Best //

Contributor
Posts: 30

Re: use of proc format

Btw the reason I need to do is because when the danish statstics publics new data there is the problem that for character-variables it happens that missing data for some years are given "." and for other years are given " ". So at the institute of Economy in Aarhus we need to find out what years missing data is given the one or the other value.

PROC Star
Posts: 7,363

Re: use of proc format

Does the following do what you need?:

proc format;

  value $cmiss

   '.b'='.'

   ' '=' '

   other='other'

;

  value nmiss

   .b=' '

   .=.

   other='other'

;

run;

options missing='.';

data xxx;

  infile cards truncover;

  input @;

  _infile_=tranwrd(_infile_,'.','.b');

  input year a $2. b;

  cards;

2009 . 3

2010   4

2011 e

2012 f .

;

proc freq data=XXX;

     tables year*A / nocol nopercent missing;

     format _numeric_ nmiss. _character_ $cmiss. year;

run;

Super User
Posts: 5,082

Re: use of proc format

Your requirements are nearly clear, but ...

If A and B fall into the "other" category, do you need them broken down by year?  If not, you might consider a new variable entirely:

if B=. then B_new = year;

else b_new=B;

Then run the PROC FREQ on B_new.

For A, it's a little trickier, since you have to make sure A has a length of at least 4.  Assuming it does:

if A in (' ', '.') then A_new = put(year,4.);

else A_new = A;

Then run the PROC FREQ on A_new.

Depends on what you are trying to find out.  And if you can afford to run multiple PROC FREQs, one for each variable, skip all the complexity:

proc freq data=have;

   tables year;

   where B=.;

run;

proc freq data=have;

   tables year;

   where A in (' ', '.');

run;

Of course, all of this is dependent on whether I'm interpreting the problem correctly or not.

Good luck.

Super User
Super User
Posts: 6,500

Re: use of proc format

It works fine for me (see below).

Do you want the blanks and periods to be distinct categories?

Did you remember to use the MISSING option in the PROC FREQ code?

Are you sure that the periods are getting into the SAS dataset?  If you are reading from a text file SAS normally converts '.' to ' '.  You can prevent this by using $CHAR informat instead of $ informat.

data test;

a=' '; output;

a='.'; output;

a='X'; output;

run;

proc format ;

value $a ' '=' ' '.'='.' other='Z';

run;

proc freq ;

tables a / missing  ;

format a $a.;

run;

The FREQ Procedure

                              Cumulative    Cumulative

a    Frequency     Percent     Frequency      Percent

------------------------------------------------------

            1       33.33             1        33.33

.           1       33.33             2        66.67

Z           1       33.33             3       100.00

Contributor
Posts: 30

Re: use of proc format

Hey Tom.

Your program does also work for me with the data set you make.

But unfortunate - and I do not understand why - it still does not work when I use the program on my original data set.

To show my problem more specific below  I have uploaded my problemativ results:

This is the result when I do proc freq on original data

akasse_freq.png

Then I use your format and I then have this result:

akasse_freq_format.png

Which is clearly wrong.

Do you have any idea on how it fails. I need three categories 1: '.' 2: ' ' 3: 'other'

And it works perfect for me when I use your data set?

Thank again! Best / Silas

Contributor
Posts: 30

Re: use of proc format

Thanks for help everybody... Problem seems to be solved now! :smileygrin:

Ask a Question
Discussion stats
  • 10 replies
  • 344 views
  • 2 likes
  • 4 in conversation