BookmarkSubscribeRSS Feed
silasskovsbo
Calcite | Level 5

I have realised some strange problems with a command which should be quite simple, proc format.

The problem is as follows.

My data is like

A$  B 

.     3

      4    

e    .

I have as seen above both missing data for the character variable, A$, shich is given by "." and " "

I want to use proc format to change the format so the data looks like this

A$  B 

.     other

      other  

other    .

It is because I then can make a proc freq which gives me

.other

Somebody who can help me?

Best regards, Silas, Aarhus University

10 REPLIES 10
art297
Opal | Level 21

Not sure why you would want to do that, but one way you could do it is by using something like:

proc format;

  value $cmiss

   'b.'='.'

   ' '=' '

   other='other'

  ;

  value nmiss

   .=.

   other='other'

  ;

run;

options missing=' ';

data have;

  infile cards truncover;

  input @;

  _infile_=tranwrd(_infile_,'.','b.');

  format a $cmiss. b nmiss.;

  input a $2. b;

  cards;

. 3

  4

;

silasskovsbo
Calcite | Level 5

Hey Arthur.

Thanks for the fast answer, i see it works for  with your code, but unfortunate it still not works with my original.

I have a data set with the variables YEAR and AKASSE. AKASSE does have codes " ", "." and other numbers.

When I only use the proc format of yours I only get " " and "other" in my proc freq results and  "." are included in "other".

Do you understand?

So how can I use the proc format properly on the original data set?

Again thanks! Smiley Happy

art297
Opal | Level 21

I don't understand what your data look like.  You will probably have to provide an example in the form of a datastep.

If you are saying that you have both blank and . values for both numeric and character variables, then you might be able to achieve what you want with something like:

proc format;

  value $cmiss

   '.b'='.'

   ' '=' '

   other='other'

;

  value nmiss

   .b=' '

   .=.

   other='other'

;

run;

options missing=' ';

data have;

  infile cards truncover;

  input @;

  _infile_=tranwrd(_infile_,'.','.b');

  format a $cmiss. b nmiss.;

  input a $2. b;

  cards;

. 3

  4

f .

;

silasskovsbo
Calcite | Level 5

I have a dataset as shown in the my first question, just with a lot more observations and with a variable YEAR which shall not be affected by the format.

then I have tried this code inspired by you:

proc format;

  value $cmiss

   'b.'='.'

   ' '=' '

   other='other'

  ;

  value nmiss

   .=.

   other='other'

  ;

run;

proc freq data=XXX;

     tables year*A / nocol nopercent missing;

     format _numeric_ nmiss. _character_ $cmiss. year;

run;

but this just gives me that A only consist of " " and "other", the "." is included in "other".

Do you understand? I would like to know how to continue after the proc format step given that I have a data set with the name XXXX and the three variables YEAR, A, B?

Best //

silasskovsbo
Calcite | Level 5

Btw the reason I need to do is because when the danish statstics publics new data there is the problem that for character-variables it happens that missing data for some years are given "." and for other years are given " ". So at the institute of Economy in Aarhus we need to find out what years missing data is given the one or the other value.

art297
Opal | Level 21

Does the following do what you need?:

proc format;

  value $cmiss

   '.b'='.'

   ' '=' '

   other='other'

;

  value nmiss

   .b=' '

   .=.

   other='other'

;

run;

options missing='.';

data xxx;

  infile cards truncover;

  input @;

  _infile_=tranwrd(_infile_,'.','.b');

  input year a $2. b;

  cards;

2009 . 3

2010   4

2011 e

2012 f .

;

proc freq data=XXX;

     tables year*A / nocol nopercent missing;

     format _numeric_ nmiss. _character_ $cmiss. year;

run;

Astounding
PROC Star

Your requirements are nearly clear, but ...

If A and B fall into the "other" category, do you need them broken down by year?  If not, you might consider a new variable entirely:

if B=. then B_new = year;

else b_new=B;

Then run the PROC FREQ on B_new.

For A, it's a little trickier, since you have to make sure A has a length of at least 4.  Assuming it does:

if A in (' ', '.') then A_new = put(year,4.);

else A_new = A;

Then run the PROC FREQ on A_new.

Depends on what you are trying to find out.  And if you can afford to run multiple PROC FREQs, one for each variable, skip all the complexity:

proc freq data=have;

   tables year;

   where B=.;

run;

proc freq data=have;

   tables year;

   where A in (' ', '.');

run;

Of course, all of this is dependent on whether I'm interpreting the problem correctly or not.

Good luck.

Tom
Super User Tom
Super User

It works fine for me (see below).

Do you want the blanks and periods to be distinct categories?

Did you remember to use the MISSING option in the PROC FREQ code?

Are you sure that the periods are getting into the SAS dataset?  If you are reading from a text file SAS normally converts '.' to ' '.  You can prevent this by using $CHAR informat instead of $ informat.

data test;

a=' '; output;

a='.'; output;

a='X'; output;

run;

proc format ;

value $a ' '=' ' '.'='.' other='Z';

run;

proc freq ;

tables a / missing  ;

format a $a.;

run;

The FREQ Procedure

                              Cumulative    Cumulative

a    Frequency     Percent     Frequency      Percent

------------------------------------------------------

            1       33.33             1        33.33

.           1       33.33             2        66.67

Z           1       33.33             3       100.00

silasskovsbo
Calcite | Level 5

Hey Tom.

Your program does also work for me with the data set you make.

But unfortunate - and I do not understand why - it still does not work when I use the program on my original data set.

To show my problem more specific below  I have uploaded my problemativ results:

This is the result when I do proc freq on original data

akasse_freq.png

Then I use your format and I then have this result:

akasse_freq_format.png

Which is clearly wrong.

Do you have any idea on how it fails. I need three categories 1: '.' 2: ' ' 3: 'other'

And it works perfect for me when I use your data set?

Thank again! Best / Silas

silasskovsbo
Calcite | Level 5

Thanks for help everybody... Problem seems to be solved now! :smileygrin:

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 1088 views
  • 2 likes
  • 4 in conversation