BookmarkSubscribeRSS Feed
deleted_user
Not applicable
I am fairly new to the advanced uses of proc format, but I have run into a situation that I can't resolve.

To set the stage; I am working with a fairly large dataset (2 million plus obs and 45 variables). Over half of the variables are long character strings. In an attempt to reduce the file size, I have created numeric codes for each character string. This is where I utilized proc format's cntlin feature.

I approached this by isolating each character variable and after removing duplicates I created a dataset that contained the required cntlin features (start, label, and fmtname) for a numeric informat and then ran the proc format. I repeated this procedure to create a corresponding numeric format. My thought being that I could convert the character variable to a numeric code with an input function and then apply the format for ease of data reading and reporting, while simultaneously reducing the memory requirements.

In the end this approach worked, kind of. As I investigated the final dataset I now have numeric variables and my file size is significantly smaller. I printed a few observations and everything looks good there as well. The problem comes when I open the dataset in a viewtable. A few of the formated variables (6 of 27) do not display the formats. They show up as the numeric codes. Viewing the properties of these variables generates errors in the log stating that either the format, informat, or both are incorrect.

All of the formats, informats, and conversions were coded in macros, so the inconsistent results are really puzzeling me.

I have tried adding additional information to the cntlin dataset (length, max length, "other" values) all to no avail. Anybody have any sugggestions??
9 REPLIES 9
Cynthia_sas
SAS Super FREQ
Hi:
I suspect a program logic problem or a data issue. Generally, when applying a format, if the value of the variable does not match EXACTLY what is specified in the format, then the variable value is displayed. That sounds like what is happening here.

If you can figure out how to code OTHER with CNTLIN/CNTLOUT, you could verify whether your stray 6 are falling through the format value list. Since the solution to this problem requires examining your EXACT code and possibly looking at a sample of your data, your best bet for help with this question is to contact Tech Support.

To send a question to Tech Support, go to http://support.sas.com/ and in the left-hand navigation pane, click on the link entitled "Submit a Problem".

cynthia
deleted_user
Not applicable
Thank you for the response Cynthia. I will follow up through tech support.

If possible to clarify for general information purposes. How is it possible to not exactly match the format when the format was created from the variable?

I was able to succesfully code an OTHER hlo variable and none of the "unformatted" variables showed up that way either.

Thanks again.
Cynthia_sas
SAS Super FREQ
Hi:
Speaking from the standpoint of the INPUT/INFORMAT/INVALUE, you could have done something incorrect with leading or trailing blanks that would make the INPUT not assigne the right number. So, then, when you went back to apply the format again, you wouldn't have a match. Something along the lines of:
[pre]

proc format;
invalue namef ' Alfred' = 1
'AliCE' =2
' Barbara '=3
'CaroL'=4
'Henry'=5
'James'=6
'Jane'=7
'Janet'=8
'Jeffrey'=9
'John'=10
'Joyce'=11
'Judy'=12
'Louise'=13
'Mary'=14
'Philip'=15
'Robert'=16
'Ronald'=17
'Thomas'=18
'William'=19
other = 0;

value numf 1= 'Alfred'
2='Alice'
3='Barbara'
4='Carol'
5='Henry'
6='James'
7='Jane'
8='Janet'
9='Jeffrey'
10='John'
11='Joyce'
12='Judy'
13='Louise'
14='Mary'
15='Philip'
16='Robert'
17='Ronald'
18='Thomas'
19='William';
run;

data newvar;
length name_number alt_nn 8;
set sashelp.class;
name_number = input(name, $namef.);
alt_nn = input(name,$namef2.);
run;

proc contents data=newvar;
title 'Proc Contents';
run;

ods listing;
proc freq data=newvar;
title 'Proc Freq';
tables name_number alt_nn;
run;

proc print data=newvar;
title 'Proc Print';
var name name_number alt_nn;
format name_number alt_nn numf.;
run;

[/pre]

It's possible that when you read the character string, that either some default length or some data condition was not accounted for. For example, in the above INFORMAT, the first 4 names will not match to SASHELP.CLASS because of leading/trailing blanks or badly capitalized entry.

So then, when the proc print runs, and uses the FORMAT that will, I think, display the right name for the number, I get 4 people who don't match.

I did not go into CNTLIN/CNTLOUT to build my INFORMAT and FORMAT -- but generally, it's little things like that -- default lengths, different treatment of spaces, etc that might have made your program not work.

cynthia

The output from PROC FREQ and PROC PRINT:
[pre]

Proc Freq

Cumulative Cumulative
name_number Frequency Percent Frequency Percent
----------------------------------------------------------------
0 4 21.05 4 21.05
5 1 5.26 5 26.32
6 1 5.26 6 31.58
7 1 5.26 7 36.84
8 1 5.26 8 42.11
9 1 5.26 9 47.37
10 1 5.26 10 52.63
11 1 5.26 11 57.89
12 1 5.26 12 63.16
13 1 5.26 13 68.42
14 1 5.26 14 73.68
15 1 5.26 15 78.95
16 1 5.26 16 84.21
17 1 5.26 17 89.47
18 1 5.26 18 94.74
19 1 5.26 19 100.00


Cumulative Cumulative
alt_nn Frequency Percent Frequency Percent
-----------------------------------------------------------
0 19 100.00 19 100.00

Proc Print

name_
Obs Name number alt_nn

1 Alfred 0 0
2 Alice 0 0
3 Barbara 0 0
4 Carol 0 0
5 Henry Henry 0
6 James James 0
7 Jane Jane 0
8 Janet Janet 0
9 Jeffrey Jeffrey 0
10 John John 0
11 Joyce Joyce 0
12 Judy Judy 0
13 Louise Louise 0
14 Mary Mary 0
15 Philip Philip 0
16 Robert Robert 0
17 Ronald Ronald 0
18 Thomas Thomas 0
19 William William 0

[/pre]
deleted_user
Not applicable
Thanks for the response.

I see your point on how mishandling the data, format, or format name could cause a mismatch and maybe there is a glitch that I have not caught yet (hopefully tech support can help if that's the case).

The thing that makes me doubt the programming error is that the format works, but only in the output. For example, I create the format and informat, use the informat to create the new variable, and associate the format with the new variable. Upon completion, I run a proc print of the data and I get correctly formatted output. However, I open the dataset in a viewtable and for 6 of my variables all I see are the informatted numeric codes and errors appear in the log upon viewing the variable properties.

So I guess my question really is, how can a format work in the output but not in the viewtable?

Thanks again.
Cynthia_sas
SAS Super FREQ
Ah, I didn't quite catch that before. So the 6 that do NOT get formats in Viewtable, DO have formatted values in PROC PRINT?????

That is very strange. Did you specify the MIN= option? This Tech Support note
http://support.sas.com/kb/4/654.html
details a similar problem. To find the note, I went to support.sas.com and typed the string viewtable format in the search box. This note was the 3rd hit in the list.

If you did not code MIN=, then you'll have to wait to hear from Tech Support, as I'm stymied by what could be going on.

cynthia
deleted_user
Not applicable
That is definitely inline with the problem I am experiencing. The view attributes causes the error;
ERROR: Format MYFMT. is incorrect for variable MYVAR.
for the 6 variables and some will include an error;
ERROR: Informat XX. is incorrect for variable MYVAR.

I have never specified a Min= option. When the problem first arose I was not specifying any length related variables. Later I specified a max=, length=, and I explicitly called the format length in the input function and format statements to try and resolve the problem, but none of those worked.

I have not contacted tech support yet, but I will follow up here after doing so.

Thanks.
Cynthia_sas
SAS Super FREQ
Hi:
Well, if MIN= gives the Viewtable headaches, it's possible that MAX= or LENGTH= do the same. What happens if you take off MAX= and LENGTH= completely???

So if it's not MAX= or LENGTH=, I wonder whether it's the explicit length you used in the format or informat?? At any rate, Tech Support will help you figure it out.

cynthia

PS:
According to the doc: Formatted values can be up to 32,767 characters. Some procedures, however, use only the first 8 or 16 characters of a formatted value.

Just like some procedures only use the first 8 or 16 characters, it's also possible that Viewtable has a limit to the number of formatted characters it will display.
deleted_user
Not applicable
Cynthia,

I wanted to let you know that I was able to solve the problem (without contacting tech support).

I realized a commonality with the 6 variables that would not format in Viewable; they all have longer variable and subsequently variable format names (>8 characters). I previously thought about this when setting up the code, but I read the updates for SAS 9.1.3 (which I am running) and thought that it was not an issue.

The updates state:
The maximum length for character format names is now 31.
The maximum length for numeric format names is now 32.
The maximum length for character informat names is now 30.
The maximum length for numeric informat names is now 31.
http://support.sas.com/documentation/onlinedoc/91pdf/sasdoc_913/whatsnew_10347.pdf

None of my names were even close to the max values listed, but apparently it does not matter for Viewtable. I changed the names to 8 characters or less and everything works like it should. Message was edited by: jtaylor
Cynthia_sas
SAS Super FREQ
Hi:
I did not realize that about Viewtable. Good to know!

I'm glad you worked it out, that was some good detective work!

cynthia

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 2085 views
  • 0 likes
  • 2 in conversation