I am fairly new to the advanced uses of proc format, but I have run into a situation that I can't resolve.
To set the stage: I am working with a fairly large dataset (2 million plus obs and 45 variables). Over half of the variables are long character strings. In an attempt to reduce the file size, I created numeric codes for each character string. This is where I used PROC FORMAT's CNTLIN feature.
I approached this by isolating each character variable and, after removing duplicates, creating a dataset that contained the variables CNTLIN requires (START, LABEL, and FMTNAME) for a numeric informat, and then running PROC FORMAT. I repeated this procedure to create a corresponding numeric format. My thinking was that I could convert each character variable to a numeric code with the INPUT function and then apply the format for ease of reading and reporting, while also reducing the memory requirements.
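A minimal sketch of that approach for a single variable, assuming a data set HAVE with a character variable CITY. All names here (HAVE, CITY, CITY_CD, CITYIN, CITYF, WANT) are hypothetical, the TYPE variable ('I' for a numeric informat, 'N' for a numeric format) is added to make the intent explicit, and the format names are kept to 8 characters. In practice this would be wrapped in a macro and repeated per variable.

```sas
/* Hypothetical names: data set HAVE, character variable CITY,        */
/* informat CITYIN (text -> code) and format CITYF (code -> text).    */

/* 1. One row per distinct non-blank value of CITY. */
proc sort data=have(keep=city where=(city ne ' ')) out=city_levels nodupkey;
   by city;
run;

/* 2. Control data for the numeric informat: START is the text, LABEL the code. */
data city_infmt;
   set city_levels;
   length fmtname $8 type $1 label $12;
   retain fmtname 'CITYIN' type 'I';   /* TYPE='I' = numeric informat */
   start = city;                       /* START takes the length of CITY */
   label = strip(put(_n_, 12.));       /* code 1, 2, 3, ... stored as text */
   keep fmtname type start label;
run;

/* 3. Control data for the matching numeric format: START is the code, LABEL the text. */
data city_fmt;
   set city_levels;                    /* same order, so _N_ gives the same codes */
   length fmtname $8 type $1;
   retain fmtname 'CITYF' type 'N';    /* TYPE='N' = numeric format */
   start = _n_;
   label = city;
   keep fmtname type start label;
run;

proc format cntlin=city_infmt;
run;
proc format cntlin=city_fmt;
run;

/* 4. Replace the long string with its numeric code and attach the format. */
data want;
   set have;
   city_cd = input(city, cityin.);
   format city_cd cityf.;
   drop city;
run;
```

Building both control data sets from the same sorted, de-duplicated list is what keeps the informat and the format in step with each other.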
In the end this approach worked, kind of. Looking at the final dataset, I now have numeric variables and the file size is significantly smaller. I printed a few observations and everything looks good there as well. The problem comes when I open the dataset in ViewTable: a few of the formatted variables (6 of 27) do not display their formats. They show up as the numeric codes. Viewing the properties of these variables generates errors in the log stating that the format, the informat, or both are incorrect.
All of the formats, informats, and conversions were coded in macros, so the inconsistent results are really puzzling me.
I have tried adding additional information to the CNTLIN dataset (length, max length, "other" values), all to no avail. Does anybody have any suggestions?
I suspect a program logic problem or a data issue. Generally, when you apply a format, if the value of the variable does not match EXACTLY what is specified in the format, then the unformatted variable value is displayed. That sounds like what is happening here.
If you can figure out how to code OTHER with CNTLIN/CNTLOUT, you could verify whether your stray 6 are falling through the format value list. Since the solution to this problem requires examining your EXACT code and possibly looking at a sample of your data, your best bet for help with this question is to contact Tech Support.
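In a CNTLIN data set, OTHER is coded with an HLO variable set to 'O'. A minimal sketch of that check, assuming a CNTLIN data set CITY_FMT (variables FMTNAME, TYPE, START, LABEL) for a numeric format CITYF, and a data set WANT holding the coded variable CITY_CD; all of these names are hypothetical. Codes that fall through the value list would then show up under the NO MATCH label.

```sas
/* Assumed (hypothetical) inputs:                                  */
/*   CITY_FMT - CNTLIN data for numeric format CITYF                */
/*              (variables FMTNAME, TYPE, START, LABEL)            */
/*   WANT     - data set holding the numeric code CITY_CD          */
data city_fmt_chk;
   length label $200;           /* assumes no label is longer than 200 chars */
   set city_fmt end=last;
   length hlo $1;               /* blank HLO = ordinary START/LABEL row */
   output;
   if last then do;
      hlo = 'O';                /* 'O' in HLO marks the OTHER range */
      call missing(start);      /* START is not needed for OTHER */
      label = '*** NO MATCH ***';
      output;
   end;
run;

proc format cntlin=city_fmt_chk;
run;

/* PROC FREQ groups on formatted values, so unmatched codes collapse into one row. */
proc freq data=want;
   tables city_cd / missing;
   format city_cd cityf.;
run;
```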
To send a question to Tech Support, go to http://support.sas.com/ and in the left-hand navigation pane, click on the link entitled "Submit a Problem".
Speaking from the standpoint of the INPUT/INFORMAT/INVALUE, you could have done something incorrect with leading or trailing blanks that would make INPUT not assign the right number. Then, when you went back to apply the format, you wouldn't have a match. Something along the lines of:
```sas
proc print data=newvar;
   title 'Proc Print';
   var name name_number alt_nn;
   format name_number alt_nn numf.;
run;
```
It's possible that, when you read the character string, either some default length or some data condition was not accounted for. For example, in the above INFORMAT, the first 4 names will not match SASHELP.CLASS because of leading/trailing blanks or badly capitalized entries.
So then, when the PROC PRINT runs and uses the FORMAT that should, I think, display the right name for each number, I get 4 people who don't match.
I did not go into CNTLIN/CNTLOUT to build my INFORMAT and FORMAT -- but generally, it's little things like that (default lengths, different treatment of spaces, etc.) that might have made your program not work.
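A small sketch of that blank/case pitfall and one way around it: normalize the key the same way (here UPCASE and STRIP) both when building START and when calling INPUT. It uses SASHELP.CLASS as in the example above; the informat name NAMEIN and the data sets NAMEIN_CTL, MESSY, and MATCHED are hypothetical, and the informat width is given explicitly rather than relying on the default. AS_IS should come back missing for the messy rows, while FIXED should pick up codes.

```sas
/* Hypothetical informat NAMEIN: upper-cased, stripped names from  */
/* SASHELP.CLASS mapped to a numeric code (1, 2, 3, ...).           */
data namein_ctl;
   set sashelp.class(keep=name);
   length fmtname $8 type $1 start $8 label $12;
   retain fmtname 'NAMEIN' type 'I';
   start = upcase(strip(name));        /* normalized key */
   label = strip(put(_n_, 12.));
   keep fmtname type start label;
run;

proc format cntlin=namein_ctl;
run;

/* Messy raw values: leading blanks and mixed case ($CHAR keeps the blanks). */
data messy;
   infile datalines truncover;
   input raw $char10.;
   datalines;
  Alfred
alice
 Barbara
Carol
;
run;

data matched;
   set messy;
   /* Width 8 given explicitly; ?? suppresses invalid-data notes. */
   as_is = input(raw, ?? namein8.);                  /* raw value, unchanged  */
   fixed = input(upcase(strip(raw)), ?? namein8.);   /* normalized like START */
run;

proc print data=matched;
run;
```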
```
                                  Cumulative  Cumulative
alt_nn    Frequency     Percent    Frequency     Percent
     0           19      100.00           19      100.00

Obs    Name       name_number    alt_nn
  1    Alfred     0              0
  2    Alice      0              0
  3    Barbara    0              0
  4    Carol      0              0
  5    Henry      Henry          0
  6    James      James          0
  7    Jane       Jane           0
  8    Janet      Janet          0
  9    Jeffrey    Jeffrey        0
 10    John       John           0
 11    Joyce      Joyce          0
 12    Judy       Judy           0
 13    Louise     Louise         0
 14    Mary       Mary           0
 15    Philip     Philip         0
 16    Robert     Robert         0
 17    Ronald     Ronald         0
 18    Thomas     Thomas         0
 19    William    William        0
```
I see your point on how mishandling the data, format, or format name could cause a mismatch and maybe there is a glitch that I have not caught yet (hopefully tech support can help if that's the case).
The thing that makes me doubt it's a programming error is that the format works, but only in the output. For example, I create the format and informat, use the informat to create the new variable, and associate the format with the new variable. Upon completion, I run a PROC PRINT of the data and I get correctly formatted output. However, when I open the dataset in ViewTable, for 6 of my variables all I see are the numeric codes, and errors appear in the log when I view the variable properties.
So I guess my question really is: how can a format work in the output but not in ViewTable?
Ah, I didn't quite catch that before. So the 6 that do NOT get formats in ViewTable DO have formatted values in PROC PRINT?????
That is very strange. Did you specify the MIN= option? This Tech Support note, http://support.sas.com/kb/4/654.html, details a similar problem. To find the note, I went to support.sas.com and typed the string "viewtable format" in the search box. This note was the 3rd hit in the list.
If you did not code MIN=, then you'll have to wait to hear from Tech Support, as I'm stymied by what could be going on.
That is definitely in line with the problem I am experiencing. Viewing the attributes causes this error:
ERROR: Format MYFMT. is incorrect for variable MYVAR.
for all 6 variables, and some also include this error:
ERROR: Informat XX. is incorrect for variable MYVAR.
I have never specified a MIN= option. When the problem first arose I was not specifying any length-related options. Later I specified MAX= and LENGTH=, and I explicitly specified the format width in the INPUT function and FORMAT statements to try to resolve the problem, but none of those worked.
I have not contacted tech support yet, but I will follow up here after doing so.
I wanted to let you know that I was able to solve the problem (without contacting tech support).
I realized the 6 variables that would not format in ViewTable had something in common: they all have longer variable names, and consequently longer format names (more than 8 characters). I had thought about this when setting up the code, but I read the updates for SAS 9.1.3 (which I am running) and concluded that it was not an issue.
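A sketch of how that commonality could be spotted programmatically, by listing variables whose name, format name, or informat name is longer than 8 characters. The target WORK.WANT is a hypothetical name; DICTIONARY.COLUMNS stores LIBNAME and MEMNAME in upper case and the FORMAT/INFORMAT columns as text such as 'CITYF.'. The length check is approximate because it drops all digits, not just the trailing width.

```sas
/* Hypothetical target: WORK.WANT. */
proc sql;
   select name, format, informat
      from dictionary.columns
      where libname = 'WORK' and memname = 'WANT'
        and (length(name) > 8
             /* drop the width, the period, and any $ before measuring (approximate) */
             or length(compress(scan(format,   1, '.'), '$', 'd')) > 8
             or length(compress(scan(informat, 1, '.'), '$', 'd')) > 8);
quit;
```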
None of my names were even close to the maximum lengths listed, but apparently those limits do not matter for ViewTable. I changed the names to 8 characters or fewer and everything works like it should.
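A minimal sketch of that fix for one variable: regenerate the format under an 8-character name from its CNTLIN data, then attach it and shorten the variable name in place with PROC DATASETS. All names here are hypothetical: a CNTLIN data set CITY_DESC_FMT whose FMTNAME is 'CITYDESCFMT' (11 characters) and a variable CITY_DESCRIPTION_CD in WORK.WANT.

```sas
/* Hypothetical names: CNTLIN data CITY_DESC_FMT currently has         */
/* FMTNAME='CITYDESCFMT'; the coded variable in WORK.WANT is           */
/* CITY_DESCRIPTION_CD.                                                 */
data city_desc_fmt8;
   set city_desc_fmt;
   fmtname = 'CITYDSCF';                       /* 8-character format name */
run;

proc format cntlin=city_desc_fmt8;
run;

proc datasets library=work nolist;
   modify want;
      format city_description_cd citydscf.;   /* attach the short format first */
      rename city_description_cd = citydscd;  /* then shorten the variable name */
quit;
```

Using PROC DATASETS changes only the descriptor portion of the data set, so the 2-million-row data does not need to be rewritten.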
Message was edited by: jtaylor