DATA Step, Macro, Functions and more

formats and informats

Accepted Solution Solved
Reply
New Contributor
Posts: 3
Accepted Solution

formats and informats

We are working with a large data set (>7,000 variables). Since many of these variables are categorical they have assigned formats. The data was exported from Redcap and a SAS pathway mapper was used to create the formats for each variable. The problem arises when the log tells us that the format limit of 4096 formats/informats has been reached. The data will not run without these formats and it will take significant time to go back through and recreate by hand. We have already tried to split the formats into separate data steps as well to no avail. We have also searched google and other forums and no one seems to have a legitamite answer to this dilemma. If anyone has any suggestions that would be helpful. We would like to keep the formats and not have to go through and combine them.

 

Thanks


Accepted Solutions
Solution
‎07-10-2017 01:57 PM
Super User
Super User
Posts: 7,042

Re: formats and informats


mbmadows wrote:

The Redcap database export provides a code with all of the formats. As of right now the categorical are in $500, the dates are in yymmdd10. , and the numerical variables are in best12. Many of the numerical have values such as 1=Stage 1, 2=Stage 2, and so on. 


Assuming that Redcap export is like other utilities to generate SAS code they have given you a program with PROC FORMAT statement(s) to make the format library.  So something like this:

proc format;
  value one 1='one' 2='two';
  value two 1='one' 2='two';
  value three 2='A' 3='B';
  value $four '1'='one' '2'='two';
run;

So here is some code that will check to see if any of the formats share the same sets of START=LABEL combinations.  Perhaps you can use this to create either new formats or know how to just use one of the matched sets.

This code will convert your format libarary into a SAS dataset. It will then generate a "KEY" value that is in the format of START=LABEL.  It then transposes the data so each format is in one observation. It then groups the observations by the unique set of key values to find the number of unique formats.

proc format cntlout=formats noprint; run;
data format2 ;
  if eof then call symputx('max',max);
  set formats (keep=fmtname type start label) end=eof;
  by fmtname type notsorted ;
  n+1;
  if first.fmtname then n=1;
  max=max(max,n);
  retain max;
  length key $500 ;
  key = catx('=',ifc(type='N',start,quote(trim(start))),quote(trim(label)));
run;
proc sort;
  by fmtname type key ;
run;
proc transpose data=format2 out=wide prefix=key;
  by fmtname type ;
  var key ;
run;
proc sort data=wide ;
  by key1 - key&max ;
run;
data newfmts ;
  length group 8 newfmt $8 ;
  set wide ;
  by key1 - key&max;
  group + first.key&max ;
  newfmt = cats(ifc(type='C','$C','N'),put(group,z5.),'F');
run;

So for the simple example above you can see that it finds three distinct formats.

data _null_;
  set newfmts ;
  by group ;
  if first.group then put group= newfmt= '-> ' @;
  put fmtname @;
  if last.group then put;
run;
group=1 newfmt=$C00001F -> FOUR
group=2 newfmt=N00002F -> ONE TWO
group=3 newfmt=N00003F -> THREE

View solution in original post


All Replies
New Contributor
Posts: 3

Re: formats and informats

Using SAS 9.4

X64_7PRO platform

 

Super User
Posts: 19,791

Re: formats and informats

How exactly are you creating your formats and what 'format' are they in for now?

New Contributor
Posts: 3

Re: formats and informats

The Redcap database export provides a code with all of the formats. As of right now the categorical are in $500, the dates are in yymmdd10. , and the numerical variables are in best12. Many of the numerical have values such as 1=Stage 1, 2=Stage 2, and so on. 

Solution
‎07-10-2017 01:57 PM
Super User
Super User
Posts: 7,042

Re: formats and informats


mbmadows wrote:

The Redcap database export provides a code with all of the formats. As of right now the categorical are in $500, the dates are in yymmdd10. , and the numerical variables are in best12. Many of the numerical have values such as 1=Stage 1, 2=Stage 2, and so on. 


Assuming that Redcap export is like other utilities to generate SAS code they have given you a program with PROC FORMAT statement(s) to make the format library.  So something like this:

proc format;
  value one 1='one' 2='two';
  value two 1='one' 2='two';
  value three 2='A' 3='B';
  value $four '1'='one' '2'='two';
run;

So here is some code that will check to see if any of the formats share the same sets of START=LABEL combinations.  Perhaps you can use this to create either new formats or know how to just use one of the matched sets.

This code will convert your format libarary into a SAS dataset. It will then generate a "KEY" value that is in the format of START=LABEL.  It then transposes the data so each format is in one observation. It then groups the observations by the unique set of key values to find the number of unique formats.

proc format cntlout=formats noprint; run;
data format2 ;
  if eof then call symputx('max',max);
  set formats (keep=fmtname type start label) end=eof;
  by fmtname type notsorted ;
  n+1;
  if first.fmtname then n=1;
  max=max(max,n);
  retain max;
  length key $500 ;
  key = catx('=',ifc(type='N',start,quote(trim(start))),quote(trim(label)));
run;
proc sort;
  by fmtname type key ;
run;
proc transpose data=format2 out=wide prefix=key;
  by fmtname type ;
  var key ;
run;
proc sort data=wide ;
  by key1 - key&max ;
run;
data newfmts ;
  length group 8 newfmt $8 ;
  set wide ;
  by key1 - key&max;
  group + first.key&max ;
  newfmt = cats(ifc(type='C','$C','N'),put(group,z5.),'F');
run;

So for the simple example above you can see that it finds three distinct formats.

data _null_;
  set newfmts ;
  by group ;
  if first.group then put group= newfmt= '-> ' @;
  put fmtname @;
  if last.group then put;
run;
group=1 newfmt=$C00001F -> FOUR
group=2 newfmt=N00002F -> ONE TWO
group=3 newfmt=N00003F -> THREE
Super User
Posts: 5,504

Re: formats and informats

[ Edited ]

It's not clear where the process is breaking down.  Here are a couple of exploratory questions.

 

Are you able to store the formats permanently (outside of their connection to a SAS data set)?  If so, are you able to download the formats to a SAS data set with this sort of program:

 

proc format library=mylib.formats cntlout=perm.sas_dataset_holding_formats;

run;

 

Do you need the formats to actually process the data, or just to report with it later.  For example, there is a global option:

 

options nofmterr;

 

That turns off the error messages if a format cannot be found.  You could conceivably process without having the formats available.  Once you come to the point where you have a narrower data set that is ready for reporting.point the FMTSEARCH option to the format location.

 

This is pretty hairy, but ... at worst you might need to permanently save the CNTLOUT= data set from PROC FORMAT.  Then once your data set is narrow and you need fewer of the formats, use PROC CONTENTS on the narrow data set.  That tells you which of the original variables are still there, and which formats would now be needed, which lets you subset the permanently saved CNTLOUT= data set, which lets you use that subset as a CNTLIN= data set to PROC FORMAT, which lets you point the FMTSEARCH option to a smaller location, which lets you turn off NOFMTERR.

 

At best (meanning if this works), you can simply set the option:

 

options nofmterr;

 

Then once you have processed the data to the point where it is much narrower, restore both FMTSEARCH and FMTERR options.

Super User
Posts: 19,791

Re: formats and informats

Maybe the suggestions in this paper may be helpful. For something like this I'd also consider a conversation with SAS tech support. 

http://support.sas.com/resources/papers/proceedings10/010-2010.pdf

Super User
Super User
Posts: 7,042

Re: formats and informats

You have over 4,096 formats for around 7,000 variables? Perhaps you have created redundant formats.

I am not sure what Redcap is but perhaps it has an option to generate each format only once and attach it to all of the variables that need it instead of generating a separate format for each variable.

 

It might be possible to programmatically compare the format definitions to see if some are idenitical.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 170 views
  • 1 like
  • 4 in conversation