BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
mbmadows
Calcite | Level 5

We are working with a large data set (>7,000 variables). Since many of these variables are categorical they have assigned formats. The data was exported from Redcap and a SAS pathway mapper was used to create the formats for each variable. The problem arises when the log tells us that the format limit of 4096 formats/informats has been reached. The data will not run without these formats and it will take significant time to go back through and recreate by hand. We have already tried to split the formats into separate data steps as well to no avail. We have also searched google and other forums and no one seems to have a legitamite answer to this dilemma. If anyone has any suggestions that would be helpful. We would like to keep the formats and not have to go through and combine them.

 

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

@mbmadows wrote:

The Redcap database export provides a code with all of the formats. As of right now the categorical are in $500, the dates are in yymmdd10. , and the numerical variables are in best12. Many of the numerical have values such as 1=Stage 1, 2=Stage 2, and so on. 


Assuming that Redcap export is like other utilities to generate SAS code they have given you a program with PROC FORMAT statement(s) to make the format library.  So something like this:

proc format;
  value one 1='one' 2='two';
  value two 1='one' 2='two';
  value three 2='A' 3='B';
  value $four '1'='one' '2'='two';
run;

So here is some code that will check to see if any of the formats share the same sets of START=LABEL combinations.  Perhaps you can use this to create either new formats or know how to just use one of the matched sets.

This code will convert your format libarary into a SAS dataset. It will then generate a "KEY" value that is in the format of START=LABEL.  It then transposes the data so each format is in one observation. It then groups the observations by the unique set of key values to find the number of unique formats.

proc format cntlout=formats noprint; run;
data format2 ;
  if eof then call symputx('max',max);
  set formats (keep=fmtname type start label) end=eof;
  by fmtname type notsorted ;
  n+1;
  if first.fmtname then n=1;
  max=max(max,n);
  retain max;
  length key $500 ;
  key = catx('=',ifc(type='N',start,quote(trim(start))),quote(trim(label)));
run;
proc sort;
  by fmtname type key ;
run;
proc transpose data=format2 out=wide prefix=key;
  by fmtname type ;
  var key ;
run;
proc sort data=wide ;
  by key1 - key&max ;
run;
data newfmts ;
  length group 8 newfmt $8 ;
  set wide ;
  by key1 - key&max;
  group + first.key&max ;
  newfmt = cats(ifc(type='C','$C','N'),put(group,z5.),'F');
run;

So for the simple example above you can see that it finds three distinct formats.

data _null_;
  set newfmts ;
  by group ;
  if first.group then put group= newfmt= '-> ' @;
  put fmtname @;
  if last.group then put;
run;
group=1 newfmt=$C00001F -> FOUR
group=2 newfmt=N00002F -> ONE TWO
group=3 newfmt=N00003F -> THREE

View solution in original post

7 REPLIES 7
mbmadows
Calcite | Level 5

Using SAS 9.4

X64_7PRO platform

 

Reeza
Super User

How exactly are you creating your formats and what 'format' are they in for now?

mbmadows
Calcite | Level 5

The Redcap database export provides a code with all of the formats. As of right now the categorical are in $500, the dates are in yymmdd10. , and the numerical variables are in best12. Many of the numerical have values such as 1=Stage 1, 2=Stage 2, and so on. 

Tom
Super User Tom
Super User

@mbmadows wrote:

The Redcap database export provides a code with all of the formats. As of right now the categorical are in $500, the dates are in yymmdd10. , and the numerical variables are in best12. Many of the numerical have values such as 1=Stage 1, 2=Stage 2, and so on. 


Assuming that Redcap export is like other utilities to generate SAS code they have given you a program with PROC FORMAT statement(s) to make the format library.  So something like this:

proc format;
  value one 1='one' 2='two';
  value two 1='one' 2='two';
  value three 2='A' 3='B';
  value $four '1'='one' '2'='two';
run;

So here is some code that will check to see if any of the formats share the same sets of START=LABEL combinations.  Perhaps you can use this to create either new formats or know how to just use one of the matched sets.

This code will convert your format libarary into a SAS dataset. It will then generate a "KEY" value that is in the format of START=LABEL.  It then transposes the data so each format is in one observation. It then groups the observations by the unique set of key values to find the number of unique formats.

proc format cntlout=formats noprint; run;
data format2 ;
  if eof then call symputx('max',max);
  set formats (keep=fmtname type start label) end=eof;
  by fmtname type notsorted ;
  n+1;
  if first.fmtname then n=1;
  max=max(max,n);
  retain max;
  length key $500 ;
  key = catx('=',ifc(type='N',start,quote(trim(start))),quote(trim(label)));
run;
proc sort;
  by fmtname type key ;
run;
proc transpose data=format2 out=wide prefix=key;
  by fmtname type ;
  var key ;
run;
proc sort data=wide ;
  by key1 - key&max ;
run;
data newfmts ;
  length group 8 newfmt $8 ;
  set wide ;
  by key1 - key&max;
  group + first.key&max ;
  newfmt = cats(ifc(type='C','$C','N'),put(group,z5.),'F');
run;

So for the simple example above you can see that it finds three distinct formats.

data _null_;
  set newfmts ;
  by group ;
  if first.group then put group= newfmt= '-> ' @;
  put fmtname @;
  if last.group then put;
run;
group=1 newfmt=$C00001F -> FOUR
group=2 newfmt=N00002F -> ONE TWO
group=3 newfmt=N00003F -> THREE
Astounding
PROC Star

It's not clear where the process is breaking down.  Here are a couple of exploratory questions.

 

Are you able to store the formats permanently (outside of their connection to a SAS data set)?  If so, are you able to download the formats to a SAS data set with this sort of program:

 

proc format library=mylib.formats cntlout=perm.sas_dataset_holding_formats;

run;

 

Do you need the formats to actually process the data, or just to report with it later.  For example, there is a global option:

 

options nofmterr;

 

That turns off the error messages if a format cannot be found.  You could conceivably process without having the formats available.  Once you come to the point where you have a narrower data set that is ready for reporting.point the FMTSEARCH option to the format location.

 

This is pretty hairy, but ... at worst you might need to permanently save the CNTLOUT= data set from PROC FORMAT.  Then once your data set is narrow and you need fewer of the formats, use PROC CONTENTS on the narrow data set.  That tells you which of the original variables are still there, and which formats would now be needed, which lets you subset the permanently saved CNTLOUT= data set, which lets you use that subset as a CNTLIN= data set to PROC FORMAT, which lets you point the FMTSEARCH option to a smaller location, which lets you turn off NOFMTERR.

 

At best (meanning if this works), you can simply set the option:

 

options nofmterr;

 

Then once you have processed the data to the point where it is much narrower, restore both FMTSEARCH and FMTERR options.

Reeza
Super User

Maybe the suggestions in this paper may be helpful. For something like this I'd also consider a conversation with SAS tech support. 

http://support.sas.com/resources/papers/proceedings10/010-2010.pdf

Tom
Super User Tom
Super User

You have over 4,096 formats for around 7,000 variables? Perhaps you have created redundant formats.

I am not sure what Redcap is but perhaps it has an option to generate each format only once and attach it to all of the variables that need it instead of generating a separate format for each variable.

 

It might be possible to programmatically compare the format definitions to see if some are idenitical.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 2459 views
  • 1 like
  • 4 in conversation