How to efficiently manage large set of value format labels, and only apply them when in the dataset?

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 76
Accepted Solution

How to efficiently manage large set of value format labels, and only apply them when in the dataset?

I have a .sas file with proc format for 200 variables (let's call it FORMATS.sas).

 

Now, when I read in actual data from an INFILE statement (let's call this DATAFILE), I %include the FORMATS.sas file.

 

Then, in the data step I have a format statement which then lists all 200 variable names and their format labels, so imagine:

 

format  
   var1 var1.
   var2 var2.
   var3 var3. ;
(...etc until all 200 are listed).
 

 

There are 2 issues with this process:

 

1. If I choose to only read in 10 of the 200 variables in the INFILE, the FORMAT statement will give 190 errors. So how can I only apply formats to variables that actually have been read in (without manually commenting out 190 lines in the format statement)?

 

2. How can I make this more efficient in general? I would like it to just have a database that it automatically grabs the format labels since the variable and its corresponding format is exactly the same name?

 

 


Accepted Solutions
Solution
‎05-05-2017 09:49 AM
Trusted Advisor
Posts: 1,022

Re: How to efficiently manage large set of value format labels, and only apply them when in the data

Posted in reply to fieldsa83

Make the dataset without formats, then run a macro calling dictionary.columns to get all the varnames and make a macrovar listing the varnames and matching format names. Then run PROC DATASETS to assign the formats, using that macrovar.

 

data test;
  input a b c;
datalines;
1 11 111
run;

proc format;
  value a other=[3.];
  value b other=[3.];
  value c other=[3.];
  value x other=[3.];
  value y other=[3.];
run;

%macro assign_fmts(dsn);
   %let dsn=%upcase(&dsn);
   %let dataset=%scan(&dsn,-1,%str(.));
   %if &dataset=&dsn %then %let libname=WORK;
   %else %let libname=%upcase(%scan(&dsn,1,%str(.)));
   %put _local_;
   proc sql noprint;
     select distinct catx(' ',name,cats(name,.)) into :fmts separated by " "
       from dictionary.columns where libname="&libname" and memname="&dataset";
   quit;
   proc datasets lib=&libname noprint ;
      modify &dataset;
      format &fmts;
      run;
   quit;
%mend ;

%assign_fmts(test);

 

 

View solution in original post


All Replies
PROC Star
Posts: 7,491

Re: How to efficiently manage large set of value format labels, and only apply them when in the data

Posted in reply to fieldsa83

I would just assign the formats to the entire file and save the file with the formats attached. Isn't that possible?

 

Conversely, if that isn't possible, then I'd create a macro in which you could specify the desired variable, use proc sql to read all of the variables from dictionary.columns, have it write a macro variable that included all of the desired format assignments, and then apply them while selecting the desired file.

 

Art, CEO, AnalystFinder.com

 

Frequent Contributor
Posts: 76

Re: How to efficiently manage large set of value format labels, and only apply them when in the data

[ Edited ]

The dictionary.column macro sounds intriguing. Do you have a sample code of how that would work?

 

(the first option isn't possible since the source data is actually in non-writeable text files, and there are hundreds of gigabytes worth--so I don't want to make 1 single database. also new text files are added each month to the folder)

PROC Star
Posts: 7,491

Re: How to efficiently manage large set of value format labels, and only apply them when in the data

Posted in reply to fieldsa83

Since they're text files you would need to be able to read in all variables. As such, you would need to know the file layout.

 

If you kept the variable names, location, informats and relevant formats if any for all variables, then that file could be accessed and, using statements in sql like:

 

select catx(' ','input ',varname,column_range,informat,'@;',

          catx(' ',format,varname,format),

          etc. into : inputs separated by ' ', formats separated by ' '

            where varname in &macro_variable_listing_all_desired_variables.

 

those macro variables could be used to both input and format the data.

 

Art, CEO, AnalystFinder.com

 

Frequent Contributor
Posts: 76

Re: How to efficiently manage large set of value format labels, and only apply them when in the data

Thanks. I'm very interested in how I would use this to input variables, as you say.

 

If for example my text file has AGE at line 10 for 3 characters, how would I incorporate that in this?

Solution
‎05-05-2017 09:49 AM
Trusted Advisor
Posts: 1,022

Re: How to efficiently manage large set of value format labels, and only apply them when in the data

Posted in reply to fieldsa83

Make the dataset without formats, then run a macro calling dictionary.columns to get all the varnames and make a macrovar listing the varnames and matching format names. Then run PROC DATASETS to assign the formats, using that macrovar.

 

data test;
  input a b c;
datalines;
1 11 111
run;

proc format;
  value a other=[3.];
  value b other=[3.];
  value c other=[3.];
  value x other=[3.];
  value y other=[3.];
run;

%macro assign_fmts(dsn);
   %let dsn=%upcase(&dsn);
   %let dataset=%scan(&dsn,-1,%str(.));
   %if &dataset=&dsn %then %let libname=WORK;
   %else %let libname=%upcase(%scan(&dsn,1,%str(.)));
   %put _local_;
   proc sql noprint;
     select distinct catx(' ',name,cats(name,.)) into :fmts separated by " "
       from dictionary.columns where libname="&libname" and memname="&dataset";
   quit;
   proc datasets lib=&libname noprint ;
      modify &dataset;
      format &fmts;
      run;
   quit;
%mend ;

%assign_fmts(test);

 

 

Frequent Contributor
Posts: 76

Re: How to efficiently manage large set of value format labels, and only apply them when in the data

This is perfect, thanks so much! Really appreciate it.

 

One thing: it seems to break if the varname has an underscore in it, do you think there's a way to solve that (not a big deal if not)?

Super User
Posts: 11,343

Re: How to efficiently manage large set of value format labels, and only apply them when in the data

Posted in reply to fieldsa83

fieldsa83 wrote:

This is perfect, thanks so much! Really appreciate it.

 

One thing: it seems to break if the varname has an underscore in it, do you think there's a way to solve that (not a big deal if not)?


You should provide an example of the variable name and the format you are want to assign and describe any symptoms in more detail than "seems to break". if you get errors post the log with the errors. Since a macro may be involved run the code with

OPTIONS MPRINT SYMBOLGEN;

to get more detail into the log. 

Frequent Contributor
Posts: 76

Re: How to efficiently manage large set of value format labels, and only apply them when in the data

Sorry I don't think the underscore is actually a problem as much as I am confused...

 

How can i add a suffix to the variable format name? I'm trying to add "en" to the end of every variable name format.

 

I tried to add it in the same section as the period but it's saying "the following columns were not found in the contribitng tables: en"

 

And if i try to add _en it says "syntax error, expecting one of the following: a name, *."

 

Ideally the result would be varname1 varname1_en.

 

 

Trusted Advisor
Posts: 1,022

Re: How to efficiently manage large set of value format labels, and only apply them when in the data

Posted in reply to fieldsa83

Then instead of

                   cats(name,.)

use

                   cats(name,'en',.))

 

 

Study the CATS function, and the related CATX, etc.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 179 views
  • 3 likes
  • 4 in conversation