BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
fieldsa83
Quartz | Level 8

Is there a good way to keep all of them organized?

 

We have about 100 variables, each with their own value formats. Right now they are all simply in 1 very large, unmanageable PROC FORMAT list. 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Depends on how much you want to automate. You can just use a PUT statement to write the values to the log and copy and paste them. You can try using PROC SQL to stick it into a macro variable.

 

proc sql noprint;
select catx(' ',name,strip(fmtname)||'.')
  into :fmtlist separated by ' '
  from list
;
quit;
...
format &fmtlist ;
...

View solution in original post

28 REPLIES 28
ballardw
Super User

1) Create a permanent library that every user can access where you want to store the formats

2) Reference that library on the Proc format statement (bonus: create a cntlout data set into the same library with the CNTLOUT option)

3) Set the format search path to include that library with an options statement or modify the settings for each user

 

Skeleton code:

 

libname MyFmt "path to library";

 

proc format library=MyFmt cntlout=MyFmt.Cntlout;

<formats>

;

 

options fmtsearch=( work Sashelp MyFmt);

 

Once the format catalog is created in them MyFmt library you only need to run the Proc Format code when you add/change format definitions.

The Libname and Options could be in Autoexec.sas or other program run a the start of session or set in properties.

 

 SAVE THE PROC FORMAT CODE. You may need to reexecute it when upgrading (or use the Cntlout data set to rebuild the format catalog)

Astounding
PROC Star

Permanently save the formats.  This means:

 

  • Understand the LIBRARY= option on PROC FORMAT, to permanently save formats.  Look at how updating works, when you are changing a permanently saved format.
  • Understand the FMTSEARCH option to be able to access permanently saved formats.

Then make one person the gatekeeper.  All enhancements to the formats must go through that person.

Kurt_Bremser
Super User

The best option to preserve formats is the code you have, period. It is just text that can easily be saved, copied, versioned, and ported to other operating systems. (see Maxims 26 and 27)

 

Now, you might consider to do some things that make the code more readable. One of those might be to create data steps with datalines that create the cntlin datasets for proc format, so you have one data step per format.

 

Another often overlooked option is that your formats might actually be recreations of already existing domain tables in your production database. If that is the case, have the domains exported and write a program that converts these tables to cntlin datasets. If there is an option to drive your programs from data, use it (see Maxim 33).

Peter_C
Rhodochrosite | Level 12
Kurt's proposal could be extended - in that production database have a view created which will deliver the cntlin data, only when required.

In support of @ballardw
To add a format library to the format search paths option, use something like
OPTION APPEND= FMTSEARCH mylib.mfmt_catalog ;
In this way we do not replace earlier modifications to the format search path.
(and see also OPTION INSERT= )
fieldsa83
Quartz | Level 8

I use proc tabulate with a lot of different variables. 

 

Once I've loaded all the format value labels in to memory, is there an easy way to reference, or possibly automatically attach format value labels in proc tabulate without having to manually call them?

 

For example if i use sex and age in one table, and then industry and occupation in another, I don't want ot be manually writing out format sex sex. age age. ; and then redoing all that when my selected variables are changing frequently. Do you know what I mean?

Tom
Super User Tom
Super User

Add the FORMAT statement to the data step that creates the data.

 

data mylib.mydata ;
   infile 'mysourefile.csv' dsd firstobs=2 truncover ;
   length AGE 8 SEX $1 ;
   input age sex ;
   format age age. sex sex. ;
run;
fieldsa83
Quartz | Level 8

Thanks but is there a way to replace the FORMAT statement with something that automatically identifies all the variables that have been read in and apply the corresponding label? It seems tedious when I have 100 variables that are sometimes read in but sometimes not... I don't want to spend 10 minutes trying to figure out which formats need to be applied each time I want to create a table. 

Tom
Super User Tom
Super User

Are you re-reading your text files over and over again? If so why?

First make SAS datasets from your data. It can be one or many datasets. It is this step that will define your AGE variable including the label and format that is attached to it.

Then when you want to make your presentation you can either just run from that table or build some temporary table by selecting values from from one or more dataset. In either case the formats will continue to be attached to the variables.

 

fieldsa83
Quartz | Level 8

The issue with that is the full dataset ends up being like 50gb (brings in hundreds of text files--each file representing 1 month of the series). So I find it easier just to read in the few text files for the months that I need with a handful of variables depending on the analysis.

Tom
Super User Tom
Super User

make a program or a view that reads the data and just pass in the list of files to read and/or variables to keep as parameters.

 

%let filelist=month1 month2 month3;
%let varlist=age sex ;

%include 'program that reads files';
fieldsa83
Quartz | Level 8

I have something like that already, but it's the value label formatting that is proving to be the most tedious part...

 

Edit: oh I see, you mean reference a central database. What I have now references the text files. Perhaps it would be advantageous to just have 1 single database but I fear it will be too large, and adding new data each month could add more difficulties.

 

 

Tom
Super User Tom
Super User

If you have 1 to 10,000 text files that all have the same structure then you only need to write the program that reads them once.

For example if your text files all live in the same folder your program could be as simple as something like this.

 

%let filelist='month1.txt' 'month2.txt';
%let keeplist=age sex ;

filename dir '/directory_name';

data want ;
  infile dir(&filelist) dsd truncover ;
  length id 8 age 8 sex $1 ;
  input id age sex ;
  label
   id='Unique id' 
   Age='Age at onset'
   Sex = 'Gender'
  ;
  format age age. sex sex. ;
  keep &keeplist;
run;
fieldsa83
Quartz | Level 8

Yes, for sure; that's basically what I have now. 

 

The issue is that there are 100 variables, not just age and sex. 

 

And so if I want to make a table that only uses like 20 of the 100 variables, I basically have them all in a list and comment out the ones I don't want to read in. However, as it is now, I'd have to do the same process (manually curating a list of variables and their corresponding value labels). It would be easier if I could just have something that reads in the data, then makes a list, then checks that list against a list of formats and then just does all that automatically. 

 

So this reads in first quarter 2007 and 2017. But imagine the list of variables like 10x as long and so I just comment out most of them.

 

 

options minoperator mindelimiter=',';
%macro doloop;
data _null_;
/* Start year    */      call symput ("startyear",'2007');              
/* Start month   */      call symput ("startmonth",'01');
/* End year      */      call symput ("endyear",'2017');
/* End month     */      call symput ("endmonth",'04');
 
/* Only specific months */       call symput /*("limitmonth",'');*/ ("limit",'and (&i in (2007,2017) and &month in(01,02,03,04))'); /*for all months in range leave everything between single quotes blank*/
 
stop;
run;
 
/* This section starts the do loop logic */
%do i=&startyear %to &endyear;
%do j=1 %to 12;
        %if &j.=1 and &i.=&startyear %then %let j=&startmonth;                           /* on first loop, sets &startmonth (instead of defaulting to 1)*/
                %let month=%sysfunc(putn(&j,z2.));                                                               /* leading 0 for month if needed */
                %let year=%sysfunc(substrn(&i.,3));                                                      /* last 2 digits of year */
                %if ( ((&endyear. ne &startyear) and (&i. ne &endyear)) OR (&i.=&endyear and &month. le &endmonth) ) /* continues if not end year OR if end year but less or equal to endmonth */
                        &limit                                                                                                                                         /*continues if conditions from symput above are met*/
                %then %do;                                                                                                                                                                                        /* continues... */
 
data TAB&month.&year.;                                                                                                   /* start data step*/
 
                                %if &i>=2017 %then                                                                              /* use pre-release folder if year >= 2017 */
                                %do;
                                        infile "\\DIRECTORY1\20&year.\tab&month.&year..prn" lrecl=430 missover;
                                %end;
                                %else                                                                                                    /* use main tabs folder if not >= 2017 */
                                %do;
                                        infile "\\DIRECTORY2\tab&month.&year..prn" lrecl=430 missover;
                                %end;
 
 
/*__ STEP 2: INPUT DESIRED VARIABLES _________________________________________________________________________________________________*/
                                input
                         @6                         DWELID                   $18.            /* Dwelling ID */
                                @29             SYEAR                   4.                      /* Survey year - yyyy */
                                @29             SDATE                   6.                      /* Survey year and month yyyymm */
                                @33             SMONTH                  2.                      /* Survey month - mm */
                                @63             FINALWT                 5.                      /* Final weight for respondent */
                                @119    AGE_TABS                3.                      /* Age of respondent as of end of
       /*       @125    EDUCLEV                 1.                      /* Highest level of edu (break at 1990) */
...
 
               
;       /*this semicolon needs to stay (ends the DATA step)*/
%include "\\DIRECTORY1\PROC_FORMAT2.sas"; /* brings in formats*/
run;    
%end;                    /* end restriction for going past end month */
%end; %end;      /* end main start/end month/year do loops */
%mend;
%doloop;
 
Tom
Super User Tom
Super User

There is no value in making the generation of the INPUT statement conditional if the goal is just which variables you want to KEEP.

Just make the KEEP statement variable.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 28 replies
  • 1613 views
  • 2 likes
  • 6 in conversation