05-10-2017 04:45 PM
Is there a good way to keep all of them organized?
We have about 100 variables, each with their own value formats. Right now they are all simply in 1 very large, unmanageable PROC FORMAT list.
05-12-2017 02:25 AM
Depends on how much you want to automate. You can just use a PUT statement to write the values to the log and copy and paste them. You can try using PROC SQL to stick it into a macro variable.
proc sql noprint; select catx(' ',name,strip(fmtname)||'.') into :fmtlist separated by ' ' from list ; quit; ... format &fmtlist ; ...
05-10-2017 05:03 PM - edited 05-10-2017 05:05 PM
1) Create a permanent library that every user can access where you want to store the formats
2) Reference that library on the Proc format statement (bonus: create a cntlout data set into the same library with the CNTLOUT option)
3) Set the format search path to include that library with an options statement or modify the settings for each user
libname MyFmt "path to library";
proc format library=MyFmt cntlout=MyFmt.Cntlout;
options fmtsearch=( work Sashelp MyFmt);
Once the format catalog is created in them MyFmt library you only need to run the Proc Format code when you add/change format definitions.
The Libname and Options could be in Autoexec.sas or other program run a the start of session or set in properties.
SAVE THE PROC FORMAT CODE. You may need to reexecute it when upgrading (or use the Cntlout data set to rebuild the format catalog)
05-10-2017 05:04 PM
Permanently save the formats. This means:
Then make one person the gatekeeper. All enhancements to the formats must go through that person.
05-11-2017 03:32 AM - edited 05-11-2017 03:33 AM
The best option to preserve formats is the code you have, period. It is just text that can easily be saved, copied, versioned, and ported to other operating systems. (see Maxims 26 and 27)
Now, you might consider to do some things that make the code more readable. One of those might be to create data steps with datalines that create the cntlin datasets for proc format, so you have one data step per format.
Another often overlooked option is that your formats might actually be recreations of already existing domain tables in your production database. If that is the case, have the domains exported and write a program that converts these tables to cntlin datasets. If there is an option to drive your programs from data, use it (see Maxim 33).
05-11-2017 08:30 AM
05-11-2017 10:52 PM - edited 05-11-2017 10:53 PM
I use proc tabulate with a lot of different variables.
Once I've loaded all the format value labels in to memory, is there an easy way to reference, or possibly automatically attach format value labels in proc tabulate without having to manually call them?
For example if i use sex and age in one table, and then industry and occupation in another, I don't want ot be manually writing out format sex sex. age age. ; and then redoing all that when my selected variables are changing frequently. Do you know what I mean?
05-12-2017 12:43 AM
Add the FORMAT statement to the data step that creates the data.
data mylib.mydata ; infile 'mysourefile.csv' dsd firstobs=2 truncover ; length AGE 8 SEX $1 ; input age sex ; format age age. sex sex. ; run;
05-12-2017 12:46 AM
Thanks but is there a way to replace the FORMAT statement with something that automatically identifies all the variables that have been read in and apply the corresponding label? It seems tedious when I have 100 variables that are sometimes read in but sometimes not... I don't want to spend 10 minutes trying to figure out which formats need to be applied each time I want to create a table.
05-12-2017 01:04 AM
Are you re-reading your text files over and over again? If so why?
First make SAS datasets from your data. It can be one or many datasets. It is this step that will define your AGE variable including the label and format that is attached to it.
Then when you want to make your presentation you can either just run from that table or build some temporary table by selecting values from from one or more dataset. In either case the formats will continue to be attached to the variables.
05-12-2017 01:20 AM - edited 05-12-2017 01:21 AM
The issue with that is the full dataset ends up being like 50gb (brings in hundreds of text files--each file representing 1 month of the series). So I find it easier just to read in the few text files for the months that I need with a handful of variables depending on the analysis.
05-12-2017 01:23 AM
make a program or a view that reads the data and just pass in the list of files to read and/or variables to keep as parameters.
%let filelist=month1 month2 month3; %let varlist=age sex ; %include 'program that reads files';
05-12-2017 01:29 AM - edited 05-12-2017 01:32 AM
I have something like that already, but it's the value label formatting that is proving to be the most tedious part...
Edit: oh I see, you mean reference a central database. What I have now references the text files. Perhaps it would be advantageous to just have 1 single database but I fear it will be too large, and adding new data each month could add more difficulties.
05-12-2017 01:38 AM
If you have 1 to 10,000 text files that all have the same structure then you only need to write the program that reads them once.
For example if your text files all live in the same folder your program could be as simple as something like this.
%let filelist='month1.txt' 'month2.txt'; %let keeplist=age sex ; filename dir '/directory_name'; data want ; infile dir(&filelist) dsd truncover ; length id 8 age 8 sex $1 ; input id age sex ; label id='Unique id' Age='Age at onset' Sex = 'Gender' ; format age age. sex sex. ; keep &keeplist; run;
05-12-2017 01:48 AM - edited 05-12-2017 01:51 AM
Yes, for sure; that's basically what I have now.
The issue is that there are 100 variables, not just age and sex.
And so if I want to make a table that only uses like 20 of the 100 variables, I basically have them all in a list and comment out the ones I don't want to read in. However, as it is now, I'd have to do the same process (manually curating a list of variables and their corresponding value labels). It would be easier if I could just have something that reads in the data, then makes a list, then checks that list against a list of formats and then just does all that automatically.
So this reads in first quarter 2007 and 2017. But imagine the list of variables like 10x as long and so I just comment out most of them.
05-12-2017 01:51 AM
There is no value in making the generation of the INPUT statement conditional if the goal is just which variables you want to KEEP.
Just make the KEEP statement variable.
Need further help from the community? Please ask a new question.