10-31-2014 02:18 PM
I want to know the best way to use user-defined formats in your datasets when you have a large dataset with, say, 1500 variables.
Specifically, say I have a permanent dataset in SAS and I have the formats another user defined for it. How do I use his formats when reading the dataset?
10-31-2014 03:41 PM
Are the formats associated with the variables? You can tell by running proc contents on the data set and checking whether the user-defined formats are listed for the variables.
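With 1500 variables the printed proc contents listing gets long, so it may be easier to write the variable attributes to a data set and summarize them. A minimal sketch, assuming a libref MYDATA and a data set SOMEDATA (both names and the path are hypothetical here):

```sas
/* Hypothetical location and data set name */
libname mydata "c:\downloads";

/* Write one row per variable, with its associated format name */
proc contents data=mydata.somedata
              out=work.varinfo(keep=name format) noprint;
run;

/* Count how many variables use each format (blank = no format) */
proc freq data=work.varinfo;
  tables format / missing;
run;
```

The FORMAT column in WORK.VARINFO holds the format name without the trailing period, so this gives a quick list of the user-defined formats you need to have available.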
If the format isn't associated then you need to do that. You can assign formats to existing data without creating a new data set using Proc Datasets and a MODIFY statement.
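As a rough sketch of that approach, assuming a libref MYDATA, a data set SOMEDATA, and made-up variable and format names (AGEGRP, SEX, AGEFMT., $SEXFMT.), none of which come from your data:

```sas
/* MODIFY changes the stored attributes in place; the data are not rewritten */
proc datasets library=mydata nolist;
  modify somedata;
    format agegrp agefmt.     /* hypothetical numeric variable and format   */
           sex    $sexfmt.;   /* hypothetical character variable and format */
  run;
quit;
```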
If the formats are associated but the formatted values don't appear then the formats are probably not in the current format search path.
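To check and extend the search path, something like the following should work. The libref FMTLIB and its path are assumptions for illustration only:

```sas
/* Show the current format search path in the log */
proc options option=fmtsearch;
run;

/* Hypothetical library that holds a FORMATS catalog */
libname fmtlib "c:\downloads\formats";

/* WORK and LIBRARY are still searched first unless you list them explicitly */
options fmtsearch=(fmtlib);
```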
In what form do you "have the formats"? Is it code to make the formats, or a format catalog?
10-31-2014 11:13 PM
I can see from proc contents that user-defined formats are associated. The formats that I have are in the form of SAS code using proc format. I don't see any permanent format library or catalog used by the other user who defined these formats. I am not sure how to use this format code to read the formatted data. I know how to do it for small datasets, but for more than 1500 variables I don't know what to do. I looked up the online documentation but I am still confused. Please help!
11-01-2014 01:26 AM
Hi, there is a difference between FORMATS (used for displaying values) and INFORMATS (used for reading data into SAS format). So, you say you have the PROC FORMAT code. Were your user-defined formats created with a VALUE statement or an INVALUE statement?
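To make the distinction concrete, here is a small sketch with made-up names (SEXFMT and SEXIN are not from your code):

```sas
proc format;
  /* VALUE creates a format: it controls how stored values are displayed */
  value sexfmt 1 = 'Male'
               2 = 'Female';

  /* INVALUE creates an informat: it converts raw text while reading */
  invalue sexin 'M'   = 1
                'F'   = 2
                other = .;
run;

data _null_;
  code = input('M', sexin.);   /* informat: text to stored value      */
  put code= code= sexfmt.;     /* format: stored value to display text */
run;
```

If the PROC FORMAT code you were given only has VALUE statements, then you only have display formats and nothing needs to be "read" with them.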
For a discussion of the INFORMAT and how to use it, see the documentation:
11-01-2014 05:12 AM
A format (or informat) is just a piece of code. It gets associated with a variable, much like an object property.
That way it can be invoked at run time as a called routine for input or for output display. Within function calls it can also be used for conversions.
In the very early days of SAS (V5 and before), formats really were load modules that could be activated through the SASLIB DD name. The DD name LIBRARY has a dedicated role.
For fun: http://www2.sas.com/proceedings/sugi28/116-28.pdf and an old conversion note, 17182 - ERROR: Library SASLIB is not in a valid format for access method SASE7
Once you know what kind of artifacts informats and formats are, you can set up a life cycle management process (SDLC) for those kinds of objects.
When there are Develop, Test, Acceptance, and Production areas/containers, the formats should follow the same process as the life cycles of all other code artifacts.
When the formats depend on variable data content, then the code that builds them goes through that life cycle. The resulting format, however, should be stored in a location associated with the data.
Data elements and code artifacts should be segregated, as they have different properties and requirements.
All this is rather classic software engineering and not special to a tool like SAS. Your question is more about translating those SAS-specific technical details into generic good practices.
For SAS formats/informats there is a SAS option (FMTSEARCH) that should be set within each SAS project according to the project and its stage in the life cycle (DTAP).
You can use concatenated libraries for SAS formats to eliminate the need to analyze which component is at which stage.
Technically this works the same way as a PATH (Windows/Unix) or STEPLIB/JOBLIB usage for finding load modules.
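A minimal sketch of such a concatenation, assuming separate development and production format libraries (the librefs and directory names are invented for illustration):

```sas
/* Hypothetical per-stage format libraries */
libname fmtdev  "/projects/demo/dev/formats";
libname fmtprod "/projects/demo/prod/formats";

/* One logical library: a format is taken from the first level that has it */
libname fmts (fmtdev fmtprod);

/* The project only ever refers to the concatenated libref */
options fmtsearch=(fmts);
```

With this setup a format under development overrides the production version of the same name, and everything else falls through to production.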
11-01-2014 06:12 AM
If getting the format is not an option, you can use the nofmterr system option. This will tell SAS to read the data set without the formats. SAS will replace the missing formats with the w. or $w. default format, and SAS will issue a note in the log telling you that it couldn't find the format.
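For example (MYDATA.SOMEDATA is a placeholder name, not something from your post):

```sas
/* Do not stop with an error when a format cannot be found */
options nofmterr;

/* Values print unformatted, using the default w. or $w. format */
proc print data=mydata.somedata(obs=5);
run;

/* Restore the default behavior afterwards if you want the error back */
options fmterr;
```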
11-01-2014 11:32 AM
My understanding of your problem is that you have a dataset and code to generate the associated user formats.
You just need to %INCLUDE the code so that the formats are defined. Let's say the data set is named 'somedata.sas7bdat' and the text file with the format code is named 'formats.sas'.
Assuming both are in the same directory, your code would look like this:
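%let dir=c:\downloads ;
libname mydata "&dir" ;
* Run the PROC FORMAT code so the formats are defined before the data are used ;
%include "&dir\formats.sas" ;
proc means data=mydata.somedata ....
proc print data=mydata.somedata .... ;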
11-01-2014 11:55 AM
Tom, there is no need to run the code to define formats over and over again. You may also need to underpin that the correct format version was used.
That is a question of software governance. Well, governance may be a dirty word. Governance is about the best way to use...
You must know the FDA, and I know a document describing something like that: http://www.r-project.org/doc/R-FDA.pdf Chapter 6 describes the way a tool is developed.
The same SDLC approach is valid for the analytic user process. You will see:
- Source Code management
- Testing and Validation
- Release Cycles
- Current / archived versions (retention periods)
- Qualified Personnel
- Physical and logical security
- Disaster recovery
Please explain why you are resisting following these kinds of guidelines.
That last question is one of the basic questions regulators ask on top of those documents. What they describe are high-level goals.
It is your choice how to achieve those: follow commonly used practices or do something of your own. When that is acceptable, you are OK.
It is almost amazing how much of this is done by trial and error, driven by technical ideas, without reviewing the possible impact and possible solutions.
11-01-2014 12:23 PM
The solution depends on the problem.
If the problem is that someone gave you ONE dataset with ONE set of formats that you need to use for ONE simple analysis, then building a system to validate formats etc. is out of scope. Keeping the definition of the formats in a single source text file is probably the safest method to manage that situation.
If you are building a system to handle multiple datasets for multiple similar types of data (say clinical trials) then you should define standard formats and datasets and use stored format catalogs with source control and validated processes for updating and accessing the data and the formats. Personally in that environment I would discourage the use of formats that are specific to a single dataset or even a single clinical trial.
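For that second scenario, a stored catalog could be set up roughly like this. The libref STDFMT, its path, and the YESNO format are assumptions made up for the sketch:

```sas
/* Hypothetical shared location for the standard formats */
libname stdfmt "/projects/standards/formats";

/* LIBRARY= stores the formats permanently in STDFMT.FORMATS instead of WORK */
proc format library=stdfmt;
  value yesno 0 = 'No'
              1 = 'Yes';
run;

/* FMTLIB prints the contents of the stored catalog for review */
proc format library=stdfmt fmtlib;
run;
```

Programs that use the data then only need the library on their format search path; they never have to rerun the PROC FORMAT code.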