Ruhi
Obsidian | Level 7


Hi

I want to know the best way to use user-defined formats with a dataset that has a lot of variables, say 1500.

Specifically, say I have a permanent dataset in SAS and I have the formats that another user defined for it. How do I use his formats while reading the dataset?

thanks

8 REPLIES
ballardw
Super User

Are the formats associated with the variables? You can tell by running PROC CONTENTS on the data set and checking whether user-defined formats are listed for the variables.
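
With about 1500 variables the printed PROC CONTENTS listing gets long, so one option is to write the variable attributes to a data set and look only at the variables that carry a format. This is a rough sketch; the path, the libref mydata and the data set name somedata are placeholders, not names from the original question.

```sas
/* Placeholder path and names: adjust to your own data */
libname mydata "c:\downloads";

/* Write the variable attributes to a data set instead of printing them */
proc contents data=mydata.somedata
              out=work.varinfo(keep=name type format)
              noprint;
run;

/* Show only the variables that have a format associated.
   FORMAT holds both SAS-supplied and user-defined format names. */
proc print data=work.varinfo;
  where format ne ' ';
run;
```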

If the format isn't associated then you need to do that. You can assign formats to existing data without creating a new data set using Proc Datasets and a MODIFY statement.
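
A minimal sketch of that PROC DATASETS approach, assuming a libref mydata, a data set somedata, variables sex and agegrp, and formats sexfmt. and agefmt. (all of these names are made up for illustration):

```sas
/* Attach formats in place: only the data set header is updated,
   the data itself is not rewritten */
proc datasets library=mydata nolist;
  modify somedata;
    format sex    sexfmt.    /* hypothetical variable and format */
           agegrp agefmt.;   /* hypothetical variable and format */
  run;
quit;
```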

If the formats are associated but the formatted values don't appear then the formats are probably not in the current format search path.

In what form do you have the formats? Is it code that creates the formats, or a format catalog?

Ruhi
Obsidian | Level 7

I can see from PROC CONTENTS that user-defined formats are associated. The formats I have are in the form of SAS code using PROC FORMAT. I don't see any permanent format library or catalog used by the other user who defined these formats. I am not sure how to use this format code to read the formatted data. I know how to do it for small datasets, but for more than 1500 variables I don't know what to do. I looked at the online documentation but I am still confused. Please help!

thanks.

Cynthia_sas
Diamond | Level 26

Hi, there is a difference between FORMATS (used for displaying values) and INFORMATS (used for reading data into SAS format). So, you say you have the PROC FORMAT code. Were your user-defined formats created with a VALUE statement or an INVALUE statement?
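
To make the distinction concrete, here is a small sketch. The names sexfmt and sexin and the coded values are invented for illustration and are not taken from the poster's format code.

```sas
proc format;
  /* VALUE creates a format: it controls how stored values are displayed */
  value sexfmt 1 = 'Male'
               2 = 'Female';

  /* INVALUE creates an informat: it converts raw text into stored values */
  invalue sexin 'M' = 1
                'F' = 2;
run;

data _null_;
  code  = input('F', sexin.);   /* informat: text to number  */
  label = put(code, sexfmt.);   /* format:   number to label */
  put code= label=;             /* write both values to the log */
run;
```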

  For a discussion of the INFORMAT and how to use it, see the documentation:

Base SAS(R) 9.4 Procedures Guide, Third Edition

cynthia

jakarman
Barite | Level 11

A format (informat) is just a piece of code. It is associated with a variable much like an object property.

Because of this, it is invoked at run time as a called routine for input or for output display. It can also be used for conversions inside function calls.

In the very early days of SAS (V5 and before) formats really were load modules, activated through the SASLIB DD name. The DD name LIBRARY has a dedicated role.

For fun: http://www2.sas.com/proceedings/sugi28/116-28.pdf and an old conversion note, 17182 - ERROR: Library SASLIB is not in a valid format for access method SASE7

Knowing what kind of artifacts informats/formats are, you can set up a life cycle management process (SDLC) for these objects.

When there are Develop, Test, Acceptance, and Production (DTAP) areas or containers, the formats should follow the same life cycle as all other code artifacts.

When the formats depend on the data content of a variable, the code that builds them goes through that life cycle. The resulting format, however, should be stored in a location associated with the data.

Data elements and code artifacts should be segregated because they have different properties and requirements.

All this is rather classic software engineering and not specific to a tool like SAS. Your question is more about translating the SAS-specific technical details into generic good practices.

For SAS formats/informats there is a SAS option (FMTSEARCH) that should be set within each SAS project according to the project and its stage in the life cycle (DTAP).

You can use concatenated libraries for SAS formats so that you do not have to analyze which component is at which stage.

Technically that is similar to a search path (Windows/Unix) or to Steplib/Joblib usage for finding load modules.
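
As a rough illustration of the FMTSEARCH option with more than one format library, assuming two hypothetical directories and librefs (devfmt and prodfmt are invented names):

```sas
/* Hypothetical locations of format catalogs per life-cycle stage */
libname devfmt  "c:\project\dev\formats";
libname prodfmt "c:\project\prod\formats";

/* By default SAS looks in WORK.FORMATS and LIBRARY.FORMATS first,
   then in the catalogs listed here, in the order given.
   A bare libref means the catalog libref.FORMATS. */
options fmtsearch=(devfmt prodfmt);

/* Display the current setting in the log */
proc options option=fmtsearch;
run;
```

With this order a format found in devfmt is used before one of the same name in prodfmt, so the list itself decides which stage wins.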

---->-- ja karman --<-----
KachiM
Rhodochrosite | Level 12

Try

OPTIONS NOFMTERR;


If getting the format is not an option, you can use the NOFMTERR system option. This tells SAS to read the data set without the formats. SAS replaces the missing formats with the w. or $w. default format and issues a note in the log telling you that it couldn't find the format.
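
A short sketch of how that might look in practice. The libref mydata and the data set somedata are placeholder names.

```sas
/* Let SAS continue when an associated format cannot be found */
options nofmterr;

/* Values are shown unformatted where the format is missing */
proc print data=mydata.somedata(obs=5);
run;

/* Switch back to the default so a missing format is an error again */
options fmterr;
```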

Tom
Super User

My understanding of your problem is that you have a dataset and code to generate the associated user formats.

You just need to %INCLUDE the code so that the formats are defined.  Let's say the data set is named 'somedata.sas7bdat' and the text file with the format code is named 'formats.sas'.


Assuming both are in the same directory, your code would look like this:


%let dir=c:\downloads ;

libname mydata "&dir" ;

%inc "&dir\formats.sas";

proc means data=mydata.somedata .... ;

proc print data=mydata.somedata .... ;

...


jakarman
Barite | Level 11

Tom, there is no need to run the code that defines the formats over and over again, especially when you need to underpin that the correct format version was used.
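
One way to avoid re-running the format code in every session is to run it once and copy the resulting catalog to a permanent library. This is only a sketch: it assumes the format code writes to the default WORK.FORMATS catalog (no LIBRARY= option in its PROC FORMAT statements), that the directory c:\downloads\formats already exists, and the libref fmtlib is an invented name.

```sas
/* One-time setup ------------------------------------------------ */
libname fmtlib "c:\downloads\formats";   /* hypothetical permanent location */

%inc "c:\downloads\formats.sas";         /* creates the formats in WORK.FORMATS */

/* Copy the session catalog to the permanent library */
proc catalog catalog=work.formats;
  copy out=fmtlib.formats;
run;
quit;

/* Every later session -------------------------------------------- */
libname fmtlib "c:\downloads\formats";
options fmtsearch=(fmtlib);              /* make the stored formats findable */
```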

That is a question of software governance. Well, governance may be a dirty word, but governance is about the best way to use...

You must know the FDA, and I know of a document describing something like that: http://www.r-project.org/doc/R-FDA.pdf (Chapter 6 describes the way a tool is developed).

The same SDLC approach is valid for the analytic user process. You will see:

- Source Code management

- Testing and Validation

- Release Cycles

- Current / archived versions (retention periods)

- Qualified Personnel

- Physical and logical security

- Disaster recovery

Please explain why you resist following these kinds of guidelines.

The last question is one of the basic questions regulators ask on top of those documents.  The documents describe high-level goals.

It is your choice how to achieve those: follow commonly used practices or do something on your own. When that is acceptable, you are OK.

It is almost amazing how much of this is done by trial and error driven by technical ideas, without reviewing the possible impact and the possible solutions.

---->-- ja karman --<-----
Tom
Super User

The solution depends on the problem.

If the problem is that someone gave you ONE dataset with ONE set of formats that you need to use for ONE simple analysis, then building a system to validate formats etc. is out of scope.  Keeping the definition of the formats in a single source text file is probably the safest method to manage that situation.

If you are building a system to handle multiple datasets for multiple similar types of data (say clinical trials) then you should define standard formats and datasets and use stored format catalogs with source control and validated processes for updating and accessing the data and the formats. Personally in that environment I would discourage the use of formats that are specific to a single dataset or even a single clinical trial.
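
For the stored-catalog scenario, one possible building block is to keep the format definitions as a control data set, which can be reviewed and compared between versions, and to rebuild the catalog from it. A minimal sketch, assuming a hypothetical libref fmtlib and path that already hold a FORMATS catalog:

```sas
libname fmtlib "c:\project\formats";   /* hypothetical path and libref */

/* Export every format in FMTLIB.FORMATS to a control data set */
proc format library=fmtlib cntlout=work.fmtdefs;
run;

/* Rebuild the catalog from the control data set,
   for example after a reviewed change */
proc format library=fmtlib cntlin=work.fmtdefs;
run;
```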
