Hi, I just wanted to see if someone could explain what this means exactly:
data disp;
set trcb.DispositionCode_Xref;
length start $2 label $100;
fmtname = '$disposition';
start = Disp_Code;
label = Disp_Code_Desc;
keep start label fmtname;
run;
proc sort data=disp nodupkey;
by start;
run;
I am not that well-versed in formats. I get the gist, but when I look at this I'm a bit lost on what exactly it means. Thank you, any help is appreciated.
It appears that the data set is intended to be used with the CNTLIN option of Proc Format to create a format named DISPOSITION for use with character values.
A CNTLIN data set requires certain variables. At minimum it needs FMTNAME, which holds the name of the format; START, which holds the value to be formatted; and LABEL, which holds the text to display in place of the START value.
Other, optional variables (such as END, TYPE and HLO) describe further properties of the format that you would otherwise set as options.
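As one illustration of those optional variables, here is a minimal sketch that appends a catch-all row using HLO='O' (the CNTLIN equivalent of OTHER= in a VALUE statement). The data set name disp_other and the label text 'Unknown code' are made up for the example, and as far as I know the START value on that row is ignored.

```sas
/* Sketch: add an OTHER row to the DISP control data set.          */
/* disp_other and 'Unknown code' are hypothetical names/values.    */
data disp_other;
  set disp end=last;
  output;                      /* write every original row as-is   */
  if last then do;
    hlo   = 'O';               /* O marks this row as the OTHER range */
    start = ' ';               /* assumed to be ignored when HLO='O'  */
    label = 'Unknown code';
    output;                    /* write the extra catch-all row    */
  end;
run;

proc format cntlin=disp_other;
run;
```

With this variant, a code that is not in the lookup table should display as the catch-all label instead of the raw code.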
If you know how to write a small custom format, you could use the CNTLOUT= option on Proc Format to create a data set that describes the format(s). That may help you understand a bit better what the variables must look like.
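A small sketch of that CNTLOUT= idea, using a made-up format $YESNO and a made-up output data set work.fmt_desc:

```sas
/* Define a tiny character format by hand (hypothetical example). */
proc format;
  value $yesno
    'Y' = 'Yes'
    'N' = 'No';
run;

/* Write the description of that format to a data set. */
proc format cntlout=work.fmt_desc;
  select $yesno;
run;

/* Look at the variables a control data set uses. */
proc print data=work.fmt_desc;
  var fmtname start end label type hlo;
run;
```

The printed data set has one row per value, and comparing it with the DISP data set shows which variables the original code fills in and which it leaves to defaults.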
The sort is not needed, and could actually be improper if the data set had more than one value of FMTNAME, because Proc Format expects all of the rows for a single format to be together in the data set. The NODUPKEY option is there to prevent errors caused by multiple observations with the same START value. Again, except for very specific purposes, Proc Format expects to see each value of START (or END, if a range of values is used) only once.
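If you want to see which rows NODUPKEY throws away, and keep rows for each format together, a variant of the sort along these lines may help. This is only a sketch; the output names work.disp_unique and work.disp_dups are made up.

```sas
/* Alternative to the original PROC SORT: keep the dropped rows for review. */
proc sort data=disp
          out=work.disp_unique      /* hypothetical output name            */
          dupout=work.disp_dups     /* rows removed by NODUPKEY land here  */
          nodupkey;
  by fmtname start;                 /* keeps each format's rows together   */
run;

/* Any rows printed here are codes that appeared more than once. */
proc print data=work.disp_dups;
run;
```

You would then point CNTLIN= at work.disp_unique instead of disp.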
I would expect to see something like this to actually create the format for use:
Proc format <library=somelib> cntlin=disp; run;
This would be similar to writing a Proc format value statement like this:
Proc format;
Value $disposition
'01' = 'Meaning for code 01'
'02' = 'Meaning for code 02'
;
run;
The '01' '02' are the START values. The 'Meaning for code' text are the values of LABEL and $disposition is the FMTNAME value.
You have code to create a PROC FORMAT CNTLIN= data set.
To create the character format $DISPOSITION from this data, you need to do this:
proc format CNTLIN=disp;
quit;
Then you would be able to use $disposition. in a FORMAT statement or the PUT function.
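For the PUT function case, a minimal sketch might look like this. It assumes the format has already been created with CNTLIN=; the data set work.sample, the variables code and code_text, and the values 01 and 02 are all made up for illustration.

```sas
/* Sketch: store the decoded text in a new variable with PUT. */
/* Assumes $DISPOSITION already exists; the codes are hypothetical. */
data work.sample;
  input code $2.;
  length code_text $100;
  code_text = put(code, $disposition.);   /* look up the label for each code */
  datalines;
01
02
;
run;

proc print data=work.sample;
run;
```

A FORMAT statement only changes how the code is displayed, while PUT creates a new variable that holds the label text itself.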
What part do you not understand?
The SAS code you shared?
The DATA step is making a new dataset from an old dataset. The dataset it is making can be used to create a FORMAT.
The PROC SORT step is to order the observations and make sure the same START value does not appear more than once.
How to convert that dataset into a format?
Use PROC FORMAT with the CNTLIN= option.
proc format cntlin=disp;
run;
What a FORMAT does?
Formats are used to convert values into text. In this case the values are character strings up to two bytes long. Your dataset can be used to make a format named $DISPOSITION that converts values that match the values of DISP_CODE into text strings of the corresponding value of DISP_CODE_DESC.
How to use a FORMAT?
You would use the $DISPOSITION format with any variable that has the same set of two-character codes that was in your dataset, so that they are printed/displayed/converted to text. Say you have a dataset named EVENTS with a variable named EVENT_DISP that uses those same two-character codes. You could then get a summary of how many events ended with each disposition code using a step like this:
proc freq data=events;
tables event_disp;
format event_disp $disposition. ;
run;
The code snippet you've provided is creating a custom format in SAS, and I'll walk you through each step.
Step-by-Step Explanation
Creating the Dataset disp:
data disp;
set trcb.DispositionCode_Xref;
length start $2 label $100;
fmtname = '$disposition';
start = Disp_Code;
label = Disp_Code_Desc;
keep start label fmtname;
run;
data disp; : This starts a new dataset named disp.
set trcb.DispositionCode_Xref; : This reads in data from an existing dataset called trcb.DispositionCode_Xref.
length start $2 label $100; : This sets the length of two new variables: start (length 2 characters) and label (length 100 characters).
fmtname = '$disposition'; : This assigns the value '$disposition' to a new variable called fmtname. This variable will be used to define the name of the custom format.
start = Disp_Code; : This assigns the value from the variable Disp_Code to the start variable. In the context of creating formats, start represents the values that will be formatted.
label = Disp_Code_Desc; : This assigns the value from the variable Disp_Code_Desc to the label variable. The label represents the formatted output that corresponds to the start value.
keep start label fmtname; : This keeps only the start, label, and fmtname variables in the final disp dataset, discarding any others from the original dataset.
Summary of this step: You're creating a new dataset called disp that contains three variables: start (the value to format), label (the formatted output), and fmtname (the name of the format).
Sorting the Dataset:
proc sort data=disp nodupkey;
by start;
run;
proc sort data=disp nodupkey; : This sorts the disp dataset by the start variable. The nodupkey option removes any duplicate records based on the start variable, ensuring that each value to be formatted is unique.
by start; : This specifies that the sorting should be done by the start variable.
Summary of this step: You're ensuring that the dataset disp is sorted by the start values and that there are no duplicate start values.
What Does This All Mean?
A format maps values (like Disp_Code) to corresponding labels (like Disp_Code_Desc), so that when you apply this format to a variable, the value gets displayed as the label.
A format named $disposition is being created, which will map each Disp_Code to its corresponding Disp_Code_Desc.
Final Step (Not in the Code)
Typically, after preparing this dataset (disp), you would use the PROC FORMAT procedure to actually create the format from this dataset:
proc format cntlin=disp;
run;
proc format cntlin=disp; : This tells SAS to create a format based on the disp dataset you've created.
After that, you can use the $disposition format to display Disp_Code values as their corresponding Disp_Code_Desc labels in reports or datasets.