BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
marina_esp
Obsidian | Level 7

 

Dear SAS-users,

 

I am very new to SAS and apologise for rather a basic question.

I wonder what is the difference between applying a new format using FORMAT and creating a new variable with the new format using PUT? 

Below is the example of custom-formatting ParcCode variable. I tried both - using the FORMAT and create new variable Region using  PUT. In the output table the both ParkCode and Region columns look identical, but the format of the Region is blank. 

Is there any crucial difference between the two approaches? I understand that I'll need PUT to transform numeric to character, but in this case ParkCode is already a character, so I apparently can just use FORMAT then?

 

data np_lookup;
	retain FmtName "$RegLbl";
    set pg2.np_codeLookup (rename=(ParkCode = Start));
    if Region ne " " then Label = Region;
    	else Label = "Unknown";
    	keep start label fmtname;
run;

proc format cntlin=np_lookup;
run;

data np_endanger;
    set pg2.np_species;
    where Conservation_Status='Endangered';
    Region = put(parkcode, $reglbl.); 
    format parkcode $reglbl.  ;
run;

 

I appreciate some tips and links to resources where I can find an answer!

 

Marina

Marina Espinasse,
statistician at the University hospital in Northern Norway
1 ACCEPTED SOLUTION

Accepted Solutions
marina_esp
Obsidian | Level 7

Hi @PaigeMiller  and @Jagadishkatam,

 

Thank you for your answers! You explained perfectly and now I understand the critical difference between the two approached.

 

Again, many thanks!

 

Marina

Marina Espinasse,
statistician at the University hospital in Northern Norway

View solution in original post

7 REPLIES 7
Jagadishkatam
Amethyst | Level 16

If we apply the format with put function (1) then the region will have the values as per the format. However if you apply the format via format statement (2) then values will be assign superficially only for the display but internally it will still have the original values in region as like parkcode.

 

if you use put function with format then there is no need for format statement.

 

    1) Region = put(parkcode, $reglbl.); 
    2) format parkcode $reglbl.  ;
Thanks,
Jag
PaigeMiller
Diamond | Level 26

In addition to what was said by @Jagadishkatam 

 

If the variable PARKCODE is NUMERIC, then the PUT statement changes it to character, and it contains the text as defined by the format. This could make a difference in some analyses.

 

Basically, using a FORMAT statement changes the appearance of the variable as us humans see it; it does not change the underlying value of the variable as SAS sees it. Using the PUT statement actually changes the values of a variable.

--
Paige Miller
marina_esp
Obsidian | Level 7

Hi @PaigeMiller  and @Jagadishkatam,

 

Thank you for your answers! You explained perfectly and now I understand the critical difference between the two approached.

 

Again, many thanks!

 

Marina

Marina Espinasse,
statistician at the University hospital in Northern Norway
FreelanceReinh
Jade | Level 19

Hello @marina_esp and welcome to the SAS Support Communities!

 

Formats are a powerful and very useful feature of SAS. As Jag and Paige have pointed out, a format doesn't change the internal value of a variable even if it is associated with the variable in a dataset. This has many advantages, e.g.:

  • You save resources, i.e., a lot of disk space and processing time, by storing only the short codes (regardless whether numeric or character) in the dataset and using the corresponding descriptive texts in format labels rather than separate, long character variables (whose values would possibly be repeated over and over again in a dataset).
  • Formats facilitate maintenance: It is easy to change a format label (e.g. "United States" --> "United States of America") in a format definition. You don't even need to touch the (possibly dozens of) datasets using the format to implement that change in your reports.
  • Formats provide flexibility because you can work with either the internal value (code) or the formatted value (be it from a permanently or temporarily associated format), just as needed.
    Examples:
    /* Create a format */
    
    proc format;
    value $code
    'N1'='United States'
    'N2'='Canada'
    'S1'='Brazil'
    'S2'='Argentina'
    /* ... */
    ;
    run;
    
    /* Create test data */
    
    data have;
    input id c :$2. x;
    format c $code.;
    cards;
    1 N2  50
    2 S1  20
    3 N1 100
    4 S2  70
    5 N1  60
    ;
    
    /* Select data based on the internal values */
    
    data south;
    set have;
    if c=:'S';
    run;
    
    /* Select data based on the formatted values */
    
    data cdn;
    set have(where=(put(c,$code.)='Canada'));
    run;
    
    /* Summarize data based on formatted values */
    
    proc means data=have sum;
    class c;
    var x;
    run;
    
    /* Summarize data based on formatted values
       using a different format "on the fly" */
    
    proc means data=have sum;
    class c;
    format c $1.;
    var x;
    run;
    
    /* Frequency table sorted by internal values */
    
    proc freq data=have;
    tables c;
    run;
    
    /* Frequency table sorted alphabetically by formatted values */
    
    proc freq data=have order=formatted;
    tables c;
    run;
marina_esp
Obsidian | Level 7

Hello Reinhard,

 

Thank you for explanations and examples - I tried them all. I see how FORMAT can be more useful than PUT.

 

Can I ask you a question about one of the examples you gave?

 

data cdn;
set have(where=(put(c,$code.)='Canada'));
run;

Here, I am not sure why do you have to use PUT when we already specified the FORMAT of column "c" when we were creating dataset "have"? Is that because FORMAT did not change the native format of values in "c"  , but only their apprearace in the reports, right?

 

Thank you for answering!

 

M

 

 

 

Marina Espinasse,
statistician at the University hospital in Northern Norway
FreelanceReinh
Jade | Level 19

@marina_esp wrote:

 

data cdn;
set have(where=(put(c,$code.)='Canada'));
run;

Here, I am not sure why do you have to use PUT when we already specified the FORMAT of column "c" when we were creating dataset "have"? Is that because FORMAT did not change the native format of values in "c"  , but only their apprearace in the reports, right?


You raise a very good point. It is indeed somewhat counterintuitive that the format name must be specified again in this WHERE condition although it is stored in dataset HAVE and SAS should be able to retrieve it from there. Of course, we need to specify somehow that we want to refer to the formatted value of variable C because, as you've noticed correctly, the format did not change the "internal" value of the variable, so that the expression C='Canada' would be false. (Note, however, how tempting it is to use this type of criterion if you saw the formatted values of C, e.g., in PROC PRINT output, without being aware that format $code. is associated with that variable!)

 

There are ways to refer to the formatted value without specifying the format: by using SAS functions of category "Variable Information," e.g., the VVALUE function:

data cdn;
set have;
if vvalue(c)='Canada';
run;

This is possibly a better example than the previous one. Unfortunately, these "V..." functions cannot be used in WHERE conditions.* That's why I switched to IF.

 

*The error message in this case is really funny:

76   data cdn;
77   set have;
78   where vvalue(c)='Canada';
ERROR: Function VVALUE is supported by the DATA STEP only.
79   run;

NOTE: The SAS System stopped processing this step because of errors.

As if this weren't a DATA STEP! 🙂

marina_esp
Obsidian | Level 7

Hey @FreelanceReinh,

 

Ok, I see. I have even tried to use c ="Canada " and got an empty data frame as an output.

 

Thank you for mentioning the VVALUE function, I've never heard about that, but will check it out.

 

Today is the first time I posted a question on the SAS community, and I learned so much.

Thanks to you and the other guys who replied me earlier today.

 

SAS online communities are great!

 

 

Marina Espinasse,
statistician at the University hospital in Northern Norway

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 7 replies
  • 4693 views
  • 4 likes
  • 4 in conversation