These Qs have likely been answered before... Formats are powerful, and can be used in so many ways.
1. Can I use them for value lists that are checked during input from a file in a datastep, and if so how? Eg, in this crude data step example:
if destination not in ('No Urgent Place','Home','Work') then do;
putlog 'QC_STOP: Bad value in var ' destination=;
stop;
end;
I'd like to replace that value list in parens in the IF statement by using a defined format for those values. Then, reuse that format later in various output plots, etc.
2. I'd like to change some values using formats as a lookup table. After checking a discrete list of hour values an input variable ('7AM','10AM','2PM','6PM','10PM') as in 1. above, I'd like to then change/map them to a newly added dataset variable representing military/24 hour as a corresponding number: 7,10,14,18,22.
I did search here and via google, but got so much related noise that I couldn't hone in on these two signals. Thanks, Peter
Hi,
I read your first questions to be about using formats for data validation. Yes, you can do this, and it's a great idea. Ron Cody wrote an incredible book about data cleaning that includes descriptions of this approach. https://support.sas.com/content/dam/SAS/support/en/books/codys-data-cleaning-techniques/70074_excerp....
You could make a format like:
proc format ;
value $checkdest
'No Urgent Place','Home','Work'='OK'
other='BAD'
;
run ;
Which you could use like:
data have ;
input destination $15.;
cards ;
Home
No Urgent Place
oops
Work
;
run ;
data want ;
set have ;
if put(destination,$checkdest.) ne 'OK' then do ;
putlog 'QC_STOP: Bad value in var ' destination=;
stop;
end ;
run ;
By the way, that sort of validation check is an assertion. You typically assert that a condition is true, and if it's not true, you write an error / abort / do something. So if you're doing a lot of this sort of checking, it's often easier to write an %assert macro, which you would use like:
data want ;
set have ;
%assert(put(destination,$checkdest.) = 'OK')
run ;
I wrote a paper with an %assert macro, which you could easily adapt: https://www.lexjansen.com/nesug/nesug12/cc/cc31.pdf.
If you want to check a lot of variables like this, you can create a format that defines the valid values for each variable, use arrays or macros or whatever to loop over variables and check each variable.
I don't really understand your second question, about time values. Typically it's best to leave time values as a SAS time (numeric value which is number of seconds since midnight) or a SAS date-time (number of seconds since midnight Jan 1, 1960). If you have a SAS time, you can use SAS formats to display the time in different formats.
HTH,
--Q.
Wow @Quentin, that is SO HELPFUL for #1, and is exactly what I'm looking for. The assert macro idea is exactly the design pattern I was going for (drawing on the 14 other programming languages and related design patterns I've learned previously). I'll update my code, and reply back if there are issues.
Re #2, formats seem to be able to lookup and convert one value (or object) to another (in the Smalltalk language, an equivalent is a Dictionary object). Ie, in the code you provided, you're looking up and converting
'No Urgent Place','Home','Work'='OK'
these values to 'OK', and other values to 'BAD'. In these case, they are all character values.
I want to do the same conceptually, however convert from character inputs ('7AM','10AM','2PM','6PM','10PM') to numbers (7,10,14,18,22) and place them in a new numeric variable. Of course, only after the character values have survived the assert macro after reading them in from the file/cards. Thankfully you've given me some ideas of things to try, and googling "SAS format as lookup table" now is more informative -- more signal and less noise. For example, https://support.sas.com/resources/papers/proceedings15/2219-2015.pdf "Table Lookup Techniques: From the Basics to the Innovative", and the USING FORMATS section.
Cody's book, the free first chapter available for download, is very helpful too. I have the book on order.
Thanks again, so much. I welcome any additional thoughts on #2, and I'll try some things. Thanks, again Peter
FORMATS convert values to text. You use them with FORMAT and PUT statements or the PUT(), PUTN() or PUTC() functions.
To create numbers you need to define an INFORMAT.
INFORMATS convert text to values. You use them with INFORMAT and INPUT statements or the INPUT(), INPUTN() or INPUTC() functions.
Numeric informats create numeric values. Character informats create character values.
Here is PROC FORMAT code to create an informat named TIMEX that will convert those strings into actual time values. If you want to create the integers 7,10 etc instead then change the values on the right of the equal signs.
proc format ;
invalue timex
'7AM' = '07:00't
'10AM'= '10:00't
'2PM' = '14:00't
'6PM' = '18:00't
'10PM'= '22:00't
;
run;
Hi,
The paper you found is by Art Carpenter, and you can't go wrong with anything he writes! BTW there is a SAS user who maintains an amazing index of SAS user group papers like that one. If you search lexjansen.com, you'll find lots of good stuff like that.
For the time issue, if you are given a file with character values and you want to convert them to a numeric time value, you would typically use an informat and an input function. While SAS ships with lots of informats, I don't see one that would handle values like '7AM', unfortunately. There is a time informat that can read values like '7:00AM' so if I were doing this, I would probably hack the data into that format, then use the time informat to read the value. Something like:
data have ;
input timechar $4. ;
cards ;
7AM
10AM
2PM
6PM
10PM
;
data want ;
set have ;
time_hacked=cats(substrn(timechar,1,anyalpha(timechar)-1),':00',substrn(timechar,anyalpha(timechar))) ;
time=input(time_hacked,time.) ;
run ;
proc print data=want ;
format time time. ;
run ;
proc print data=want ;
format time timeampm. ;
run ;
Note that one nice thing about the INPUT function is you can tell it it to ignore invalid values and return a null value, by using the ?? modifier:
data have ; input timechar $4. ; cards ; 7AM 10AM oops 2PM 6PM 10PM ; data want ; set have ; time_hacked=cats(substrn(timechar,1,anyalpha(timechar)-1),':00',substrn(timechar,anyalpha(timechar))) ; time=input(time_hacked,?? time.) ; run ; proc print data=want ; format time time. ; run ; proc print data=want ; format time timeampm. ; run ;
Resources such as the lexjansen.com site or books are so helpful, but often overwhelming to this beginner moving to intermediate programmer. Often, I don't even know how to frame the question or google search well, or have to make analogies to other concepts or design patterns in other languages that don't resonate. @Quentin and @Tom , you've demystified this area so much for me. Thank you! Peter
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.