BookmarkSubscribeRSS Feed
pspung
Fluorite | Level 6

These Qs have likely been answered before... Formats are powerful, and can be used in so many ways.

 

1. Can I use them for value lists that are checked during input from a file in a datastep, and if so how? Eg, in this crude data step example:

 

if destination not in ('No Urgent Place','Home','Work') then do;
   putlog 'QC_STOP: Bad value in var ' destination=;
   stop;

end;

 

I'd like to replace that value list in parens in the IF statement by using a defined format for those values. Then, reuse that format later in various output plots, etc.

 

2. I'd like to change some values using formats as a lookup table. After checking a discrete list of hour values an input variable ('7AM','10AM','2PM','6PM','10PM') as in 1. above, I'd like to then change/map them to a newly added dataset variable representing military/24 hour as a corresponding number: 7,10,14,18,22.

 

I did search here and via google, but got so much related noise that I couldn't hone in on these two signals. Thanks, Peter

7 REPLIES 7
Quentin
Super User

Hi,

 

I read your first questions to be about using formats for data validation.  Yes, you can do this, and it's a great idea.  Ron Cody wrote an incredible book about data cleaning that includes descriptions of this approach.  https://support.sas.com/content/dam/SAS/support/en/books/codys-data-cleaning-techniques/70074_excerp....

 

You could make a format like:

 

 

proc format ; 
  value $checkdest 
    'No Urgent Place','Home','Work'='OK'
    other='BAD'
  ;
run ;

Which you could use like:

 

 

data have ;
  input destination $15.;
  cards ;
Home
No Urgent Place
oops
Work
;
run ;

data want ;
  set have ;
  if put(destination,$checkdest.) ne 'OK' then do ;
     putlog 'QC_STOP: Bad value in var ' destination=;
     stop;
  end ;
run ;

By the way, that sort of validation check is an assertion. You typically assert that a condition is true, and if it's not true, you write an error / abort / do something.  So if you're doing a lot of this sort of checking, it's often easier to write an %assert macro, which you would use like:

 

 

data want ;
  set have ;
  %assert(put(destination,$checkdest.) = 'OK')
run ;

 

I wrote a paper with an %assert macro, which you could easily adapt: https://www.lexjansen.com/nesug/nesug12/cc/cc31.pdf.

 

If you want to check a lot of variables like this, you can create a format that defines the valid values for each variable, use arrays or macros or whatever to loop over variables and check each variable.

 

I don't really understand your second question, about time values.  Typically it's best to leave time values as a SAS time (numeric value which is number of seconds since midnight) or a SAS date-time (number of seconds since midnight Jan 1, 1960).  If you have a SAS time, you can use SAS formats to display the time in different formats. 

 

HTH,

--Q.

 

BASUG is hosting free webinars Next up: Mike Raithel presenting on validating data files on Wednesday July 17. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
pspung
Fluorite | Level 6

Wow @Quentin, that is SO HELPFUL for #1, and is exactly what I'm looking for. The assert macro idea is exactly the design pattern I was going for (drawing on the 14 other programming languages and related design patterns I've learned previously). I'll update my code, and reply back if there are issues.

 

Re #2, formats seem to be able to lookup and convert one value (or object) to another (in the Smalltalk language, an equivalent is a Dictionary object). Ie, in the code you provided, you're looking up and converting 

'No Urgent Place','Home','Work'='OK'

these values to 'OK', and other values to 'BAD'. In these case, they are all character values.

 

I want to do the same conceptually, however convert from character inputs ('7AM','10AM','2PM','6PM','10PM') to numbers (7,10,14,18,22) and place them in a new numeric variable. Of course, only after the character values have survived the assert macro after reading them in from the file/cards. Thankfully you've given me some ideas of things to try, and googling "SAS format as lookup table" now is more informative -- more signal and less noise. For example, https://support.sas.com/resources/papers/proceedings15/2219-2015.pdf "Table Lookup Techniques: From the Basics to the Innovative", and the USING FORMATS section.

 

Cody's book, the free first chapter available for download, is very helpful too. I have the book on order.

 

Thanks again, so much. I welcome any additional thoughts on #2, and I'll try some things. Thanks, again Peter

Tom
Super User Tom
Super User

FORMATS convert values to text.  You use them with FORMAT and PUT statements or the PUT(), PUTN() or PUTC() functions.

To create numbers you need to define an INFORMAT. 

INFORMATS convert text to values.  You use them with INFORMAT and INPUT statements or the INPUT(), INPUTN() or INPUTC() functions.

Numeric informats create numeric values.  Character informats create character values.

Here is PROC FORMAT code to create an informat named TIMEX that will convert those strings into actual time values.  If you want to create the integers 7,10 etc instead then change the values on the right of the equal signs. 

proc format ;
  invalue timex 
    '7AM' = '07:00't
    '10AM'= '10:00't
    '2PM' = '14:00't
    '6PM' = '18:00't
    '10PM'= '22:00't
  ;
run;

 

pspung
Fluorite | Level 6
Thank you Tom for the great info and suggestions! I have so much to go on now. I'm going to write some more code... Peter
Quentin
Super User

Hi,

 

The paper you found is by Art Carpenter, and you can't go wrong with anything he writes!  BTW there is a SAS user who maintains an amazing index of SAS user group papers like that one.  If you search lexjansen.com, you'll find lots of good stuff like that.

 

For the time issue, if you are given a file with character values and you want to convert them to a numeric time value, you would typically use an informat and an input function. While SAS ships with lots of informats, I don't see one that would handle values like '7AM', unfortunately.  There is a time informat that can read values like '7:00AM' so if I were doing this, I would probably hack the data into that format, then use the time informat to read the value.  Something like:

 

data have ;
  input timechar $4. ;
  cards ;
7AM
10AM 
2PM
6PM
10PM
;

data want ;
  set have ;
  time_hacked=cats(substrn(timechar,1,anyalpha(timechar)-1),':00',substrn(timechar,anyalpha(timechar))) ;
  time=input(time_hacked,time.) ;
run ;

proc print data=want ;
  format time time. ;
run ;

proc print data=want ;
  format time timeampm. ;
run ;

Note that one nice thing about the INPUT function is you can tell it it to ignore invalid values and return a null value, by using the ?? modifier:

 

data have ;
  input timechar $4. ;
  cards ;
7AM
10AM 
oops
2PM
6PM
10PM
;

data want ;
  set have ;
  time_hacked=cats(substrn(timechar,1,anyalpha(timechar)-1),':00',substrn(timechar,anyalpha(timechar))) ;
  time=input(time_hacked,?? time.) ;  
run ;

proc print data=want ;
  format time time. ;
run ;

proc print data=want ;
  format time timeampm. ;
run ;
BASUG is hosting free webinars Next up: Mike Raithel presenting on validating data files on Wednesday July 17. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
pspung
Fluorite | Level 6
Thank you again Quentin for the great info and advice. I'm going to put this in practice, and write some code. Btw and fwiw, this is the dataset we're examining with SAS: https://archive.ics.uci.edu/ml/datasets/in-vehicle+coupon+recommendation

Thanks again, Peter
pspung
Fluorite | Level 6

Resources such as the lexjansen.com site or books are so helpful, but often overwhelming to this beginner moving to intermediate programmer. Often, I don't even know how to frame the question or google search well, or have to make analogies to other concepts or design patterns in other languages that don't resonate. @Quentin and @Tom , you've demystified this area so much for me. Thank you!  Peter

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 2391 views
  • 3 likes
  • 3 in conversation