BookmarkSubscribeRSS Feed
mark4
Obsidian | Level 7

Does SAS have something similar to Python's dictionary?  For those unfamiliar, an easy way to think about it is a data set with two columns - name and value.  For example, the 'name' column could be a person's name, and the 'value' column could be their address.  One great feature about this is the ability to quickly 'look-up' the value for a given name.  If the dictionary was called dict, then to get John Doe's address, you evaluate the expression dict['John Doe'].  

 

You could get the same result with something like 

proc sql noprint;

select address into :Johns_Address from dict where name = "John Doe";

quit;

 

so calling &Johns_Address will return John Doe's address.  This is obviously clunky and less than ideal.  Are there shortcuts?  

 

 

17 REPLIES 17
Reeza
Super User
SAS Formats function like that and you can build them off a data set using CNTLIN. Then you use it, using a PUT or INPUT statement depending on the conversion direction.
mark4
Obsidian | Level 7

Thanks.  Can you explain a little more about what you mean and how it works?  I only know SAS Formats as a rather frustrating thing I have to deal with using date/time or converting between character and numeric, and I've never heard of CNTLIN.  

Kurt_Bremser
Super User

proc format does not only work from code, it can also create a format from a dataset (it's the cntlin= option in the proc format statement).

To see how such a dataset should look like, create one with the cntlout= option.

For a typical "lookup" format, you need just 4 variables:

  • start, the "from" value
  • label, the "to" value
  • type, either 'C' or 'N'
  • fmtname, the name of the format

 

Reeza
Super User

Here’s a good paper with examples and code:

 

http://www2.sas.com/proceedings/sugi30/001-30.pdf

 

And some of the technical concepts:

https://support.sas.com/resources/papers/proceedings12/048-2012.pdf

 

Your current references are built in SAS formats, but you can also create your own custom formats, User Defined Formats (UDF). 

It’s basically a lookup table and faster than joins. 

 

ballardw
Super User

@mark4 wrote:

Thanks.  Can you explain a little more about what you mean and how it works?  I only know SAS Formats as a rather frustrating thing I have to deal with using date/time or converting between character and numeric, and I've never heard of CNTLIN.  


 

Converting between may mean that you have a confusion between date, time and datetime FORMATS, which are used to display SAS numeric values in human readable form and INFORMATS which are used to translate human readable dates, times and datetimes into computer useable numeric forms.

 

Proc format allows you to create custom display formats or informats to read values into a variable.

CNTLIN is an option for Proc Format that will use specific variable names and values to create a format. CNTLOUT is an option to create a data set from Proc format code that can be used as a CNTLIN dataset.

 

SAS does have dictionary tables but they are different than you may be familiar with. The SAS dictionary tables keep track of SAS items such as library characteristics (path associated data engine and such), table characteristics (name, number of variables, number of records, library membership, variable names and characteristics and which data sets they belong to, defined macros, titles, footnotes and other system items.

mark4
Obsidian | Level 7

I'll have to look at the ctlin/ctlout options and see what they do.  (I don't even know what they're abbreviating.)  But a lot of my confusion comes from importing external data where dates are not formatted consistently.  At times they are text, sometimes they are displayed YYYYMMDD, other times YYYYJJJ where JJJ is the Julian date.  I usually just turn everything into numeric values and work from there.  Even this is not without problems since I cannot remember the difference in input vs put for converting between formats and the other 'format' syntax requirements (do I need a $ or a ., do I need some sort of length?).

ballardw
Super User

@mark4 wrote:

I'll have to look at the ctlin/ctlout options and see what they do.  (I don't even know what they're abbreviating.)  But a lot of my confusion comes from importing external data where dates are not formatted consistently.  At times they are text, sometimes they are displayed YYYYMMDD, other times YYYYJJJ where JJJ is the Julian date.  I usually just turn everything into numeric values and work from there.  Even this is not without problems since I cannot remember the difference in input vs put for converting between formats and the other 'format' syntax requirements (do I need a $ or a ., do I need some sort of length?).


 

Without seeing concrete examples I would guess that you are relying on Proc Import too much. Since Proc Import has to guess on variable characteristics based on a default very small number of records in a file imported this in not an uncommon issue, especially if you deal with marginally organized data such as from spreadsheets.

SAS will usually default to assigning a display format to match the format the data was read from. In which case all that would be needed is to assign you preferred date format after the data is read (Proc datasets would modify the data set and variable properties like Format or label in place).

mark4
Obsidian | Level 7

As I mention below, I receive a lot of data from SAS files created by others, and the issue is the inconsistency in how those dates are formatted.  I'll have to start using PROC DATASETS for these before reading them in, it appears.  But for those cases where I do have to import from csv or excel, I don't understand how PROC DATASETS helps, though. 

 

Is there not just a way to declare the variable types in a PROC IMPORT section, or the first time I go to use the data?  I find the large number of procedures you have to know to use SAS efficiently to be a hindrance.  I don't like cases where you have to break what should be a single step into multiple SAS steps (e.g., importing data according to user defined criteria like column names or variable types) - each step has it's own syntax and structure, and it makes it hard to keep track of how each works.  

Reeza
Super User

You're mixing up two concepts here, one is how SAS stores variables and formats them, and the second is how to import data correctly. 

 

Someone's already mentioned the import issue. 

 

Regarding variables and formats, Formats is actually one of the most powerful features of SAS in the long run. The presentation layer is separated from the data layer (which makes it a bit more abstract for the day to day users) but makes it much easier to manage data. I've started becoming more and more reliant on them after a decade of using SAS because they're versatile and quick. 

 

For dates, they are numbers, you can subtract, add, increment them as necessary. The formats simply change how they're displayed. Its especially nice when I get data sets from American sources that store data in a mmddyyyy format I can change that right away to what the rest of the world uses. 

 

Here's a tutorial on using dates and times in SAS:

https://communities.sas.com/t5/SAS-Communities-Library/Working-with-Dates-and-Times-in-SAS-Tutorial/...

 

 

The issues you're encountering are issues I face in all languages (Python and R included) - most import automatic procedures have issues and dates are stored as numbers in most languages, even Excel. The formats is a SAS specific issue. 

 

mark4
Obsidian | Level 7

Thanks. I think I have an understanding of how formats work, but I have never thought about it in depth so I may be wrong.  For a variable that's declared a date, SAS stores it as a numeric value representing the amount of time between your date and some arbitrary point in time.  Then declaring the format simply dictates how the variable is displayed, but it does not change the value stored in the data set, right?.  

 

I agree that dates are annoying no matter what you're using, and my annoyance isn't mixing up formatting and importing so much as just trying to manage inconsistent fields.  The trouble for me comes from having to remember the syntax for the various date functions (take Julian date to YYYYMMDD, vs take YYMMDD to YYYYMMDD, for example) in SAS and thinking about how that interacts with formats.  I receive sas files that contain fields that represent dates, but some are stored as dates, dates, some as character, others as numeric, and some of these will later have to be exported to excel using the same display format.  That's why I find it easiest to just convert it all a fixed 'format' but as a character, and if I need to do some arithmetic, treat it as numeric for the moment. 

Reeza
Super User
That’s a data integration problem in your process, not a SAS issue per se. You’d have the same issues in any other system, including Python AFAIK.
Ksharp
Super User

Hash Table/Map .

mark4
Obsidian | Level 7

Thanks.  I will certainly start using hashes.  The syntax and implementation is much cleaner, shorter, and easier to remember than using PROC FORMAT. 

 

Quickly browsing the available resources implies you can only use hashes in a data set, and once you're out of the data step, they are no longer available, though.  Are there any work-around's to this?  For just one example, I would love to create something like this for all data sets:  Each data set corresponds to a key, and the value from that key is an array consisting of various attributes for the data set like number of rows, number of columns, column names, etc.  Having this with the ability to print a key's value to the log would make testing/debugging/exploring new data sets much more efficient.  I wouldn't have to get information on data sets of interest by hunting around library views to click on the data set, and I wouldn't have to sift through far too much information printed to the Results tab when you use PROC CONTENTS.

Ksharp
Super User

Yeah. That is the benefit of FORMAT , which most people suggested.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 17 replies
  • 3719 views
  • 3 likes
  • 7 in conversation