About louisehadden

louisehadden · ‎07-10-2025

Yay! Can’t wait to view new goodies / documentation

louisehadden · ‎06-10-2025

The Muggles to Macros presentations provide valuable content to the core of the SAS community: SAS users. Well-explained IRL examples and amusing tie-ins to Harry Potter make this content a must-see! Great job by the presenters.

louisehadden · ‎07-02-2024

Hi there. Many SAS installs may not have particular fonts available in the SAS registry. The way to make this happen is to use PROC FONTREG with the FONTPATH statement, after downloading the Arial font family into the FONTPATH directory. Note that your install may not allow you to mess with the registry and/or system fonts, so use your own folder to add the desired font for the duration of your session - the FONTPATH statement allows you to grab a font from anywhere. Be aware that this is temporary! You will want to make sure the FONTEMBEDDING system option is set.

    options fontembedding ps=55 ls=175 errorabend;
    libname dd '.';
    filename odsout '.';
    run;

    title1 "HPOG rtf test";
    run;

    proc fontreg msglevel=verbose;
       fontpath 'S:\projects\QRPVBP\HH_QRP\_Source_Data\iQIES\OASIS\2023Q1_2023Q4\sandbox';
    run;

NOTE: The font "Arial Narrow" (Style: Regular, Weight: Normal) has been added to the SAS Registry at [CORE\PRINTING\FREETYPE\FONTS\<ttf> Arial Narrow]. Because it is a TRUETYPE font, it can be referenced as "Arial Narrow" or "<ttf> Arial Narrow" in SAS. The font resides in file "S:\projects\QRPVBP\HH_QRP\_Source_Data\iQIES\OASIS\2023Q1_2023Q4\sandbox\ARIALN.TTF".

NOTE: The font "Arial Narrow" (Style: Regular, Weight: Bold) has been added to the SAS Registry at [CORE\PRINTING\FREETYPE\FONTS\<ttf> Arial Narrow]. Because it is a TRUETYPE font, it can be referenced as "Arial Narrow" or "<ttf> Arial Narrow" in SAS. The font resides in file "S:\projects\QRPVBP\HH_QRP\_Source_Data\iQIES\OASIS\2023Q1_2023Q4\sandbox\ARIALNB.TTF".

NOTE: The font "Arial Narrow" (Style: Italic, Weight: Bold) has been added to the SAS Registry at [CORE\PRINTING\FREETYPE\FONTS\<ttf> Arial Narrow]. Because it is a TRUETYPE font, it can be referenced as "Arial Narrow" or "<ttf> Arial Narrow" in SAS. The font resides in file "S:\projects\QRPVBP\HH_QRP\_Source_Data\iQIES\OASIS\2023Q1_2023Q4\sandbox\ARIALNBI.TTF".

NOTE: The font "Arial Narrow" (Style: Italic, Weight: Normal) has been added to the SAS Registry at [CORE\PRINTING\FREETYPE\FONTS\<ttf> Arial Narrow]. Because it is a TRUETYPE font, it can be referenced as "Arial Narrow" or "<ttf> Arial Narrow" in SAS. The font resides in file "S:\projects\QRPVBP\HH_QRP\_Source_Data\iQIES\OASIS\2023Q1_2023Q4\sandbox\ARIALNI.TTF".

Then add the font group.

    proc fontreg;
       fontfile '<ttf> Arial Narrow';
    run;

    data SimpleTest;
       input treatment trainingMonths_LO;
       cards;
    1 13
    0 12
    1 10
    0 3
    ;
    run;

    ods rtf file='.\vanillaTest.rtf';
    /* will use the default style template for RTF */
    proc report data=SimpleTest
       style(report)=[fontfamily="'Arial Narrow'"]
       style(header)=[fontfamily="'Arial Narrow'"]
       style(column)=[fontfamily="'Arial Narrow'"];
       column treatment (Mean Std),trainingMonths_LO;
       define treatment/group;
       define trainingMonths_LO/format=6.2;
       title2 'Months of training - Vanilla';
    run;
    ods rtf close;

    ods rtf file='.\defineTest.rtf';
    /* adds font specs to the define statements */
    proc report data=SimpleTest
       style(report)=[fontfamily="'Arial Narrow'"]
       style(header)=[fontfamily="'Arial Narrow'"]
       style(column)=[fontfamily="'Arial Narrow'"];
       column treatment (Mean Std),trainingMonths_LO;
       define treatment / group
          style(COLUMN)={just=c font_face="Arial Narrow" foreground=navy font_size=9pt cellwidth=180 }
          style(HEADER)={just=c font_face="Arial Narrow" font_weight=bold font_size=9pt };
       define trainingMonths_LO /
          style(COLUMN)={just=c font_face="Arial Narrow" foreground=navy font_size=9pt cellwidth=180 format=6.2}
          style(HEADER)={just=c font_face="Arial Narrow" font_weight=bold font_size=9pt };
       title2 'Months of training - add style override';
    run;
    ods rtf close;

Both tests that use the style statements / overrides succeed in using the Arial Narrow font. PROC FONTREG is pretty cool - especially if you want to write in Klingon font. https://www.lexjansen.com/nesug/nesug07/po/po12.pdf
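If you want to confirm that the registration took, something like the following may help. This is only a sketch: it assumes the registry key path reported in the log notes above, and it simply lists what is registered under that key and shows the current FONTEMBEDDING setting.

```sas
/* Sketch only: the STARTAT= key is taken from the FONTREG log notes above. */
proc registry list startat='CORE\PRINTING\FREETYPE\FONTS' levels=1;
run;

/* Show whether font embedding is currently switched on */
proc options option=fontembedding;
run;
```

If "<ttf> Arial Narrow" does not appear in the listing, the FONTPATH folder is the first thing to recheck.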

louisehadden · ‎10-29-2023

I know that ODS EXCEL does not do preimage and postimage in titles and footnotes, and suspect that ODS WORD may be the same. Can anyone confirm and suggest a work around that isn't a Unicode font? TIA.

louisehadden · ‎07-26-2022

A number of user groups in the US, including WUSS, sponsor virtual sessions, primarily because of the pandemic but also to reach out to a broader audience. I’ve presented virtually at WUSS, SESUG, RTSUG, SGF, and Michigan SUG, and I know there are others, including in Europe. SAS Explore is also virtual. This is a fantastic opportunity for everyone!

louisehadden · ‎05-27-2022

Awesome product and suggestion

louisehadden · ‎02-01-2022

Thanks for the tip about the Boston meet-up, Chris.

louisehadden · ‎01-19-2022

Interestingly, after trying the Access export with SAS Studio, EG, Display Manager and batch SAS, and getting different error messages with all 4 systems, the problem turned out to be an overlong path name, and had nothing to do with PROC EXPORT and Office 365 (except that perhaps SAS or MS Excel suddenly got picky about the length after many years of running the same program with the same path name, or that maybe our server naming conventions changed slightly - I'm looking into that). I will put together a package for Tech Support because it really was interesting that the error messages in the logs did not ever indicate what the problem really was! A fun little debugging project for a cold winter's day.

louisehadden · ‎01-12-2022

Thanks, Chris, I will check out the ODBC connections. We have a set up with Windows x64 Amazon servers (so data in the cloud), and previously had non-cloud 64 bit Microsoft products in place. We just need to figure out how to make the connections work in a monthly process where the output file name will differ each month. You've given me some good clues toward making this process "push button" again.

louisehadden · ‎01-11-2022

Hi, all. We have been running a (somewhat antiquated) program to create an Access database with multiple tables uneventfully for many years. Our company recently migrated our servers to Office 365 instead of standalone copies of Access, Word, etc., and the program no longer works. We've determined that the failure is a connection error. Does anyone have any handy tips on (1) setting up an ODBC connection for SAS->MS Access 365 and (2) any tweaks required to the PROC EXPORT code? Any other suggestions? This file is generated every month and we definitely don't have time for tweaking for this month's client delivery - we know we need to modernize in the near future but need a quick fix for now. Any suggestions on how to fill in the blanks programmatically (i.e. the syntax required for the ODBC connection) are welcome! Thanks in advance!

louisehadden · ‎03-15-2021

Paper 1189-2021

Authors
Richann Watson, DataRich Consulting; Louise Hadden, Abt Associates

Abstract

SAS® practitioners are frequently called upon to do a comparison of data between two different data sets and find that the values in synonymous fields do not line up exactly. A second quandary occurs when there is one data source to search for particular values, but those values are contained in character fields in which the values can be represented in myriad different ways. This presentation discusses robust, if not warm and fuzzy, techniques for comparing data between, and selecting data in, SAS data sets in not so ideal conditions. SAS has provided a number of tools which can perform fuzzy matching. Among these tools are wild card searches in a where statement using the LIKE (alias ?) and CONTAINS operators; string searches in an if statement using operators and functions; fuzzy comparison functions such as COMPGED and SPEDIS; PROC FCMP; PROC GEOCODE, and data driven control tables enabling user-defined formats to standardize data. The purpose of this presentation is to introduce, by example, each of these tools and techniques.

INTRODUCTION

SAS has provided a number of tools which can perform “fuzzy matching”. Among these tools are “wild card” searches in a where statement using the LIKE (alias ?) and CONTAINS operators; string searches in an if statement using operators and functions; “fuzzy” comparison functions such as COMPGED and SPEDIS; PROC FCMP; PROC GEOCODE, and data driven control tables enabling user-defined formats to standardize data. The purpose of this paper is to introduce, by example, each of these tools and techniques. We urge you to look at our paper for more complete information - this article will just contain "teasers".

Sample Data

Below is a table containing sample data for most of our fuzzy function examples in the paper. In the sections below, we'll present a single example of selected tools and techniques.
Please see our paper for more exhaustive details on everything you ever wanted to know about fuzzy functions!

FIRSTNAME  LASTNAME      GENDER  SCORE               AGE               EMAIL
Jan        Write         F         1.00000000000012  44.9999999999990  Jan_Write@mail.org
Lucy       Smyth         F         1.00000000000121  53.9383983572895  Lucy_Smyth@mail.org
Kris       Johnson       F         1.00324325660746  39.3566050650240  Kris_Johnson@mail.org
Chris      Jones         M         0.00000000000121  48.8268309377139  Chris_Jones@mail.org
Tracey     Smith         F         0.00000000000012  46.4499999999991  Tracey_Smith@mail.org
Tracy      Besley        M         0.00324325660746  47.4999999999990  Tracy_Besley@mail.org
Tracie     Smith-jones   F        -2.00000000000921  35.6659822039699  Tracie_Smith-jones@mail.org
Chrys      Jones-Wright  F        -2.00000000000092  34.1546885694730  Chrys_Jones-Wright@mail.org
Jon        Wright        M         1.00000000000038  46.8145106091718  Jon_Wright@mail.org
John       Hall          M         1.00000000000384  42.9949999999900  John_Hall@mail.org
Timothy    Bones         M       -12.00000000000920  48.3504449007529  Timothy_Bones@mail.org
Jason      Jaones        M       -12.00000000000091  48.7118412046543  Jason_Jaones@mail.org
Tyler      ones          M        -5.00413456081936  42.9499999999990  Tyler_ones@mail.org

WHERE STATEMENT WILD CARD TECHNIQUES

There are times when you may need to subset records in a data file but specifying the full search string may be an arduous and error prone process, or there may be multiple search strings that have a similar, identifiable pattern. You could default to the standard logic of VAR in (‘FULL VALUE1’ ‘FULL VALUE2’ etc.), or you can use the following “fuzzy” techniques.

CONTAINS or ? Operators

If you are using a sub-setting WHERE statement you can use a question mark (?) or the word CONTAINS instead of an equal sign (=). The CONTAINS or ? operator will look for records that have values that contain what is specified. Using the ROSTER data set shown above for people that have ‘Jones’ in their last name, the CONTAINS or ? operators can minimize the chance of error by searching for any record that contains the word ‘Jones’ in the variable LASTNAME.
    data jones_1;
       set roster;
       where LASTNAME ? 'Jones';
    run;

    data jones_2;
       set roster;
       where LASTNAME contains 'Jones';
    run;

Regardless of which operator is used, both yield the same results.

FIRSTNAME  LASTNAME      GENDER
Chris      Jones         M
Chrys      Jones-Wright  F

It is important to note that the CONTAINS and ? operators are case sensitive. If case should be ignored, use them in conjunction with the LOWCASE or UPCASE function to force the variable to be one case.

SEARCHING FOR A STRING WITH IF STATEMENTS

There are several different SAS function-based options that can be explored when using IF statements, for which wild card techniques are not applicable. These “fuzzy” search techniques are explored below.

INDEX Function

If you need to search for a string anywhere within a variable or another string, then the INDEX function could be utilized. With the INDEX function you would need to provide two arguments: the variable or string to be searched and the string that you are searching for.

Syntax: INDEX(source, excerpt)

    data jones_8a;
       set roster;
       if index(LASTNAME, 'Jones');
    run;

FIRSTNAME  LASTNAME      GENDER
Chris      Jones         M
Chrys      Jones-Wright  F

The display above shows a subset of the ROSTER data file containing last names of Jones using the INDEX function. The downside of using INDEX is that it is case sensitive. If you need to look for values regardless of case, you could use the UPCASE or LOWCASE function to force the source string to be one case and then specify the excerpt string in the same case. The program and display below show that LASTNAME = ‘Smith-jones’ was missed in the initial program execution because ‘jones’ was not proper case.
    data jones_8b;
       set roster;
       if index(upcase(LASTNAME),'JONES');
    run;

FIRSTNAME  LASTNAME      GENDER
Chris      Jones         M
Tracie     Smith-jones   F
Chrys      Jones-Wright  F

CHARACTER FUZZY COMPARISONS

Character strings, especially strings describing names and addresses, are notoriously dirty and prone to spacing, length and punctuation issues. Any real-world comparison of character strings or selection based on character strings needs to be both flexible and configurable, i.e. the degree of “sameness” needs to be quantifiable. SAS provides several character functions that allow you to make a fuzzy comparison: COMPARE, COMPGED, COMPLEV, SOUNDEX and SPEDIS. Each of these functions uses a different fuzzy algorithm and can be used in conjunction with one another to achieve a (subjectively) optimal match. Use of these functions produces inexact results by definition, and results must be reviewed carefully. We discuss the SOUNDEX function below.

SOUNDEX Function

The SOUNDEX function encodes a character value so that you can determine how much two character variables sound alike. It works best with the English language. Comparing SOUNDEX values is equivalent to using =* (sounds like) on a WHERE statement.

Syntax: SOUNDEX(argument)

With the SOUNDEX function, vowels and the letters ‘H’, ‘W’ and ‘Y’ are excluded, except when one of them is the first character in the argument. Other characters in the English alphabet are assigned one of the following values:

B, F, P, V              -> 1
C, G, J, K, Q, S, X, Z  -> 2
D, T                    -> 3
L                       -> 4
M, N                    -> 5
R                       -> 6

The value generated by SOUNDEX is the first character of the argument, followed by the assigned value of each subsequent character that is not excluded. If there are two or more consecutive characters assigned the same numeric value, then only the first one is kept. To demonstrate the use of SOUNDEX, we execute the following data step so that we can calculate the value generated from SOUNDEX using FIRSTNAME.
    data roster_soundex;
       set roster;
       FN_SOUND = soundex(FIRSTNAME);
    run;

FIRSTNAME  LASTNAME      FN_SOUND
Jan        Write         J5
Lucy       Smyth         L2
Kris       Johnson       K62
Chris      Jones         C62
Tracey     Smith         T62
Tracy      Besley        T62
Tracie     Smith-jones   T62
Chrys      Jones-Wright  C62
Jon        Wright        J5
John       Hall          J5
Timothy    Bones         T53
Jason      Jaones        J25
Tyler      ones          T46

For rows 4 and 8, we see that both yield a value of ‘C62’. This is because both started with ‘C’ and the ‘h’, ‘y’ and ‘i’ were discarded, leaving only ‘r’ and ‘s’. The ‘r’ was assigned a value of ‘6’ and ‘s’ was assigned a value of ‘2’. Notice that if the third row had started with a ‘C’ instead of a ‘K’ it would have resulted in the same value. However, since it starts with a ‘K’, the result was ‘K62’. For rows 5-7, we see that the value was ‘T62’ and this was because we discarded all the vowels and ‘y’ after the first character, leaving only ‘r’ and ‘c’. For rows 9 and 10, the ‘o’ and ‘h’ were discarded, leaving only the ‘n’ after the first character, resulting in a value of ‘J5’.

NUMERIC FUZZY COMPARISONS

All the techniques illustrated thus far have been dealing with character strings, but what if we have a numeric value? There are various options available for determining if a numeric value is equivalent to another numeric value. In some cases, the values will be exactly equal, and no additional comparison is needed. However, there are some cases where the values are not quite equal but are equal ‘enough’ so that if the values were rounded or truncated or a ‘fuzz’ factor is added then the values would be considered equal. The type of ‘fuzz’ factor you wish to allow determines which function is best to use. Using the ROSTER data in Data Display 1, we will illustrate several numeric ‘fuzzy’ functions.
CEIL and CEILZ Functions

The CEIL function returns the smallest integer that is greater than or equal to the argument. It uses fuzzing in order to avoid issues with floating point values: if the argument is within 1E-12 of an integer, CEIL returns that integer.

Syntax: CEIL(argument)

However, if you do not want any fuzzing when rounding up to the nearest integer, then CEILZ is the function that should be used. CEILZ works the same as CEIL but it does not use fuzzing. Therefore, even if the argument is within 1E-12 of an integer, CEILZ returns the smallest integer that is greater than or equal to the argument rather than treating the argument as equal to the nearby integer.

Syntax: CEILZ(argument)

    data fuzz_score;
       set roster;
       S_CEIL = ceil(SCORE);
       S_CEILZ = ceilz(SCORE);
    run;

FIRSTNAME  LASTNAME      GENDER  SCORE               S_CEIL  S_CEILZ
Jan        Write         F         1.00000000000012       1        2
Lucy       Smyth         F         1.00000000000121       2        2
Kris       Johnson       F         1.00324325660746       2        2
Chris      Jones         M         0.00000000000121       1        1
Tracey     Smith         F         0.00000000000012       0        1
Tracy      Besley        M         0.00324325660746       1        1
Tracie     Smith-jones   F        -2.00000000000921      -2       -2
Chrys      Jones-Wright  F        -2.00000000000092      -2       -2
Jon        Wright        M         1.00000000000038       1        2
John       Hall          M         1.00000000000384       2        2
Timothy    Bones         M       -12.00000000000920     -12      -12
Jason      Jaones        M       -12.00000000000091     -12      -12
Tyler      ones          M        -5.99999999999999      -6       -5

Notice that in rows 1, 5 and 9 CEIL returns the integer portion of the SCORE since the arguments were within 1E-12 of that integer. However, for these same records CEILZ rounds up to the next integer because no fuzzing is allowed. In the last row the argument was within 1E-12 of -6, so CEIL considers these equivalent and therefore returns the value of -6, but CEILZ returned the smallest integer that was greater than the argument, which is -5.
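The difference between the two functions can also be checked in isolation. The sketch below is ours for illustration only: the data set name CEIL_DEMO and the variable names are made up, and the three input values are SCORE values copied from the display above (two just above 1 on either side of the 1E-12 threshold, and one just above -6).

```sas
/* Illustrative only: ceil_demo, x, c and cz are hypothetical names. */
data ceil_demo;
   input x;
   c  = ceil(x);    /* fuzzed: an argument within 1E-12 of an integer returns that integer */
   cz = ceilz(x);   /* no fuzz: always the smallest integer >= x */
   format x 18.14;
   datalines;
1.00000000000012
1.00000000000121
-5.99999999999999
;
run;

proc print data=ceil_demo noobs;
run;
```

Comparing the C and CZ columns against the S_CEIL and S_CEILZ values in the display above is a quick way to confirm that the fuzz factor behaves the same way on your own platform.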
ADDRESS CHECKING WITH SAS

Address matching is a task for which fuzzy matching techniques are frequently used. It is an example of “phrase matching”, where there are multiple words in a phrase that need to match in order for two phrases to be considered equal. Consider the table below, a partial printout of selected street types from SASHELP.GCTYPE, in which there are a number of variations in street types from across the world.

NAME       TYPE  GROUP
AV         AVE   12
AVE        AVE   12
AVEN       AVE   12
AVENIDA    AVE   12
AVENU      AVE   12
AVENUE     AVE   12
AVN        AVE   12
AVNUE      AVE   12
BELT       BELT  16
BELTWAY    BELT  16
BL         BLVD  34
BLVD       BLVD  34
BOUL       BLVD  34
BOULEVARD  BLVD  34
BOULV      BLVD  34
BTWY       BELT  16
CIR        CIR   64
CIRC       CIR   64
CIRCL      CIR   64
CIRCLE     CIR   64
CIRCULO    CIR   64
CRCL       CIR   64
CRCLE      CIR   64
CÍR        CIR   64
CÍRCLE     CIR   64

As you can see, Avenue can be spelled a number of ways. SAS supplies this look-up table for PROC GEOCODE, discussed below. In a file or files of addresses, such variations in the street type spelling are just the tip of the iceberg in terms of the vast panoply of “dirty data”. SAS uses this look-up table, and others, in performing fuzzy matches for street addresses (and other geographic entities such as county, congressional districts, etc.) and produces standardized addresses. In addition, we’ll discuss another SAS tool for fuzzy address matching, creating our own fuzzy function to perform the normalizing of street types below.
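To see which spellings SAS maps to a single street type, the look-up table can be queried directly. This is a minimal sketch, assuming SASHELP.GCTYPE is present in your installation with the NAME and TYPE columns shown in the printout above.

```sas
/* Sketch: list every spelling variant that standardizes to AVE.   */
/* Assumes the NAME and TYPE columns shown in the printout above.  */
proc sql;
   select name, type
      from sashelp.gctype
      where upcase(type) = 'AVE'
      order by name;
quit;
```

Swapping 'AVE' for another TYPE value such as 'BLVD' or 'CIR' lists the variants for that street type instead.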
SAMPLE DATA FOR PROC GEOCODE

PROVNUM  ADDRESS                        CITY             STATE  ZIP
105205   2121 E COMMERCIAL BLVD         FORT LAUDERDALE  FL     33308
106088   4650 STATE RD 16               SAINT AUGUSTINE  FL     32092
146035   2259 EAST 1100TH STREET        MENDON           IL     62351
175446   915 MCNAIR STREET              HALSTEAD         KS     67056
175549   12340 QUIVIRA ROAD             OVERLAND PARK    KS     66213
245395   965 MCMILLAN STREET            WORTHINGTON      MN     56187
385263   970 W JUNIPER AVENUE           HERMISTON        OR     97838
525362   719 E CATHERINE ST BOX 167     DARLINGTON       WI     53530
525462   245 SYCAMORE ST                SAUK CITY        WI     53583
676397   23450 PINE SHADOW LN           PORTER           TX     77365

    PROC GEOCODE method=STREET data=prov out=dd.GEOCODED
                 lookupstreet=street.usm type=SASHELP.GCTYPE;
    run;

The results from running the data through PROC GEOCODE show that the variations for ‘street’ were all converted to ‘St’ and the rows with ‘road’ were changed to ‘Rd’. For some addresses, there may be situations where there are two addresses tied to a particular location. GEOCODE will “normalize” the addresses to the actual physical addresses for that location found in the look-up file, which are used by the USPS.
Before:

OBS  ADDRESS                        CITY             STATE  ZIP
1    2121 E COMMERCIAL BLVD         FORT LAUDERDALE  FL     33308
2    4650 STATE RD 16               SAINT AUGUSTINE  FL     32092
3    2259 EAST 1100TH STREET        MENDON           IL     62351
4    915 MCNAIR STREET              HALSTEAD         KS     67056
5    12340 QUIVIRA ROAD             OVERLAND PARK    KS     66213
6    965 MCMILLAN STREET            WORTHINGTON      MN     56187
7    970 W JUNIPER AVENUE           HERMISTON        OR     97838
8    719 E CATHERINE ST BOX 167     DARLINGTON       WI     53530
9    245 SYCAMORE ST                SAUK CITY        WI     53583
10   23450 PINE SHADOW LN           PORTER           TX     77365

After:

OBS  M_ADDR                  M_CITY              M_STATE  M_ZIP
1    2121 E Commercial Blvd  Fort Lauderdale     FL       33308
2    4650 State Rd 16        Green Cove Springs  FL       32092
3    2237 E 1100th St        Mendon              IL       62351
4    915 McNair St           Halstead            KS       67056
5    12340 Quivira Rd        Overland Park       KS       66213
6    965 McMillan St         Worthington         MN       56187
7    970 W Juniper Ave       Hermiston           OR       97838
8    8374 Co Rd E            Darlington          WI       53530
9    245 Sycamore St         Sauk City           WI       53583
10   24200 Pine Cir          Porter              TX       77365

PROC FCMP - CREATE YOUR OWN FUZZ FUNCTION

Many of the fuzzy matching techniques discussed above are case sensitive, so variables representing patterns frequently need to be standardized with regard to case and punctuation. A full discussion of PROC FCMP is beyond the scope of this paper, but we will briefly discuss a user-defined function that can be helpful when performing fuzzy matching (and elsewhere). Use of a format library entry to identify non-standard street name terminology is a helpful tool, and the format can “learn” by including new variations as they are found. One method of using the results of the learned translations is to write a function incorporating the translations, as well as standardizing case, etc. A simplistic example follows below, in which street types are standardized prior to going into a fuzzy matching routine. As with the format, the function can be informed by new variations uncovered. In addition, the function performs such tasks as standardizing case, left justifying, and trimming.
    proc fcmp outlib=work.funcs.address;
       function streets(addr $) $;
          length clean_address standardized_address $100;
          clean_address=upcase(addr);
          clean_address=left(trim(clean_address));
          clean_address=tranwrd(clean_address,' STREET ',' ST ');
          clean_address=tranwrd(clean_address,'ROAD','RD');
          clean_address=tranwrd(clean_address,'BOULEVARD','BLVD');
          clean_address=tranwrd(clean_address,'AVENUE','AVE');
          clean_address=tranwrd(clean_address,' DRIVE ',' DR ');
          clean_address=tranwrd(clean_address,'PLACE','PL');
          clean_address=tranwrd(clean_address,'LANE','LN');
          clean_address=tranwrd(clean_address,'CIRCLE','CIR');
          clean_address=tranwrd(clean_address,'COURT ','CT ');
          clean_address=tranwrd(clean_address,'PARKWAY','PKWY');
          standardized_address=clean_address;
          return(standardized_address);
       endsub;
    quit;

SAMPLE DATA FOR PROC FCMP

FIRSTNAME  LASTNAME  ADDRESS
Jan        Write     1234 Any Place, Anywhere, NC 12345
Lucy       Smyth     5673 MyBlock Drive, Myhome, TX 79732
Kris       Johnson   19752 Home Blvd, Home, MA 03321
Chris      Jones     98 NewTown Circle, Newtown, OK 31313
Tracey     Smith     1294-13 Johnson Lane, Nowhere, MN 23213

Once a user-built function is created, the CMPLIB= system option needs to point to the location where the user-built function resides. The program snippet below illustrates the use of the option as well as implementing the function. After the execution of the program the addresses are cleaned so that there is consistency.

    options cmplib=(work.funcs);

    data roster3;
       set roster2;
       cleaned_address = streets(address);
    run;

FIRSTNAME  LASTNAME  CLEANED_ADDRESS
Jan        Write     1234 ANY PL, ANYWHERE, NC 12345
Lucy       Smyth     5673 MYBLOCK DRIVE, MYHOME, TX 79732
Kris       Johnson   19752 HOME BLVD, HOME, MA 03321
Chris      Jones     98 NEWTOWN CIR, NEWTOWN, OK 31313
Tracey     Smith     1294-13 JOHNSON LN, NOWHERE, MN 23213

Conclusion

SAS has provided a myriad of tools to utilize for “fuzzy” matching.
Selection of records with where statements (conditions and special operators) and if statements (:_ operator and functions); standardizing of records using fuzzy matching techniques including user defined formats, functions, and PROC GEOCODE (address information); and PROC FCMP (a user-defined function to clean addresses) are all discussed in the full paper attached. We hope you’ve gained some appreciation for the “fuzz” and you’ll get “fuzzy” along with us!
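As one more teaser, the edit-distance functions named earlier (COMPLEV, COMPGED and SPEDIS) can be tried side by side with SOUNDEX. The sketch below is illustrative only: the data set and variable names are made up, and the name pairs are borrowed from the ROSTER sample data. No expected values are shown, since the point is to inspect how differently each function scores the same pair.

```sas
/* Illustrative only: name_distance and its variables are hypothetical names. */
data name_distance;
   length name1 name2 $ 20;
   input name1 $ name2 $;
   lev   = complev(name1, name2);  /* Levenshtein edit distance */
   ged   = compged(name1, name2);  /* generalized edit distance (cost-weighted) */
   sped  = spedis(name1, name2);   /* spelling distance, asymmetric in its arguments */
   alike = (soundex(name1) = soundex(name2));  /* 1 when the SOUNDEX codes agree */
   datalines;
Smith Smyth
Jones Jaones
Wright Write
;
run;

proc print data=name_distance noobs;
run;
```

Lower distances indicate closer matches; what counts as "close enough" is a judgment call that depends on your data, which is why results should always be reviewed.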

louisehadden · ‎09-18-2020

PROC DOC III: Self-generating Codebooks Using SAS®

This article will demonstrate how to use good documentation practices and SAS® to easily produce attractive, camera-ready data codebooks (and accompanying materials such as label statements, format assignment statements, etc.). Four primary steps in the codebook production process will be explored: use of SAS metadata to produce a master documentation spreadsheet for a file; review and modification of the master documentation spreadsheet; import and manipulation of the metadata in the master documentation spreadsheet to self-generate code to be included to generate a codebook; and use of the documentation metadata to self-generate other helpful code such as label statements. Full code for the example shown (using the SASHELP.HEART data set) is attached.

The most onerous task any SAS programming professional faces is to accurately document files and processes. The truth is that there are no easy answers to the documentation quandary. It takes hard, painstaking work! By setting careful standards at the outset of a programming task, documenting your processes, labelling your data files and variables, providing value labels (formats) for your variables when appropriate, and using the many tools the SAS® system provides to assist in the documentation process, producing codebooks can be a piece of cake.

You’ve done a lot of hard work documenting every aspect of your programming project, and now it is time to reap your rewards. There are a number of ways that you can present information from PROC CONTENTS and PROC DATASETS covered in many other papers, including some of my own. We are going to focus on the use of an intermediate spreadsheet to drive creation of a robust codebook with self-generating code.

STEP 1

It is important to review and evaluate the metadata associated with the data set to be documented. Data sets should be labelled accurately. Variables should be labelled accurately. If variables have informats or formats, that information should be available and accurate. There should be a program available to create a permanent format library with a two-level catalog name, if applicable, and those formats should be accurate. For our example, we create an age category variable that we wish to format, and write a program to generate a format in a permanent, two-level format catalog.

Code snippet from 1gen_formats_PHARMASUG_2017_QT07.sas (in zip file attached to this article):

    TITLE1 "PHARMASUG 2017 QT07";
    FOOTNOTE1 "%SYSFUNC(GETOPTION(SYSIN)) - &SYSDATE - &SYSTIME - run by &SYSUSERID in &SYSPROCESSMODE";
    RUN;

    LIBNAME dd '.';
    LIBNAME library '.';
    FILENAME odsout '.';
    RUN;

    PROC FORMAT LIBRARY=LIBRARY.HEART;
       VALUE startage
          25 - 34='25 to 34 years'
          35 - 44='35 to 44 years'
          45 - 54='45 to 54 years'
          55 - 64='55 to 64 years';
       VALUE agefmt
          1='25 to 34 years'
          2='35 to 44 years'
          3='45 to 54 years'
          4='55 to 64 years';
    RUN;

STEP 2

In the example shown below, a Microsoft Excel® spreadsheet with selected variables from PROC CONTENTS output is generated using PROC EXPORT in program 2gen_metadata_PHARMASUG_2017_QT07.sas. I am using a modified copy of SASHELP.HEART as the sample data set for several reasons, one of which is that not all variables are labelled, requiring some changes. Another reason is that this data set is available to all SAS users.
Code snippet from 2gen_metadata_PHARMASUG_2017_QT07.sas:

    DATA dd.heart (LABEL="Copy of SASHELP.HEART for PHARMASUG 2017 QT07- created by %SYSFUNC(GETOPTION(SYSIN)) - &SYSDATE - &SYSTIME - run by &SYSUSERID in &SYSPROCESSMODE");
       LENGTH dslabel $ 200 source $ 32;
       SET sashelp.heart;
       /* put in some missing labels */
       dslabel="Copy of SASHELP.HEART for PHARMASUG 2017 QT07- created by %SYSFUNC(GETOPTION(SYSIN)) - &SYSDATE - &SYSTIME - RUN by &SYSUSERID in &SYSPROCESSMODE";
       source="&dsname";
       IF 25 LE ageatstart LE 34 THEN age=1;
       IF 35 LE ageatstart LE 44 THEN age=2;
       IF 45 LE ageatstart LE 54 THEN age=3;
       IF 55 LE ageatstart LE 64 THEN age=4;
       IF ageatstart ge 85 THEN age=7;
       FORMAT age agefmt.;
       LABEL cholesterol='Cholesterol level'
             diastolic='Diastolic blood pressure'
             height='Height'
             sex='Gender'
             smoking='Cigarettes per day'
             status='Wanted, dead or alive'
             systolic='Systolic blood pressure'
             weight='Weight'
             source='Data set name'
             dslabel='Data set information'
             age='Age at Start Category'
       ;
    RUN;
    . . .
    PROC EXPORT DATA = dd.heart_cb
       DBMS = excel
       OUTFILE = ".\heart_db.xlsx"
       REPLACE;
    RUN;

Of course, you want to review the results of your spreadsheet creation in Excel and maybe modify a label or format assignment. Note that I have created a variable / column indicating a specialized variable type (VARTYPE), as I want to treat formatted variables differently from unformatted variables in the codebook. You can then reimport the modified spreadsheet for use in the next step to: (a) write code to be included to generate a codebook with output varying by variable type; (b) write code to generate a label statement; and (c) write code to generate a format assignment statement, among other normally onerous tasks.

STEP 3

The codebook generation program, 3gen_codebook_PHARMASUG_2017_QT07.sas, starts with reimporting the edited version of the metadata spreadsheet. A number of macros are then constructed to report on “header information” (i.e. variable name, label, etc.), missing values, and then details on non-missing values, which differ by variable type (character, continuous, categorical). Additionally, the program reads the metadata and writes text files containing calls to the macros created above, chosen according to the variable type in the metadata; these text files are then reused in the program as include files.

Code snippet from 3gen_codebook_PHARMASUG_2017_QT07.sas:

    DATA _null_;
       FILE out1 LRECL=80 PAD;
       LENGTH include_string $ 80;
       SET dd.heart_cb (KEEP=varnum name vartype);
       include_string=CATS('%header(',name,",",varnum,");");
       PUT include_string;
    RUN;
    . . .
    DATA _null_;
       FILE out4 LRECL=80 PAD;
       LENGTH include_string $ 80;
       SET dd.heart_cb (KEEP=varnum name vartype);
       IF vartype=1 THEN include_string=CATS('%printtable(',varnum,");");
       IF vartype=2 THEN include_string=CATS('%printtablec(',varnum,");");
       IF vartype=3 THEN include_string=CATS('%printblurb(',varnum,");");
       PUT include_string;
    RUN;

Macros are written to report on each variable, creating an RTF codebook. These printing macros are utilized in the %include files written by the program inside a TAGSETS.RTF sandwich.

Code snippet from 3gen_codebook_PHARMASUG_2017_QT07.sas:

    %MACRO printblurb(order);
       ODS TAGSETS.RTF STYLE=styles.noborder;
       ODS STARTPAGE=no;
       PROC REPORT NOWD DATA=print&order
          STYLE(report)=[cellpadding=3pt vjust=b]
          STYLE(header)=[just=center font_face=Helvetica font_weight=bold font_size=10pt]
          STYLE(lines)=[just=left font_face=Helvetica] ;
          COLUMNS blurb ;
          DEFINE blurb / style(COLUMN)={just=l font_face=Helvetica font_size=10pt cellwidth=988 }
                         style(HEADER)={just=l font_face=Helvetica font_size=10pt };
       RUN;
       ODS STARTPAGE=no;
    %MEND;

The codebook construction can take some time. Arrange to send yourself a text message with the condition code of your job when it finishes, and get a cup of coffee.
Code snippet from 3gen_codebook_PHARMASUG_2017_QT07.sas:

    FILENAME msg EMAIL
      TO="0000000000@txt.att.net"
      FROM = "Big Nerd <louise_hadden@abtassoc.com>"
      SUBJECT="All Systems Go (or not)?";

    DATA _null_;
      FILE msg;
      PUT "Program Path and Name: %SYSFUNC(GETOPTION(SYSIN))";
      PUT "RUN &SYSDATE - &SYSTIME - by &SYSUSERID in &SYSPROCESSMODE";
      PUT "Condition Code is &SYSCC.";
    RUN;

STEP 4

Similarly, the metadata can be used to create label, format, length, and other statements.

Code snippet from 4gen_label_fmt_stmnt_PHARMASUG_2017_QT07.sas:

    DATA temp1;
      LENGTH include_string $ 180;
      SET dd.heart_cb;
      label=COMPRESS(label,'"');
      qlabel=CATS('"',label,'"');
      include_string=CATX(' ',name,'=',qlabel);
    RUN;

    DATA templabel (KEEP=include_string);
      FILE out1 LRECL=180 PAD;
      LENGTH include_string $ 180;
      SET runlabel temp1 runrun;
      PUT include_string;
    RUN;

    DATA temp2;
      LENGTH include_string $ 180;
      SET dd.heart_cb (WHERE=(format NE ''));
      qformat=CATS(format,'.');
      include_string=CATX(' ',name,qformat);
    RUN;

    DATA tempfmt (KEEP=include_string);
      FILE out2 LRECL=180 PAD;
      LENGTH include_string $ 180;
      SET runformat temp2 runrun;
      PUT include_string;
    RUN;

The resulting statement, example shown below, can be included in other programs seamlessly.

***************************************************************************************************

The author gratefully acknowledges the helpful work of Kathy Fraeman, Michael Raithel, Patrick Thornton, Troy Martin Hughes, Richann Watson, Roberta Glass and Kirk Paul Lafler, among others.
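To make the Step 4 idea concrete, here is a minimal sketch of how a generated label statement might be written to a file and then pulled into another program with %INCLUDE. It is not the code from the paper: the temporary fileref, the contents of RUNLABEL and RUNRUN (assumed here to hold the opening LABEL keyword and the closing semicolon), the three-row stand-in for dd.heart_cb, and the output data set name heart_labeled are all assumptions made for illustration.

```sas
/* Sketch only. The TEMP fileref and the contents of RUNLABEL and RUNRUN
   are assumptions; the snippets above do not show how they are built. */
FILENAME out1 TEMP;

/* Assumed wrapper rows: one opens the LABEL statement, one closes it */
DATA runlabel;
  LENGTH include_string $ 180;
  include_string='LABEL';
RUN;

DATA runrun;
  LENGTH include_string $ 180;
  include_string=';';
RUN;

/* Small stand-in for dd.heart_cb, using three labels from the first snippet */
DATA temp1 (KEEP=include_string);
  LENGTH include_string $ 180 name $ 32 label $ 40;
  INFILE DATALINES DLM='|';
  INPUT name $ label $;
  include_string=CATX(' ',name,'=',CATS('"',label,'"'));
  DATALINES;
height|Height
weight|Weight
smoking|Cigarettes per day
;
RUN;

/* Write the statement to the include file, one fragment per line */
DATA _NULL_;
  FILE out1 LRECL=180 PAD;
  SET runlabel temp1 runrun;
  PUT include_string;
RUN;

/* Reuse the generated LABEL statement inside another DATA step */
DATA heart_labeled;
  SET sashelp.heart (KEEP=height weight smoking);
  %INCLUDE out1;
RUN;
```

Under these assumptions, the include file holds the LABEL keyword, one name = "label" line per variable, and a closing semicolon, so it can be dropped into any DATA step that reads the same variables. A format statement could be generated the same way by swapping the wrapper row and the string that is built.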

louisehadden · ‎02-18-2020

Live Streams are in Room 146A! We have a great slate of presentations.

louisehadden · ‎02-04-2020

Here are some Open Source sessions for you!

Sample Agenda for Programming: Open Source – Python Centric

Date | Time | Length (min) | Session Title | Session # | Location
Sunday Mar 29 | 5:00 PM | 30 | Continuous Integration and Automation Testing of SAS® Programs Using Jenkins and Python | 4965 | Concourse - Hall B Eposter Station 16
Monday Mar 30 | 10:30 AM | 30 | Leveraging Python from Base SAS® | 4686 | Street Level - Middle Building 144A-C
Monday Mar 30 | 2:00 PM | 30 | Machine Learning Data Analysis for Enterprise Resource Planning (ERP) Adoption and Enterprise Performance with SAS® and Python | 4869 | Concourse - Hall B Eposter Station 5
Monday Mar 30 | 3:30 PM | 60 | Serving up LEGO Libations: Leveraging Python to Build a MINDSTORMS EV3 Robot Who Mixes Tasty Drinks | 4700 | Street Level - Middle Building 146A
Tuesday Mar 31 | 11:30 AM | 30 | Using Python to Maximize Limited SAS® Viya® Resources | 4203 | Concourse - Hall B Eposter Station 16
Tuesday Mar 31 | 12:00 PM | 60 | A Three-Ring Data Visualization Circus: Creating Charts with SAS®, R, and Python | 4733 | Street Level - Middle Building 151A
Tuesday Mar 31 | 1:00 PM | 60 | Hands-On Workshop: Three Steps to Learn Python in SAS® Viya® | 5331 | Street Level - Middle Building 143A-C
Tuesday Mar 31 | 3:00 PM | 60 | Using Jupyter to Boost Your Data Science Workflow | 4732 | Street Level - Middle Building 150A
Tuesday Mar 31 | 4:00 PM | 60 | Hands-On Workshop: Deep Learning with DLPy | 5333 | Street Level - Middle Building 143A-C
Wednesday Apr 1 | 10:00 AM | 60 | Choose Your Own Adventure: Manage Model Development Via a Python Integrated Development Environment | 4536 | Street Level - Middle Building 147B
Wednesday Apr 1 | 11:00 AM | 30 | Python and R Made Easy for the SAS® Programmer | 4644 | Street Level - Middle Building 147B
Wednesday Apr 1 | 11:30 AM | 60 | The History and Evolution of SASPy, Including an Overview of What It Can Do and How To Use It | 4141 | Street Level - Middle Building 147B

Sample Agenda for Programming: Open Source – Varied

Date | Time | Length (min) | Session Title | Session # | Location
Sunday Mar 29 | 5:00 PM | 30 | Continuous Integration and Automation Testing of SAS® Programs Using Jenkins and Python | 4965 | Concourse - Hall B Eposter Station 16
Monday Mar 30 | 10:30 AM | 30 | RegExing in SAS® for Pattern Matching and Replacement | 5172 | Street Level - Middle Building 145B
Monday Mar 30 | 11:30 AM | 30 | Monitoring the SAS®9 Platform Using Zabbix | 5064 | Concourse - Hall B Eposter Station 2
Monday Mar 30 | 1:30 PM | 30 | Fast Deployments and Cost Reductions: SAS® in the Azure Cloud with HDinsight and the Azure Data Lake | 4981 | Street Level - Middle Building 150B
Monday Mar 30 | 2:00 PM | 30 | Machine Learning Data Analysis for Enterprise Resource Planning (ERP) Adoption and Enterprise Performance with SAS® and Python | 4869 | Concourse - Hall B Eposter Station 5
Monday Mar 30 | 4:00 PM | 30 | Python and SAS® Quality Knowledge Base for Better Data Quality and Entity Resolution | 4157 | Street Level - Middle Building 152B
Tuesday Mar 31 | 11:00 AM | 60 | Sample Size Calculations Using SAS®, R, and nQuery Software | 4675 | Street Level - Middle Building 147A
Tuesday Mar 31 | 12:00 PM | 60 | A Three-Ring Data Visualization Circus: Creating Charts with SAS®, R, and Python | 4733 | Street Level - Middle Building 151A
Tuesday Mar 31 | 1:00 PM | 60 | Hands-On Workshop: Three Steps to Learn Python in SAS® Viya® | 5331 | Street Level - Middle Building 143A-C
Tuesday Mar 31 | 2:00 PM | 60 | Building an Expert's Toolbox: Essential Tools for Generating the Perfect Microsoft Excel Worksheet | 4594 | Street Level - Middle Building 146C
Tuesday Mar 31 | 3:00 PM | 60 | Microsoft Minecraft, the Newest Integrated Development Environment for SAS® | 4435 | Street Level - Middle Building 147A
Tuesday Mar 31 | 4:00 PM | 60 | Hands-On Workshop: Deep Learning with DLPy | 5333 | Street Level - Middle Building 143A-C
Wednesday Apr 1 | 10:30 AM | 60 | SAS® Viya® Monitoring Using Open-Source Tools | 4214 | Street Level - Middle Building 150B
Wednesday Apr 1 | 11:00 AM | 30 | Git for the SAS® Programmer: Using Source Control to Organize Your Code and Collaborate with Others | 4197 | Street Level - Middle Building 146A
Wednesday Apr 1 | 12:00 PM | 30 | SAS® and Amazon Redshift: Overview of Current Capabilities | 4468 | Street Level - Middle Building 152B
Wednesday Apr 1 | 12:30 PM | 60 | Scalable Cloud-Based Time Series Analysis and Forecasting Using Open Source Software | 4440 | Street Level - Middle Building 145A
Wednesday Apr 1 | 1:30 PM | 30 | Open-Source Model Management with SAS® Model Manager | 4402 | Street Level - Middle Building 154A-B

Sample Agenda for Programming: Open Source – R Centric

Date | Time | Length (min) | Session Title | Session # | Location
Tuesday Mar 31 | 11:30 AM | 30 | Using SAS9API and R to Create Violin Plots, Interactive 3D Plots, and a Shiny App for SAS® Data Sets | 4901 | Concourse - Hall B Eposter Station 17
Tuesday Mar 31 | 12:00 PM | 60 | A Three-Ring Data Visualization Circus: Creating Charts with SAS®, R, and Python | 4733 | Street Level - Middle Building 151A
Tuesday Mar 31 | 3:30 PM | 30 | Using SAS® and R Integration to Manage and Create a Multilevel Complex Database | 4152 | Street Level - Middle Building 140B
Wednesday Apr 1 | 11:30 AM | 30 | Python and R Made Easy for the SAS® Programmer | 4644 | Street Level - Middle Building 147B
Wednesday Apr 1 | 12:30 PM | 60 | Using the R interface in SAS® to Call R Functions and Transfer Data | 4170 | Street Level - Middle Building 147B

louisehadden · ‎02-04-2020

Here are some programming sessions on CAS / CASL / LASR / Grid:

Date | Time | Length (min) | Session Title | Session # | Location
Monday Mar 30 | 1:30 PM | 60 | Best Practices for Converting SAS® Code to Leverage SAS® Cloud Analytic Services | 4147 | Street Level - Middle Building 147B
Monday Mar 30 | 2:30 PM | 30 | CASL, a Language Specifically Designed for Interacting with SAS® Viya® | 4454 | Street Level - Middle Building 147B
Monday Mar 30 | 3:00 PM | 30 | Next Steps: Important Considerations for Moving Your Data and Formats into CAS | 4546 | Street Level - Middle Building 147B
Monday Mar 30 | 3:30 PM | 30 | Common Tasks Done with CASL | 4091 | Street Level - Middle Building 147B
Monday Mar 30 | 5:00 PM | 30 | Loading Data to SAS® LASR™ Analytic Server Through Code: Simplifying Processes with Base SAS® Code | 4081 | Street Level - Middle Building 147B
Tuesday Mar 31 | 11:00 AM | 30 | Automatically Loading CAS Tables from SAS® Data Integration Studio Using SAS® Viya® REST APIs | 5048 | Street Level - Middle Building 152B
Tuesday Mar 31 | 11:30 AM | 30 | Open Source Python and R Language on the SAS® Shared Grid (SAS® Grid) | 4708 | Concourse - Hall B Eposter Station 4
Tuesday Mar 31 | 1:30 PM | 30 | SAS® Grid Manager and SAS® Viya®: a Strong Relationship | 4577 | Street Level - Middle Building 150B
