Watch this Ask the Expert session to learn various join algorithms and the comparisons between them.
You will learn about:
The questions from the Q&A segment held at the end of the webinar are listed below and the slides from the webinar are attached.
Q&A
Can you add end damage to the keep statement to have it display in the output?
The question was about the in equals option. Because end damage is a temporary variable, it's going to get dropped. The only way I can keep it is by creating a new end damage, physically creating this variable. It's not going to get written out, so you can't really keep it. Many of us may want to keep this because why not? To do this, you will need to build a new variable and assign it a value of this temporary variable. So that would be my suggestion if you want to store the value of new damage for posterity, for subsequent use.
Data step has keep statement for indamage, yet the output doesn’t KEEP indamage because it’s a temporary variable only available during the PDV for processing.
This data step explicitly builds a new variable Newwind, assigning it the value of indamage, and now it’s evident that newwind is KEPT in the output table.
Is there a "trick" to join to tables on columns with different datatypes without an additional data step that does the conversion? Is there a way to do a conversion "on the fly" e.g., by dataset options?
Not really, since they must be the same type (char to char, etc.). However, you may use a PUT function to convert numeric to character since the PUT function always returns a char variable; and you may use an INPUT function to convert character to numeric.
I frequently get warnings when merging on character variables /IDs that the length of the by variable does not match across files. Short of a look at the dictionary tables for lengths, is there a way to address that issue on the fly?
You may always right click on a variable in the library listing, selecting 'properties', which is a nice on-the-fly way of viewing any variable attribute. You can also take advantage of the sashelp.vtable views, which lists all attributes.
When doing the one-to-many data step merge, was Alfred's 1st record (Math record) read in replaced by his 2nd record (Reading score) or did they both output?
When doing a one-to-many merge, all the "many" will be outputs. If you have one student, merged with multiple classes, you'll want to output every class that student has taken.
If you merge without BY statement, say one dataset named A has 6 observations and the other datasets named B have 12 records. What happens to the resulting dataset, C? Will the values for the variables in the 6th observations be replicated in 7th to 12th observations in dataset C?
I don't believe so. A merge without a "by" statement is called a "blind merge" meaning observation 1 in table 1 will 'merge' with observation 1 in table 2, regardless of value. For the data step to terminate, the end-of-file marker must be reached by BOTH tables. When eof is encountered by only a single table, the data step continues to only execute output the data from table 2. See this example: data one; infile datalines; input id name $; datalines; 1 joe 2 jack ; data two; infile datalines; input id salary; datalines; 1 40000 2 50000 3 70000 ; data three; merge one two; run;
Is there any book that you recommend about hashes to learn more about it?
I can't think of any single book that covers everything on hashes, but if you Google "SAS Hash Objects" you'll be directed to a large number of White Papers, videos, and other resources that cover the topic. Jason Secowsky has an excellent paper on the topic. And Jennifer warner’s paper is worth looking at.
Can this be executed using Query Builder as well?
Yes, you can join tables using the query builder in SAS enterprise Guide. Under the covers, a PROC SQL Join is executed. If you are looking for a data step merge, then use the code shared in this ask the expert as a starting point.
What if there is more than 1 match?
SAS processes data sequentially after the tables have been sorted. If there is more than one match, sas will process the row that is sequentially first, load into the PDV, perform any processing and then write that observation out to the output dataset. Then as the record pointer moves one down, it will process the next match and then write it out.
Recommended Resources
Quick Record Lookup without an Index, Choate, Paul A.
A database professional’s best friend, Shankar, Charu
Top 10 SAS best programming practices, Shankar, Charu
Moving from SAS®9 to SAS® Viya®
Please see additional resources in the attached slide deck.
Want more tips? Be sure to subscribe to the Ask the Expert board to receive follow up Q&A, slides and recordings from other SAS Ask the Expert webinars.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Ready to level-up your skills? Choose your own adventure.
Your Home for Learning SAS
SAS Academic Software
SAS Learning Report Newsletter
SAS Tech Report Newsletter