Hello all. I am a novice SAS user, mostly base, though have used and slightly manipulated programs using proc sql, macros etc.
I recently faced some tough (from my perspective) questions in an interview for a position that uses SAS. I am asking from memory, but was hoping someone could take the time to explain in slight detail some of the answers (or HOW i should have answered here).
1) Look at this sample code
proc sql;
create table C as
select A.VA, B.VB,
from A left join B
on strip(A.VA) = strip(B.VA)
order by VA;
Quit;
Please describe what this does and the exact detail of the output dataset C (Variables and observations)?
2) How do you use the do loop if you don’t know how many times you should execute the do loop?
3) How many ways to create a format using Proc Format?
4) Can you explain the differences between a macro program and a macro variable?
5) How many ways are there to create a macro variable?
I feel like I must have done very poorly on these, and rather than spend more time researching in looking it up I am asking for some help as a relatively novice user. Any help appreciated. Thank you.
Cristoban
Well for the first one, the proc sql code, which is the one I am most concerned with, I had said it created a new dataset taking variables A and B from datasets (or tables as referred to sql) and in the new dataset C, its left join which takes everything from A, even if there is nothing from B.
Where my confusion came is the on strip clause. I mentioned strip removing leading/trailing zeros blanks, so by guess was that it would produce a dataset where there was matches anywhere in dataset A and B regardless of blanks or zeros. But I am, not sure this is correct
Since SAS stores character variables as fixed length it already ignores trailing blanks in comparisons. So 'AB ' is equal to 'AB'.
Adding the STRIP() function call means that you want the test to also ignore any leading blanks. So ' AB ' is equal to 'AB'.
@cristoban wrote:
Well for the first one, the proc sql code, which is the one I am most concerned with, I had said it created a new dataset taking variables A and B from datasets
It's taking variables VA and VB, the A and B are table alias to indicate which is the source table, so that is incorrect.
@cristoban wrote:
its left join which takes everything from A, even if there is nothing from B.
This part is mostly correct, any records with an ID in A is kept, so your output data set size should be >= size of data set A, assuming no WHERE statements.
Where my confusion came is the on strip clause. I mentioned strip removing leading/trailing zeros blanks, so by guess was that it would produce a dataset where there was matches anywhere in dataset A and B regardless of blanks or zeros. But I am, not sure this is correct
STRIP() only affects leading spaces, not zeroes, so that portion is incorrect.
Most of the answers to these questions would be relatively useless to you. The next interview you go on could easily ask very different questions, on very different topics. If you need to learn SAS, learn SAS. That would take an exceedingly long time if you do it by asking questions based on interview questions.
The one question from the interview that would be useful would be the first one. Knowing the basics of PROC SQL would be useful.
Here are a few more topics you can research and study.
What happens when you add a BY statement to a DATA step?
When combining SAS data sets, how do you find matches and mismatches?
How do you get an output data set from PROC MEANS? What is the effect of a CLASS statement?
How does SAS test whether A > B, when A and B are character variables?
There are many more topics I could add to the list. But this is as good a starting point as any.
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.