Can someone of you help me to write the following First. program using proc sql? If it's not easy, I'd like to understand what we are doing in this program. First. with retain statement is bit confusing for me.
data FILLED_IN_CHANN; set MISSING_CHANS; retain PT_HELD; by PLAN_TO descending CHANNEL; if missing(CHANNEL) then CHANNEL = '01'; if first.PLAN_TO then PT_HELD = CHANNEL; else CHANNEL = PT_HELD; run;
There is no equivalent of FIRST. in PROC SQL. You should do this in a DATA step.
RETAIN and FIRST. are separate functions, they need to be understood separately. FIRST. allows you to perform actions on the first record of a BY variable. RETAIN retains the values of variable PT_HELD and uses them in the next observation unless there is some other command that assigns a value to PT_HELD.
So there is only one line where a value is assigned to PT_HELD, that line is
if first.PLAN_TO then PT_HELD = CHANNEL;
so when you are on the first record of PLAN_TO, then PT_HELD is assigned the value of CHANNEL. Otherwise, PT_HELD has the same value as on the previous record.
If you look at the resulting data set with your own eyes, you should see what is happening.
An SQL equivalent FOR THIS CASE ONLY would be something like
proc sql; create table FILLED_IN_CHANN as select coalescec(a.channel,b.channel,'01') as channel ,<here goes the SQL required verbosity of listing every other variable from A that you want> from MISSING_CHANS as a left join (select distinct PLAN_TO,channel from MISSING_CHANS where not missing(channel) ) as b on a.plan_to=b.plan_to ; quit;
I put the CoalesceC bit first as that is the assignment of interest out of the data step. Write the select in the order you want the variables.
In general trying to translate code that is very order dependent, i.e. using First. and Last. and/or LAG, DIF, Retain functions does not have an easy analog in SQL because SQL seldom processes records in the order you expect and some of the functions like Lag, Dif and Retain just plain do not work in SQL.
Warning: The above code assumes that there is actually only one non-missing value for the Channel for each Plan_to. Without example data it is impossible to suggest a reliable process for the order of values of Channel as it is apparently character and the typical aggregate functions used to get the "largest" are not reliable for character values.
PROC SQL does not care about record order, but the DATA step does. That is why there can be no equivalent to FIRST. in proc sql.
There's one simple answer: don't.
Sequence is the domain of the data step, while SQL does not guarantee any processing order at all. Therefore you need to create complicated SQL queries to force this, and will have, apart from ugly unmaintainable code, considerably worse performance than what you get from a data step.
Do not even think about it, especially not in the beginning of your SAS career.
Maxim 14 says to Use the Right Tool, and this is the data step here.
@David_Billa wrote:
Got it. How the results will differ if I remove retain statement from the
code which I posted?
You could go ahead and try it, and then you'd know the answer.
See Maxim 4. Just try it.
Since your code is designed to carry over the greatest value to all other observations within a group, and set "01" if all values are missing, the RETAIN is essential. Without it, there won't be any carry-over.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.