I'm trying to understand the program which was written by other programmer. In the below code, I don't understand the purpose of IN operator.
Any help to make me understand this code?
data want; set have1(in=a) have2(in=b); by id name; if dt=. then dt=start_dt; run;
See this simple example:
data a;
input val;
datalines;
1
3
5
;
data b;
input val;
datalines;
2
4
6
;
data interleave;
set
a
b
;
by val;
run;
You will find that the resulting dataset is neatly ordered in sequence 1 - 2 - 3 - 4 - 5 - 6.
Play around with the values in the datasets to see how the interleave works
This is a particularly poor example, as the IN operator is called and creates variables, but the variables created are not used. So they have no purpose here.
In any event, the SAS documentation explains it all:
https://documentation.sas.com/doc/en/pgmmvacdc/9.4/ledsoptsref/n1p1o2dsuc465nn198ovwdrj9mvy.htm
The IN= dataset option (the IN operator is a totally different concept and does not appear in this program) is doing no good in that program. And it might be doing harm if either HAVE1 or HAVE2 have variables names A or B already. The purpose of the IN= dataset option is to name a numeric variable that SAS will set TRUE when that dataset contributed to the current observation. But this data step is not using those variables for anything.
The data step is interleaving the observations from HAVE1 and HAVE2 so that the resulting WANT dataset will still be sorted by ID and NAME.
The IF/THEN statement is replacing missing DT values with value of DT_START. You could have instead used the COALESE() function.
data want;
set have1 have2;
by id name;
dt=coalesce(dt,start_dt);
run;
@Tom So in my example, it's just appending the HAVE1 and HAVE2 dataset and then applying IF clause? I just trying to understand the code which I posted.
Not APPENDING. It is INTERLEAVING the observations from both datasets.
If you remove the BY statement then it would be APPENDING.
@David_Billa wrote:
Interleaving is similar to inner join?
No. An inner join is more like a MERGE.
Have you ever tried to sort something that was too big to sort all at once? So you break it into smaller piles and sort each one. Now you can make your final output by just taking from the top of each smaller pile by checking which pile has the smallest value on the top. That is interleaving.
Or kind of like shuffling a deck of cards. Only with some smarts so the result is still sorted.
Do your own experiment. Take a decks of cards. Take some HEARTS and some SPADES and make two piles based on suit. Sort each pile so they are face up with the smallest on the top. Now make a new stack face down by taking the smallest card from the top of the two stacks. When they are the same value take the HEART one first. The result is an ordered set of cards. When there is a tie in value the HEART is always before the SPADE.
@Tom Were you able to point me to one SAS program example which use SET with BY for interleaving? I use first. and last. whenever I use set with by. Posted example was written by other progarmmer
@David_Billa wrote:
@Tom Were you able to point me to one SAS program example which use SET with BY for interleaving? I use first. and last. whenever I use set with by. Posted example was written by other progarmmer
data boys (where=(sex='M'))
girls (where=(sex='F'));
set sashelp.class;
run;
data interleaved;
set boys girls;
by name;
run;
The above works because sashelp.class (and therefore BOYS and GIRLS) is sorted by NAME. As a consequence dataset INTERLEAVED is sorted by name.
Without the BY statement, you would get:
data all_boys_followed_by_all_girls;
set boys girls;
run;
What interleaving is and how it works is explained in the SAS documentation here.
This is a sub-chapter of Combining Data. May-be worth that you read this chapter from start to end.
See this simple example:
data a;
input val;
datalines;
1
3
5
;
data b;
input val;
datalines;
2
4
6
;
data interleave;
set
a
b
;
by val;
run;
You will find that the resulting dataset is neatly ordered in sequence 1 - 2 - 3 - 4 - 5 - 6.
Play around with the values in the datasets to see how the interleave works
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.