BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
David_Billa
Rhodochrosite | Level 12

I'm trying to understand the program which was written by other programmer. In the below code, I don't understand the purpose of IN operator.
Any help to make me understand this code?

 

data want;
set have1(in=a) have2(in=b);
by id name;
if dt=. then dt=start_dt;
run;
1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

See this simple example:

data a;
input val;
datalines;
1
3
5
;

data b;
input val;
datalines;
2
4
6
;

data interleave;
set
  a
  b
;
by val;
run;

You will find that the resulting dataset is neatly ordered in sequence 1 - 2 - 3 - 4 - 5 - 6.

Play around with the values in the datasets to see how the interleave works

  • multiple entries in one or both datasets
  • identical entries in both datasets
  • see what you can do with IN= dataset options
  • explore the behavior of FIRST. and LAST.

View solution in original post

10 REPLIES 10
PaigeMiller
Diamond | Level 26

This is a particularly poor example, as the IN operator is called and creates variables, but the variables created are not used. So they have no purpose here.

 

In any event, the SAS documentation explains it all:

https://documentation.sas.com/doc/en/pgmmvacdc/9.4/ledsoptsref/n1p1o2dsuc465nn198ovwdrj9mvy.htm

--
Paige Miller
Tom
Super User Tom
Super User

The IN= dataset option (the IN operator is a totally different concept and does not appear in this program) is doing no good in that program.  And it might be doing harm if either HAVE1 or HAVE2 have variables names A or B already. The purpose of the IN= dataset option is to name a numeric variable that SAS will set TRUE when that dataset contributed to the current observation.  But this data step is not using those variables for anything.

 

The data step is interleaving the observations from HAVE1 and HAVE2 so that the resulting WANT dataset will still be sorted by ID and NAME.

The IF/THEN statement is replacing missing DT values with value of DT_START.  You could have instead used the COALESE() function.

data want;
  set have1 have2;
  by id name;
  dt=coalesce(dt,start_dt);
run;

 

David_Billa
Rhodochrosite | Level 12

@Tom So in my example, it's just appending the HAVE1 and HAVE2 dataset and then applying IF clause? I just trying to understand the code which I posted.

Tom
Super User Tom
Super User

Not APPENDING. It is INTERLEAVING the observations from both datasets.

 

If you remove the BY statement then it would be APPENDING.

David_Billa
Rhodochrosite | Level 12
Interleaving is similar to inner join?
Tom
Super User Tom
Super User

@David_Billa wrote:
Interleaving is similar to inner join?

No.  An inner join is more like a MERGE.

 

Have you ever tried to sort something that was too big to sort all at once? So you break it into smaller piles and sort each one.  Now you can make your final output by just taking from the top of each smaller pile by checking which pile has the smallest value on the top.  That is interleaving.  

 

Or kind of like shuffling a deck of cards.  Only with some smarts so the result is still sorted.

 

Do your own experiment. Take a decks of cards.  Take some HEARTS and some SPADES and make two piles based on suit.  Sort each pile so they are face up with the smallest on the top.  Now make a new stack face down by taking the smallest card from the top of the two stacks. When they are the same value take the HEART one first.  The result is an ordered set of cards.  When there is a tie in value the HEART is always before the SPADE.

David_Billa
Rhodochrosite | Level 12

@Tom Were you able to point me to one SAS program example which use SET with BY for interleaving? I use first. and last. whenever I use set with by. Posted example was written by other progarmmer

mkeintz
PROC Star

@David_Billa wrote:

@Tom Were you able to point me to one SAS program example which use SET with BY for interleaving? I use first. and last. whenever I use set with by. Posted example was written by other progarmmer


data boys (where=(sex='M'))
     girls (where=(sex='F'));
  set sashelp.class;
run;

data interleaved;
  set boys girls;
  by name;
run;

The above works because sashelp.class  (and therefore BOYS and GIRLS) is sorted by NAME.  As a consequence dataset INTERLEAVED is sorted by name.

 

 

Without the BY statement, you would get:

data all_boys_followed_by_all_girls;
  set boys girls;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Patrick
Opal | Level 21

@David_Billa 

What interleaving is and how it works is explained in the SAS documentation here.

This is a sub-chapter of Combining Data. May-be worth that you read this chapter from start to end.

Kurt_Bremser
Super User

See this simple example:

data a;
input val;
datalines;
1
3
5
;

data b;
input val;
datalines;
2
4
6
;

data interleave;
set
  a
  b
;
by val;
run;

You will find that the resulting dataset is neatly ordered in sequence 1 - 2 - 3 - 4 - 5 - 6.

Play around with the values in the datasets to see how the interleave works

  • multiple entries in one or both datasets
  • identical entries in both datasets
  • see what you can do with IN= dataset options
  • explore the behavior of FIRST. and LAST.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 2172 views
  • 8 likes
  • 6 in conversation