BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
genemroz
Quartz | Level 8

Esteemed Advisers,

 

I need a solution to the following exemplar problem of subsetting one dataset (Have) contingent on start/stop values from a second dataset (Interval) to create dataset Want. Thanks in advance for any insights you can offer.

Data Intervals;
Input Interval $ Start $ Stop $;
1	B	-
1	-	D
2	F	-
2	-	J
3	L	-
3	-	N
;

Data Have;
Input ID $;
A
B
C
D
E
F
G
H
J
K
L
M
N
;

Data Want;
Input Interval $ ID $;
1	B
1	C
1	D
2	G
2	H
2	J
2	K
3	L
3	M
3	N
;
1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

Before going to code some questions are in order.

Is this supposed to be a surrogate for more complex data, such as words or phrases phrases for the start/stop values of your "real" project? This may be very critical as "interval" defined on character values may be very hard to work with in general as the < or > operators do not always provide the result you expect when values are of different lengths. Not to mention the complications that would arise for mixed case of letters.

 

How many observations will exist in your real Have and Intervals data sets? One obvious way to do this could create an extreme number of intermediate values leading to long run times.

 

Why does the Interval data have start and stop on separate observations? The natural way to demonstrate and interval is with both values on a single observation (row).

Are the - in the Interval set supposed to indicate missing or blank values or does the dash have some meaning? In a data step reading values placing a period in the position to indicate a missing value will do so but a dash is a value.

 

Why is 2 F not in the Want set? The start of Interval 2 is F.

 

Correcting the Interval and Have data steps so they will run and create data (missing the datalines statement) and making the Interval set an actual interval, this works for your example data if 2 F is supposed to be in the output.

 

Data Intervals;
Input Interval $ Start $ Stop $;
datalines;
1	B	D
2	F	J
3	L	N
;

Data Have;
Input ID $;
datalines;
A
B
C
D
E
F
G
H
J
K
L
M
N
;

proc sql;
   create table want as 
   select a.interval, b.Id
   from intervals as a,have as b
   where a.start le b.id le a.stop
   ;
quit;

View solution in original post

3 REPLIES 3
ballardw
Super User

Before going to code some questions are in order.

Is this supposed to be a surrogate for more complex data, such as words or phrases phrases for the start/stop values of your "real" project? This may be very critical as "interval" defined on character values may be very hard to work with in general as the < or > operators do not always provide the result you expect when values are of different lengths. Not to mention the complications that would arise for mixed case of letters.

 

How many observations will exist in your real Have and Intervals data sets? One obvious way to do this could create an extreme number of intermediate values leading to long run times.

 

Why does the Interval data have start and stop on separate observations? The natural way to demonstrate and interval is with both values on a single observation (row).

Are the - in the Interval set supposed to indicate missing or blank values or does the dash have some meaning? In a data step reading values placing a period in the position to indicate a missing value will do so but a dash is a value.

 

Why is 2 F not in the Want set? The start of Interval 2 is F.

 

Correcting the Interval and Have data steps so they will run and create data (missing the datalines statement) and making the Interval set an actual interval, this works for your example data if 2 F is supposed to be in the output.

 

Data Intervals;
Input Interval $ Start $ Stop $;
datalines;
1	B	D
2	F	J
3	L	N
;

Data Have;
Input ID $;
datalines;
A
B
C
D
E
F
G
H
J
K
L
M
N
;

proc sql;
   create table want as 
   select a.interval, b.Id
   from intervals as a,have as b
   where a.start le b.id le a.stop
   ;
quit;

genemroz
Quartz | Level 8

Thanks for the prompt response.  It was very helpful.  The start/stop times came to me in a dataset as separate observations. Once I figured out how to get them into a single observation your solution works perfectly.  And I apologize for my typo that left 2 F out of the Want dataset.

Ksharp
Super User

Just for having some fun.

Data Intervals;
Input Interval $ Start $ Stop $;
datalines;
1	B	D
2	F	J
3	L	N
;

data want;
 set Intervals;
 do i=rank(Start) to rank(Stop);
  want=byte(i);output;
 end;
 drop i;
run;

hackathon24-white-horiz.png

The 2025 SAS Hackathon Kicks Off on June 11!

Watch the live Hackathon Kickoff to get all the essential information about the SAS Hackathon—including how to join, how to participate, and expert tips for success.

YouTube LinkedIn

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 651 views
  • 0 likes
  • 3 in conversation