BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
A_SAS_Man
Pyrite | Level 9

I have a lookup being provided to me in the following format.

 

 

data test2;
input code1 $6. code2 $6. data;
datalines;
23145 23147 5
7892F 7892F 6
7893F 7893F 6
32012 32012 7
4456G 4456G 8
80090 80091 9
;

I have data structured like this.

 

data test1;
input code $5.;
datalines;
80090
80091
7892F
7893F
7893F
32012
4456G
23145
23146
23147
;

I had previously been trying to join data in the following way, which creates a cartesian join and I'm unable to execute on our large dataset.

 

 

proc sql;
create table want as
     select a.*,
              b.data
     from test1 a left join test2 b on
           (a.code <= b.code2)
           And
           (a.code >= b.code1)
;
quit;

To remedy this, I was trying to figure out a way to "flatten" this lookup so that I would have the following.

 

 data want2;
input code1 $6. data;
datalines;
23145 5
23146 5
23147 5
7892F 6
7893F 6
32012 7
4456G 8
80090 9
80091 9
;

Essentially trying to spell out all the values that would be in the ranges and create a unique line for each. How could I create want2? To my current knowledge, none of the lookup values that contain letters would have a range, i.e., the code1 and code2 values would match. But if there's a way to create a robust process that could account for future changes from this I would be very interested.

 

I'm also open to thoughts on how to optimize the join (create want directly), lookup or anyway to complete.

 

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

Consider using a format instead.

Performance will be much faster but not sorted. If sort order is important, sort it after the fact.

 

data test2;
input code1 $6. code2 $6. data;
datalines;
23145 23147 5
7892F 7892F 6
7893F 7893F 6
32012 32012 7
4456G 4456G 8
80090 80091 9
;

data code_fmt;
set test2;
fmtname = 'code_fmt';
type='C';
rename code1=start code2=end data=label;
run;

proc format cntlin=code_fmt;
run;


data test1;
input code $5.;
code_formatted = put(code, $code_fmt.);
datalines;
80090
80091
7892F
7893F
7893F
32012
4456G
23145
23146
23147
;
run;

proc print data=test1;
run;

@A_SAS_Man wrote:

I have a lookup being provided to me in the following format.

 

 

data test2;
input code1 $6. code2 $6. data;
datalines;
23145 23147 5
7892F 7892F 6
7893F 7893F 6
32012 32012 7
4456G 4456G 8
80090 80091 9
;

I have data structured like this.

 

data test1;
input code $5.;
datalines;
80090
80091
7892F
7893F
7893F
32012
4456G
23145
23146
23147
;

I had previously been trying to join data in the following way, which creates a cartesian join and I'm unable to execute on our large dataset.

 

 

proc sql;
create table want as
     select a.*,
              b.data
     from test1 a left join test2 b on
           (a.code <= b.code2)
           And
           (a.code >= b.code1)
;
quit;

To remedy this, I was trying to figure out a way to "flatten" this lookup so that I would have the following.

 

 data want2;
input code1 $6. data;
datalines;
23145 5
23146 5
23147 5
7892F 6
7893F 6
32012 7
4456G 8
80090 9
80091 9
;

Essentially trying to spell out all the values that would be in the ranges and create a unique line for each. How could I create want2? To my current knowledge, none of the lookup values that contain letters would have a range, i.e., the code1 and code2 values would match. But if there's a way to create a robust process that could account for future changes from this I would be very interested.

 

I'm also open to thoughts on how to optimize the join (create want directly), lookup or anyway to complete.

 


 

View solution in original post

4 REPLIES 4
Reeza
Super User

Consider using a format instead.

Performance will be much faster but not sorted. If sort order is important, sort it after the fact.

 

data test2;
input code1 $6. code2 $6. data;
datalines;
23145 23147 5
7892F 7892F 6
7893F 7893F 6
32012 32012 7
4456G 4456G 8
80090 80091 9
;

data code_fmt;
set test2;
fmtname = 'code_fmt';
type='C';
rename code1=start code2=end data=label;
run;

proc format cntlin=code_fmt;
run;


data test1;
input code $5.;
code_formatted = put(code, $code_fmt.);
datalines;
80090
80091
7892F
7893F
7893F
32012
4456G
23145
23146
23147
;
run;

proc print data=test1;
run;

@A_SAS_Man wrote:

I have a lookup being provided to me in the following format.

 

 

data test2;
input code1 $6. code2 $6. data;
datalines;
23145 23147 5
7892F 7892F 6
7893F 7893F 6
32012 32012 7
4456G 4456G 8
80090 80091 9
;

I have data structured like this.

 

data test1;
input code $5.;
datalines;
80090
80091
7892F
7893F
7893F
32012
4456G
23145
23146
23147
;

I had previously been trying to join data in the following way, which creates a cartesian join and I'm unable to execute on our large dataset.

 

 

proc sql;
create table want as
     select a.*,
              b.data
     from test1 a left join test2 b on
           (a.code <= b.code2)
           And
           (a.code >= b.code1)
;
quit;

To remedy this, I was trying to figure out a way to "flatten" this lookup so that I would have the following.

 

 data want2;
input code1 $6. data;
datalines;
23145 5
23146 5
23147 5
7892F 6
7893F 6
32012 7
4456G 8
80090 9
80091 9
;

Essentially trying to spell out all the values that would be in the ranges and create a unique line for each. How could I create want2? To my current knowledge, none of the lookup values that contain letters would have a range, i.e., the code1 and code2 values would match. But if there's a way to create a robust process that could account for future changes from this I would be very interested.

 

I'm also open to thoughts on how to optimize the join (create want directly), lookup or anyway to complete.

 


 

A_SAS_Man
Pyrite | Level 9

Thanks this appears to be working, one question. If I had two data fields I was trying to attach as part of this would I essentially just have to repeat this process for each data element? Or is there a way to use format to attach two values? See example alternate test2 below, if I wanted to attach data2 a the same time and with the same logic as data is there a simple way to do that in your method? Essentially data2 in my process is just a description that I need on there.

 

data test2;
input code1 $6. code2 $6. data data2;
datalines;
23145 23147 5 k
7892F 7892F 6 l
7893F 7893F 6 l
32012 32012 7 y
4456G 4456G 8 f
80090 80091 9 s
;
Reeza
Super User
You cannot add two items to a format, so you'd need to create multiple formats.
If you're doing two or three of these it's still efficient to use a format.
If you need to bring over more items then I'd use a hash lookup instead but I don't code hash tables.
FreelanceReinh
Jade | Level 19

You could also use a concatenation of data and data2 (with a suitable delimiter) as the format label and then decompose the formatted value using the SCAN function.

 

If data2 depends on data, not on the codes, and the same (data, data2) pair applies to multiple ranges of codes, a separate look-up "datadata2" would avoid the need to store multiple copies of (possibly long) data2 values in the format.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 565 views
  • 3 likes
  • 3 in conversation