Solved: Convert a Ranged Lookup to Standard

A_SAS_Man · Posted 05-19-2022 10:58 AM

I have a lookup being provided to me in the following format.

data test2;
input code1 $6. code2 $6. data;
datalines;
23145 23147 5
7892F 7892F 6
7893F 7893F 6
32012 32012 7
4456G 4456G 8
80090 80091 9
;

I have data structured like this.

data test1;
input code $5.;
datalines;
80090
80091
7892F
7893F
7893F
32012
4456G
23145
23146
23147
;

I had previously been trying to join data in the following way, which creates a cartesian join and I'm unable to execute on our large dataset.

proc sql;
create table want as
     select a.*,
              b.data
     from test1 a left join test2 b on
           (a.code <= b.code2)
           And
           (a.code >= b.code1)
;
quit;

To remedy this, I was trying to figure out a way to "flatten" this lookup so that I would have the following.

 data want2;
input code1 $6. data;
datalines;
23145 5
23146 5
23147 5
7892F 6
7893F 6
32012 7
4456G 8
80090 9
80091 9
;

Essentially trying to spell out all the values that would be in the ranges and create a unique line for each. How could I create want2? To my current knowledge, none of the lookup values that contain letters would have a range, i.e., the code1 and code2 values would match. But if there's a way to create a robust process that could account for future changes from this I would be very interested.

I'm also open to thoughts on how to optimize the join (create want directly), lookup or anyway to complete.

Reeza · Posted 05-19-2022 11:34 AM

Consider using a format instead.

Performance will be much faster but not sorted. If sort order is important, sort it after the fact.

data test2;
input code1 $6. code2 $6. data;
datalines;
23145 23147 5
7892F 7892F 6
7893F 7893F 6
32012 32012 7
4456G 4456G 8
80090 80091 9
;

data code_fmt;
set test2;
fmtname = 'code_fmt';
type='C';
rename code1=start code2=end data=label;
run;

proc format cntlin=code_fmt;
run;


data test1;
input code $5.;
code_formatted = put(code, $code_fmt.);
datalines;
80090
80091
7892F
7893F
7893F
32012
4456G
23145
23146
23147
;
run;

proc print data=test1;
run;

@A_SAS_Man wrote:

I have a lookup being provided to me in the following format.
data test2;
input code1 $6. code2 $6. data;
datalines;
23145 23147 5
7892F 7892F 6
7893F 7893F 6
32012 32012 7
4456G 4456G 8
80090 80091 9
;
I have data structured like this.
data test1;
input code $5.;
datalines;
80090
80091
7892F
7893F
7893F
32012
4456G
23145
23146
23147
;
I had previously been trying to join data in the following way, which creates a cartesian join and I'm unable to execute on our large dataset.
proc sql;
create table want as
     select a.*,
              b.data
     from test1 a left join test2 b on
           (a.code <= b.code2)
           And
           (a.code >= b.code1)
;
quit;
To remedy this, I was trying to figure out a way to "flatten" this lookup so that I would have the following.
 data want2;
input code1 $6. data;
datalines;
23145 5
23146 5
23147 5
7892F 6
7893F 6
32012 7
4456G 8
80090 9
80091 9
;
Essentially trying to spell out all the values that would be in the ranges and create a unique line for each. How could I create want2? To my current knowledge, none of the lookup values that contain letters would have a range, i.e., the code1 and code2 values would match. But if there's a way to create a robust process that could account for future changes from this I would be very interested.

I'm also open to thoughts on how to optimize the join (create want directly), lookup or anyway to complete.

View solution in original post

Reeza · Posted 05-19-2022 11:34 AM

Consider using a format instead.

Performance will be much faster but not sorted. If sort order is important, sort it after the fact.

data test2;
input code1 $6. code2 $6. data;
datalines;
23145 23147 5
7892F 7892F 6
7893F 7893F 6
32012 32012 7
4456G 4456G 8
80090 80091 9
;

data code_fmt;
set test2;
fmtname = 'code_fmt';
type='C';
rename code1=start code2=end data=label;
run;

proc format cntlin=code_fmt;
run;


data test1;
input code $5.;
code_formatted = put(code, $code_fmt.);
datalines;
80090
80091
7892F
7893F
7893F
32012
4456G
23145
23146
23147
;
run;

proc print data=test1;
run;

@A_SAS_Man wrote:

I have a lookup being provided to me in the following format.
data test2;
input code1 $6. code2 $6. data;
datalines;
23145 23147 5
7892F 7892F 6
7893F 7893F 6
32012 32012 7
4456G 4456G 8
80090 80091 9
;
I have data structured like this.
data test1;
input code $5.;
datalines;
80090
80091
7892F
7893F
7893F
32012
4456G
23145
23146
23147
;
I had previously been trying to join data in the following way, which creates a cartesian join and I'm unable to execute on our large dataset.
proc sql;
create table want as
     select a.*,
              b.data
     from test1 a left join test2 b on
           (a.code <= b.code2)
           And
           (a.code >= b.code1)
;
quit;
To remedy this, I was trying to figure out a way to "flatten" this lookup so that I would have the following.
 data want2;
input code1 $6. data;
datalines;
23145 5
23146 5
23147 5
7892F 6
7893F 6
32012 7
4456G 8
80090 9
80091 9
;
Essentially trying to spell out all the values that would be in the ranges and create a unique line for each. How could I create want2? To my current knowledge, none of the lookup values that contain letters would have a range, i.e., the code1 and code2 values would match. But if there's a way to create a robust process that could account for future changes from this I would be very interested.

I'm also open to thoughts on how to optimize the join (create want directly), lookup or anyway to complete.

A_SAS_Man · Posted 05-19-2022 12:32 PM

Thanks this appears to be working, one question. If I had two data fields I was trying to attach as part of this would I essentially just have to repeat this process for each data element? Or is there a way to use format to attach two values? See example alternate test2 below, if I wanted to attach data2 a the same time and with the same logic as data is there a simple way to do that in your method? Essentially data2 in my process is just a description that I need on there.

data test2;
input code1 $6. code2 $6. data data2;
datalines;
23145 23147 5 k
7892F 7892F 6 l
7893F 7893F 6 l
32012 32012 7 y
4456G 4456G 8 f
80090 80091 9 s
;

Reeza · Posted 05-19-2022 01:19 PM

You cannot add two items to a format, so you'd need to create multiple formats.
If you're doing two or three of these it's still efficient to use a format.
If you need to bring over more items then I'd use a hash lookup instead but I don't code hash tables.

FreelanceReinh · Posted 05-19-2022 03:59 PM

You could also use a concatenation of data and data2 (with a suitable delimiter) as the format label and then decompose the formatted value using the SCAN function.

If data2 depends on data, not on the codes, and the same (data, data2) pair applies to multiple ranges of codes, a separate look-up "data → data2" would avoid the need to store multiple copies of (possibly long) data2 values in the format.

Convert a Ranged Lookup to Standard

Re: Convert a Ranged Lookup to Standard

Re: Convert a Ranged Lookup to Standard

Re: Convert a Ranged Lookup to Standard

Re: Convert a Ranged Lookup to Standard

Re: Convert a Ranged Lookup to Standard

Convert a Ranged Lookup to Standard

Re: Convert a Ranged Lookup to Standard

Re: Convert a Ranged Lookup to Standard

Re: Convert a Ranged Lookup to Standard

Re: Convert a Ranged Lookup to Standard

Re: Convert a Ranged Lookup to Standard

Ready to join fellow brilliant minds for the SAS Hackathon?

Click image to register for webinar

Classroom Training Available!