Matching on an inequality relationship

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 129
Accepted Solution

Matching on an inequality relationship

I want to merge two files based on an inequality relation. See the example input and desired output below. In short, I want to bring information from the lookup table when the smallest value in that table is greater than the cumulative fraction in the base table.

 

data lookup;
input code $ cfraction;
cards;
a 0.3
b 0.5
c 1.0
;
data base;
input id cfraction;
cards;
1 0.1
2 0.2
3 0.3
4 0.4
5 0.6
6 0.9
;

* Desired output: Merge records from lookup onto base, where base.cfraction<=(the first)lookup.cfraction;
1 0.1 a
2 0.2 a
3 0.3 a
4 0.4 b
5 0.6 c
6 0.9 c

Accepted Solutions
Solution
‎07-20-2016 03:57 AM
Super User
Posts: 9,867

Re: Matching on an inequality relationship

It is easy for SQL.

 

 

data lookup;
input code $ cfraction;
cards;
a 0.3
b 0.5
c 1.0
;
data base;
input id cfraction;
cards;
1 0.1
2 0.2
3 0.3
4 0.4
5 0.6
6 0.9
;
proc sql;
select a.*,b.code
 from base as a, lookup as b
  where a.cfraction le b.cfraction
   group by a.id 
    having b.cfraction-a.cfraction=min(b.cfraction-a.cfraction);
quit;

View solution in original post


All Replies
Solution
‎07-20-2016 03:57 AM
Super User
Posts: 9,867

Re: Matching on an inequality relationship

It is easy for SQL.

 

 

data lookup;
input code $ cfraction;
cards;
a 0.3
b 0.5
c 1.0
;
data base;
input id cfraction;
cards;
1 0.1
2 0.2
3 0.3
4 0.4
5 0.6
6 0.9
;
proc sql;
select a.*,b.code
 from base as a, lookup as b
  where a.cfraction le b.cfraction
   group by a.id 
    having b.cfraction-a.cfraction=min(b.cfraction-a.cfraction);
quit;
Super User
Posts: 5,361

Re: Matching on an inequality relationship

Another approach to consider:  changing the lookup data set into a format.  Hard-coded, it would look like:

 

proc format;

value code 0-0.3="a"  0.3-0.5="b"  0.5-0.6="c" other=" ";

run;

 

Then using the format:

 

data want;

set base;

code = put(cfraction, code.);

run;

 

The format can be permanently saved, and there are ways to automate the creation of the format from a SAS data set so you don't need to type out the complete VALUE statement (particularly useful if you have hundreds of records in the lookup data set rather than just a few).

Frequent Contributor
Posts: 129

Re: Matching on an inequality relationship

Thanks for both suggestions. I'll use the SQL approach, since it looks like the simplest when I have many records.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 3 replies
  • 391 views
  • 2 likes
  • 3 in conversation