Desktop productivity for business analysts and programmers

Closest match merge

Reply
N/A
Posts: 0

Closest match merge

Hi,
I have a dataset that contains zip codes and densities. Densities is a continuous variable. As an example, I want to split the densities into 6 groups:

lt 150
150 - 500
500 - 1000
1000 - 3000
3000 - 9000
gt 9000

I know I could create a computed column and use a case when statement. However, assume I wanted to split the densities into 100 pre-defined groups. The case when statement in this instance would be daunting.

If this were being done in excel, I would use a vlookup table and specify True for the range lookup, and it would return the closest value without going over. How can I do something similar in SAS EG? I tried using a table join, joining on density, and instead of using = I used <= or >= (tried both) but it didn't return the results that I was expecting.

Any suggestions?
N/A
Posts: 0

Re: Closest match merge

Edit: sentence should say "instead of using = I used ge and le"
Community Manager
Posts: 2,889

Re: Closest match merge

How about using a format?

And you could create a format from a data source that had all your ranges predefined, using this task:

http://blogs.sas.com/sasdummy/index.php?/archives/95-Creating-a-SAS-format-from-a-data-set.html

Chris
Super User
Posts: 5,386

Re: Closest match merge

Since I'm not so fond of using formats in DW processing, I would prefer a join approach. If haven't misunderstood, I think that a join criteria using BETWEEN from AND to would work, where from and to are the boundaries for your density.

/Linus
Data never sleeps
Respected Advisor
Posts: 4,135

Re: Closest match merge

In case you're using SAS datasets: What's so bad about using formats? They are in a way nothing else than lookup tables with 2 columns - and they are very easy to use.

One of the best things about formats: You wouldn't have to split your values and assign them to another variable - just assign the format to the original variable if needed.
With a lot of SAS Procedures you can choose for classification variables whether the formatted or unformatted value should be taken for calculations - and you can easily assign another format if needed (i.e. one with only 50 groups).
SAS Employee
Posts: 149

Re: Closest match merge

Any of the techniques mentioned should work, including GE and LE in the join criteria as you originally described. BETWEEN AND, as suggested, might simply what you're trying to do.

Be aware that if you have a very large data source, the join solution could potentially take a LONG time to execute. When you join on something other than equality, in many cases PROC SQL will be forced to create a full catesian product behind the scenes. There will be a note in the log if that is the case. If you had 100 pre-defined groups joining to 100,000 rows, the cartesian product would be 100*100,000=10,000,000 rows behind the scenes before the final results were isolated and returned!
N/A
Posts: 0

Re: Closest match merge

Chris,
my versions of EG is 4.1 and that program requires 4.2, so I can't use that.

Linus,
I'm not sure how to use between as part of a join. Or would I have to make a computed column? If so, that's what I'm trying to avoid because I don't want to have to write case when a then b 100 times. When I used the GE and LE joins, it returned to many rows. since i.e. the value 1500 would fall into the category lt 3000, lt 9000 and lt max value.

I'm not sure how to use formats as a lookup.
Super User
Posts: 5,386

Re: Closest match merge

Richard, you are right about the Cartesian product join. Funny that proc sql can't optimize this on, it's not rocket science...?
Another sql alternative is to have an inline view in the select statement. The downside here is that I don't think you can't point and click that one in EG.
You have to write it yourself, and then link the sub queried table manually if you want the dependency to be visible in the Project designer.

You have to create a calculated column using the Advanced expression builder. If you have built an interval table for you densities, your sub-query could look like this:

select densGroupDesc from denity_ranges where origtable.density between fromDens and toDens

/Linus
Data never sleeps
SAS Employee
Posts: 149

Re: Closest match merge

IL Gator: your join condition should have two parts. Let's say you have a table containing your ranges, with column names low and high.

low high
0 1500
1500 3000
3000 9000
etc.

In your primary table, you have a a column called density. In that case, the join conditions should be:

density >= low AND density < high

Sure, if you only had the first part to check density >= low, you'd get a lot of erroneous results. You need the AND with both conditions.
N/A
Posts: 0

Re: Closest match merge

Richard,
Is it possible to use AND in sas eg when you join tables?
SAS Employee
Posts: 149

Re: Closest match merge

Sure, the trick is in the Tables and Joins window, you'd drag LOW to DENSITY and set the operator, and drag HIGH to DENSITY and set the operator. AND is the default combination. I hadn't thought about this before, but I don't know if you could use an OR without modifying the coding by hand.

You made me think about this one -- thanks for helping me learn something new! Smiley Happy
Ask a Question
Discussion stats
  • 10 replies
  • 1089 views
  • 0 likes
  • 5 in conversation