I know I could create a computed column and use a case when statement. However, assume I wanted to split the densities into 100 pre-defined groups. The case when statement in this instance would be daunting.
If this were being done in excel, I would use a vlookup table and specify True for the range lookup, and it would return the closest value without going over. How can I do something similar in SAS EG? I tried using a table join, joining on density, and instead of using = I used <= or >= (tried both) but it didn't return the results that I was expecting.
Since I'm not so fond of using formats in DW processing, I would prefer a join approach. If haven't misunderstood, I think that a join criteria using BETWEEN from AND to would work, where from and to are the boundaries for your density.
In case you're using SAS datasets: What's so bad about using formats? They are in a way nothing else than lookup tables with 2 columns - and they are very easy to use.
One of the best things about formats: You wouldn't have to split your values and assign them to another variable - just assign the format to the original variable if needed.
With a lot of SAS Procedures you can choose for classification variables whether the formatted or unformatted value should be taken for calculations - and you can easily assign another format if needed (i.e. one with only 50 groups).
Any of the techniques mentioned should work, including GE and LE in the join criteria as you originally described. BETWEEN AND, as suggested, might simply what you're trying to do.
Be aware that if you have a very large data source, the join solution could potentially take a LONG time to execute. When you join on something other than equality, in many cases PROC SQL will be forced to create a full catesian product behind the scenes. There will be a note in the log if that is the case. If you had 100 pre-defined groups joining to 100,000 rows, the cartesian product would be 100*100,000=10,000,000 rows behind the scenes before the final results were isolated and returned!
my versions of EG is 4.1 and that program requires 4.2, so I can't use that.
I'm not sure how to use between as part of a join. Or would I have to make a computed column? If so, that's what I'm trying to avoid because I don't want to have to write case when a then b 100 times. When I used the GE and LE joins, it returned to many rows. since i.e. the value 1500 would fall into the category lt 3000, lt 9000 and lt max value.
Richard, you are right about the Cartesian product join. Funny that proc sql can't optimize this on, it's not rocket science...?
Another sql alternative is to have an inline view in the select statement. The downside here is that I don't think you can't point and click that one in EG.
You have to write it yourself, and then link the sub queried table manually if you want the dependency to be visible in the Project designer.
You have to create a calculated column using the Advanced expression builder. If you have built an interval table for you densities, your sub-query could look like this:
select densGroupDesc from denity_ranges where origtable.density between fromDens and toDens
Sure, the trick is in the Tables and Joins window, you'd drag LOW to DENSITY and set the operator, and drag HIGH to DENSITY and set the operator. AND is the default combination. I hadn't thought about this before, but I don't know if you could use an OR without modifying the coding by hand.
You made me think about this one -- thanks for helping me learn something new!