BookmarkSubscribeRSS Feed
deleted_user
Not applicable
Hi,
I have a dataset that contains zip codes and densities. Densities is a continuous variable. As an example, I want to split the densities into 6 groups:

lt 150
150 - 500
500 - 1000
1000 - 3000
3000 - 9000
gt 9000

I know I could create a computed column and use a case when statement. However, assume I wanted to split the densities into 100 pre-defined groups. The case when statement in this instance would be daunting.

If this were being done in excel, I would use a vlookup table and specify True for the range lookup, and it would return the closest value without going over. How can I do something similar in SAS EG? I tried using a table join, joining on density, and instead of using = I used <= or >= (tried both) but it didn't return the results that I was expecting.

Any suggestions?
10 REPLIES 10
deleted_user
Not applicable
Edit: sentence should say "instead of using = I used ge and le"
ChrisHemedinger
Community Manager
How about using a format?

And you could create a format from a data source that had all your ranges predefined, using this task:

http://blogs.sas.com/sasdummy/index.php?/archives/95-Creating-a-SAS-format-from-a-data-set.html

Chris
It's time to register for SAS Innovate! Join your SAS user peers in Las Vegas on April 16-19 2024.
LinusH
Tourmaline | Level 20
Since I'm not so fond of using formats in DW processing, I would prefer a join approach. If haven't misunderstood, I think that a join criteria using BETWEEN from AND to would work, where from and to are the boundaries for your density.

/Linus
Data never sleeps
Patrick
Opal | Level 21
In case you're using SAS datasets: What's so bad about using formats? They are in a way nothing else than lookup tables with 2 columns - and they are very easy to use.

One of the best things about formats: You wouldn't have to split your values and assign them to another variable - just assign the format to the original variable if needed.
With a lot of SAS Procedures you can choose for classification variables whether the formatted or unformatted value should be taken for calculations - and you can easily assign another format if needed (i.e. one with only 50 groups).
RichardH_sas
SAS Employee
Any of the techniques mentioned should work, including GE and LE in the join criteria as you originally described. BETWEEN AND, as suggested, might simply what you're trying to do.

Be aware that if you have a very large data source, the join solution could potentially take a LONG time to execute. When you join on something other than equality, in many cases PROC SQL will be forced to create a full catesian product behind the scenes. There will be a note in the log if that is the case. If you had 100 pre-defined groups joining to 100,000 rows, the cartesian product would be 100*100,000=10,000,000 rows behind the scenes before the final results were isolated and returned!
deleted_user
Not applicable
Chris,
my versions of EG is 4.1 and that program requires 4.2, so I can't use that.

Linus,
I'm not sure how to use between as part of a join. Or would I have to make a computed column? If so, that's what I'm trying to avoid because I don't want to have to write case when a then b 100 times. When I used the GE and LE joins, it returned to many rows. since i.e. the value 1500 would fall into the category lt 3000, lt 9000 and lt max value.

I'm not sure how to use formats as a lookup.
LinusH
Tourmaline | Level 20
Richard, you are right about the Cartesian product join. Funny that proc sql can't optimize this on, it's not rocket science...?
Another sql alternative is to have an inline view in the select statement. The downside here is that I don't think you can't point and click that one in EG.
You have to write it yourself, and then link the sub queried table manually if you want the dependency to be visible in the Project designer.

You have to create a calculated column using the Advanced expression builder. If you have built an interval table for you densities, your sub-query could look like this:

select densGroupDesc from denity_ranges where origtable.density between fromDens and toDens

/Linus
Data never sleeps
RichardH_sas
SAS Employee
IL Gator: your join condition should have two parts. Let's say you have a table containing your ranges, with column names low and high.

low high
0 1500
1500 3000
3000 9000
etc.

In your primary table, you have a a column called density. In that case, the join conditions should be:

density >= low AND density < high

Sure, if you only had the first part to check density >= low, you'd get a lot of erroneous results. You need the AND with both conditions.
deleted_user
Not applicable
Richard,
Is it possible to use AND in sas eg when you join tables?
RichardH_sas
SAS Employee
Sure, the trick is in the Tables and Joins window, you'd drag LOW to DENSITY and set the operator, and drag HIGH to DENSITY and set the operator. AND is the default combination. I hadn't thought about this before, but I don't know if you could use an OR without modifying the coding by hand.

You made me think about this one -- thanks for helping me learn something new! 🙂

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 2585 views
  • 0 likes
  • 5 in conversation