This is probably a basic question, but I'm having trouble finding an answer when searching previously asked questions. I have a data set with a customer identifier and an employee count, and I'm attempting to create a macro that will assign data data ranges. I don't use macros often, and I'm having trouble creating one to accomplish this issue. I'm attempting to create a low range value, and a high range value, that I can then concatenate into a range. I'm doing this by every 100 employees from 0 to 30,000.
I'd appreciate any help you can provide. FYI, I'm using SAS EG version 6.1.
Data Sample
Customer_number | Employee_Count |
1240 | 4453 |
1241 | 279 |
1242 | 94 |
1243 | 392 |
1246 | 79 |
1248 | 225 |
1258 | 105 |
1263 | 80 |
1265 | 325 |
1267 | 330 |
1270 | 365 |
1272 | 368 |
1273 | 3531 |
1274 | 348 |
1276 | 193 |
Macro attempt:
%macro a;
%do i=0 %to 30000 by 100;
%do x=100 %to 30000 by 100;
data median2;
set median1;
if pbt_customer_ee > &i and pbt_customer_ee < &x then low_band = &i;
if pbt_customer_ee > &i and pbt_customer_ee < &x then high_band = &x;
run;
%end;
%end;
%mend;
Basic math instead?
data have;
input count;
cards;
4453
279
94
392
79
;
run;
data want;
set have;
lower_bound = int(count/100)*100;
upper_bound = lower_bound + 100;
range = catx('-', lower_bound, upper_bound);
run;
Results:
Obs | count | lower_bound | upper_bound | range |
---|---|---|---|---|
1 | 4453 | 4400 | 4500 | 4400-4500 |
2 | 279 | 200 | 300 | 200-300 |
3 | 94 | 0 | 100 | 0-100 |
4 | 392 | 300 | 400 | 300-400 |
5 | 79 | 0 | 100 | 0-100 |
@EricM0628 wrote:
Here is the sample data with the new value I'm trying to create from the employee count value. I'm not sure creating a data set with the values would work, as I'd still need to link the new data set to the original data, which I imagine would take a lot of case statements (or if then data steps). Unless you know a better way to link the data?
Customer_number Employee_Count New Value 1240 4453 4400 - 4500 1241 279 200 - 300 1242 94 0 - 100 1243 392 300 - 400 1246 79 0 - 100 1248 225 200 - 300 1258 105 100 - 200 1263 80 0 - 100 1265 325 300 - 400 1267 330 300 - 400 1270 365 300 - 400 1272 368 300 - 400 1273 3531 3500 - 3600 1274 348 300 - 400 1276 193 100 - 200
Don't use macros for this purpose! They're not needed, and they (in this case) will actually run 90,000 DATA steps. As you can imagine, that would take a while to run.
A much better approach would be to create a data set to be used with CNTLIN in PROC FORMAT.
If you had a data set with these ranges in it, would that allow you to logically complete the task, or is your task more complex than that?
1 - 100
101 - 200
201 - 300
...
29801 - 29900
29901 - 30000
To get a good answer, provide test data in the form of a datastep, and what the output should be. It definately is not the best method to use macro for such a task. There are specific procedures and methods within base sas which are built specifically to do such a thing. For instance, a proc means or summary by customer number can output min/max values in a tiny fraction of the time of one datastep, let alone one per group. Use Base SAS to process your data, only resort to macro code when it actually adds something other than obfuscation to the code.
First thing before attempting to use a macro variables or macro coding for anything is to get a version that works without any macro involvement. Then figure out the parts that can be replaced by the text substitution that macros do.
I your specific code you are going to create the data set median2 roughly 90,000 times AND the final result will be the one with the last value of the &I and &x variables.
I actually do not see what "range" you may actually be trying to create. It does not exactly help that you show two variables in example data whose names do not appear in the code. So we don't really know what values you are actually manipulating.
It helps to show what the result for your example input data should look like, both start and end and describe the rules. Code that does not work is sort of suboptimal in describing what the rules might be. For instance what, in words, would be the definition of "low range value" and "high range value"? Does that definition actually change for each customer? You current code would make it appear so. Intuitively "customer_number" is an identifier and those seldom, if ever, should be involved in arithmetic.
Grouping values is sometimes a job for custom formats.
Here is the sample data with the new value I'm trying to create from the employee count value. I'm not sure creating a data set with the values would work, as I'd still need to link the new data set to the original data, which I imagine would take a lot of case statements (or if then data steps). Unless you know a better way to link the data?
Customer_number | Employee_Count | New Value |
1240 | 4453 | 4400 - 4500 |
1241 | 279 | 200 - 300 |
1242 | 94 | 0 - 100 |
1243 | 392 | 300 - 400 |
1246 | 79 | 0 - 100 |
1248 | 225 | 200 - 300 |
1258 | 105 | 100 - 200 |
1263 | 80 | 0 - 100 |
1265 | 325 | 300 - 400 |
1267 | 330 | 300 - 400 |
1270 | 365 | 300 - 400 |
1272 | 368 | 300 - 400 |
1273 | 3531 | 3500 - 3600 |
1274 | 348 | 300 - 400 |
1276 | 193 | 100 - 200 |
Basic math instead?
data have;
input count;
cards;
4453
279
94
392
79
;
run;
data want;
set have;
lower_bound = int(count/100)*100;
upper_bound = lower_bound + 100;
range = catx('-', lower_bound, upper_bound);
run;
Results:
Obs | count | lower_bound | upper_bound | range |
---|---|---|---|---|
1 | 4453 | 4400 | 4500 | 4400-4500 |
2 | 279 | 200 | 300 | 200-300 |
3 | 94 | 0 | 100 | 0-100 |
4 | 392 | 300 | 400 | 300-400 |
5 | 79 | 0 | 100 | 0-100 |
@EricM0628 wrote:
Here is the sample data with the new value I'm trying to create from the employee count value. I'm not sure creating a data set with the values would work, as I'd still need to link the new data set to the original data, which I imagine would take a lot of case statements (or if then data steps). Unless you know a better way to link the data?
Customer_number Employee_Count New Value 1240 4453 4400 - 4500 1241 279 200 - 300 1242 94 0 - 100 1243 392 300 - 400 1246 79 0 - 100 1248 225 200 - 300 1258 105 100 - 200 1263 80 0 - 100 1265 325 300 - 400 1267 330 300 - 400 1270 365 300 - 400 1272 368 300 - 400 1273 3531 3500 - 3600 1274 348 300 - 400 1276 193 100 - 200
This worked perfectly. Thanks for your help!
data want;
set have;
length new_value $ 16;
lower = 100 * floor(employee_count/100);
upper = 100 * ceil(employee_count/100);
new_value = catx('-',lower,upper);
run;
As you can see, no need for macros, and no need for looping.
I do wonder how this new_value is going to be used in any meaningful way, usually there's no need to create such a text string in SAS for additional analysis purposes or for reporting purposes.
Your range boundaries as shown would imply to me that if you have a value of exactly 4400 you need to choose between 4300-4400 and 4400-4500 values. So what is your boundary value rule?
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.