Hi Guys
I have the dataset 1 below:
I receive an excel sheet of about 60+ random numbers. How do I read in the sheet and link the unique numbers from EXCEL1 to DATASET 1 to get the account ID i.e OUTPUT 1?
DATASET 1 | |||
RANGE | MIN | MAX | ACCOUNT_ID |
High | 1 | 10 | A0001 |
High | 11 | 20 | A0002 |
Medium | 21 | 29 | A0003 |
Medium | 30 | 69 | A0004 |
Medium | 70 | 99 | A0005 |
Medium | 100 | 299 | A0006 |
Medium | 300 | 499 | A0007 |
Medium | 500 | 1000 | A0008 |
Low | 1001 | 1499 | A0009 |
Low | 1500 | 2000 | A0010 |
Very Low | 2001 | 6000 | A0011 |
Very Low | 6001 | 9000 | A0012 |
EXCEL1 | |||
NUMBERS | |||
10 | |||
2005 | |||
8000 |
OUTPUT1 | |||
NUMBER | ACCOUNT_ID | ||
10 | A0001 | ||
2005 | A0011 | ||
8000 | A0012 |
Many thanks
Hi, this works with the data you posted (assumes you'll have a match for all obs in data set SEARCH) ...
data x;
input range :$6. min max account :$5. @@;
datalines;
High 1 10 A0001 High 11 20 A0002 Medium 21 29 A0003 Medium 30 69 A0004
Medium 70 99 A0005 Medium 100 299 A0006 Medium 300 499 A0007 Medium 500 1000 A0008
Low 1001 1499 A0009 Low 1500 2000 A0010 VeryLow 2001 6000 A0011 VeryLow 6001 9000 A0012
;
data search;
input number @@;
datalines;
10 2005 8000
;
data y (keep=number account);
set search;
do until (number ge min and number le max);
set x;
end;
run;
data set Y ...
Obs number account
1 10 A0001
2 2005 A0011
3 8000 A0012
Use SQL join with a between - and criteria.
Presumably, you know how to read in the sheet. If not, that's a separate question.
I would create a format out of the min/max ranges:
data fmt;
fmtname='$acct';
set dataset1 end=done;
length label $ 7;
rename min=start max=end;
label=account;
output;
if done;
hlo='O';
label='Unknown';
output;
run;
proc format cntlin=fmt;
run;
Then it becomes easy to look up the values:
data want;
set excel1;
account_id = put(number, $acct.);
run;
It's untested, but should work. Good luck.
proc sql;
create table Want as
select A.Number, B.Account_Id from Numbers1 A left join Dataset1 B
on B.Min le A.Number le B.Max;
quit;
Thanks MadhuKorni
Would this be the most optimal way? The reason I am asking is that although the numbers table is small with 60 records the left join to the dataset1 table is around 40 million records?
Hi, try 40 million observations (<2 seconds) ..
data x;
do account=1 to 40e6;
min = 10*account;
max = min + 9;
output;
end;
run;
data search;
input number :comma. @@;
datalines;
10 2005 99,999,999 300,000,000 999,999,999
;
* won't find last number ... too big;
data y (keep=number account);
set search;
do until (number ge min and number le max);
set x;
end;
273 data y (keep=number account);
274 set search;
275 do until (number ge min and number le max);
276 set x;
277 end;
278
279 run;
NOTE: There were 5 observations read from the data set WORK.SEARCH.
NOTE: There were 40000000 observations read from the data set WORK.X.
NOTE: The data set WORK.Y has 4 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time 1.38 seconds
cpu time 1.34 seconds
Hi, this works with the data you posted (assumes you'll have a match for all obs in data set SEARCH) ...
data x;
input range :$6. min max account :$5. @@;
datalines;
High 1 10 A0001 High 11 20 A0002 Medium 21 29 A0003 Medium 30 69 A0004
Medium 70 99 A0005 Medium 100 299 A0006 Medium 300 499 A0007 Medium 500 1000 A0008
Low 1001 1499 A0009 Low 1500 2000 A0010 VeryLow 2001 6000 A0011 VeryLow 6001 9000 A0012
;
data search;
input number @@;
datalines;
10 2005 8000
;
data y (keep=number account);
set search;
do until (number ge min and number le max);
set x;
end;
run;
data set Y ...
Obs number account
1 10 A0001
2 2005 A0011
3 8000 A0012
Mike,
Very nice approach. It can be expanded to cover additional situations. Perhaps, for example, two NUMBER values belong to the same account. You might go with:
data want;
set search;
if (min <= number <= max) then return;
do until (min <= number <= max);
set x;
end;
run;
And there is always the more complex possibility that you mentioned: What if one NUMBER does not have a match (a decimal fraction perhaps?) but the next NUMBER does have a match? WIthout sketching out all the details, it would probably be easiest to create two output data sets (one holding matches, one holding mismatches).
Hi ...
re duplicates in the SEARCH data set (if SEARCH in order) ...
data y (keep=number account);
set search;
if number ne lag(number);
do until (number ge min and number le max);
set x;
end;
run;
re "more complex possibility" ... life is full of them
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.