I am trying to derive a new column (Column B) based on an existing column (Column A) on my data.
ColumnA CountofA
12345 15
12346 20
12347 6
12348 20
12349 4
12350 20
21100 6
21111 6
21112 4
21299 2
21399 2
Desired Output
ColumnA ColumnB
12345 12345
12346 12346-12347 (This will get merged since the first four letter are the same)
12347 12346-12347 (This will get merged since the first four letter are the same)
12348 12348-12349 (This will get merged since the first four letter are the same)
12349 12348-12349 (This will get merged since the first four letter are the same)
12350 12350
21101 21101-21111 (This will get merged since the first 3 digits are the same and their sum total is 12)
21111 21101-21111 (This will get merged since the first 3 digits are the same and their sum total is 12)
21112 21112-21399 (This will get merged since the first 2 digits are the same and their sum tota is 😎
21299 21112-21399 (This will get merged since the first 2 digits are the same and their sum tota is 😎
21399 21112-21399 (This will get merged since the first 2 digits are the same and their sum tota is 😎
The code that I currently involves lot of manual check and so I thought of posting my question over here to get guidance on solving above scenarios.
Thanks in advance
Even with your notes, I have no clue of what the business rule is.
Perhaps you could describe in text your real life scenario, that might help.
I don't know what you are talking about .
This could give you a start.
data have;
input ColumnA $ CountofA ;
cards;
12345 15
12346 20
12347 6
12348 20
12349 4
12350 20
21100 6
21111 6
21112 4
21299 2
21399 2
;
run;
proc sql;
create table want as
select a.*,b.ColumnA as b_ColumnA ,abs(compare(a.ColumnA ,b.ColumnA )) as distance
from have as a,have as b
where a.ColumnA ne b.ColumnA
group by a.ColumnA
having calculated distance=max(calculated distance)
order by a.ColumnA,b.ColumnA ;
quit;
Sorry for the poor description.
Here's what I am trying to achieve:
Basically, I have a table which contains unique 5 digit number and total record counts associated with it. Now the 5 digit numbers are not in sequence, however they could have either the first four digits or first three or first two digits in common. (01521, 01529). Now using this numbers, I want to roll up the date based on each number. So let's say if number 01521 have more than 10 records, however 01529 doesn't have 10 records, I would like to create a new column which will tell me which numbers where grouped together (for ex: 01521-01529). Here's are some sample records:
Column A # of Records
01111 111
01119 6
Now since the first four digits are common for above numbers, I would like to create a new column with value for each row as '01111-01119'.
Similarly, here is the second scenario where I have exhaused the first four combinations. So I would go for matching the first three digits :
Column A # of Records
01255 99
01299 6
Again I only need to do a grouping if the number of records are less than 10.
Here is the code that I have which definitely not the best piece of code:
data want;
merge test1
test1 (firstobs=2 rename=(columna=_columna1 count=_count1 flag=_flag1 ))
test1 (firstobs=3 rename=(columna=_columna2 count=_count2 flag=_flag2 ))
;
if count (# of records) < 10 then do;
if substr(columna,1,4) = substr(_columna1,1,4) then do;
if count + _count1 >= 10 then derived_column = strip(column) || '-' || strip(_columna1);
else if substr(column,1,4) = substr(_columna2,1,4) then do;
if count + _count1 + _count2 >= 10 then do;
derived_column = strip(column) || '-' || strip(_columna2);
end;
end;
The problem with above program is I am not able to derive the new column for records that have value of > 10.
HTH. Thanks!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.