BookmarkSubscribeRSS Feed
Pritish
Quartz | Level 8

I am trying to derive a new column (Column B) based on an existing column (Column A) on my data.

 

 

ColumnA CountofA 

12345         15         

12346         20         

12347         6           

12348         20         

12349        4  

12350        20

21100        6

21111         6

21112       4

21299      2

21399      2

 

 

Desired Output  

ColumnA ColumnB

12345        12345       

12346        12346-12347 (This will get merged since the first four letter are the same) 

12347        12346-12347 (This will get merged since the first four letter are the same)  

12348        12348-12349 (This will get merged since the first four letter are the same) 

12349        12348-12349 (This will get merged since the first four letter are the same)    

12350       12350 

21101         21101-21111 (This will get merged since the first 3 digits are the same and their sum total is 12) 

21111         21101-21111 (This will get merged since the first 3 digits are the same and their sum total is 12) 

21112        21112-21399 (This will get merged since the first 2 digits are the same and their sum tota is 😎

21299        21112-21399 (This will get merged since the first 2 digits are the same and their sum tota is 😎

21399        21112-21399 (This will get merged since the first 2 digits are the same and their sum tota is 😎

 

 

The code that I currently involves lot of manual check and so I thought of posting my question over here to get guidance on solving above scenarios. 

 

Thanks in advance

3 REPLIES 3
LinusH
Tourmaline | Level 20

Even with your notes, I have no clue of what the business rule is.

Perhaps you could describe in text your real life scenario, that might help.

Data never sleeps
Ksharp
Super User

I don't know what you are talking about .

This could give you a start.

 

 

data have;
input ColumnA $  CountofA ;
cards;
12345         15         
12346         20         
12347         6           
12348         20         
12349        4  
12350        20
21100        6
21111         6
21112       4
21299      2
21399      2
;
run;
proc sql;
create table want as
 select a.*,b.ColumnA as b_ColumnA ,abs(compare(a.ColumnA ,b.ColumnA )) as distance
  from have as a,have as b
   where a.ColumnA ne b.ColumnA 
    group by a.ColumnA
     having calculated distance=max(calculated distance)
      order by a.ColumnA,b.ColumnA  ;
quit;
Pritish
Quartz | Level 8

Sorry for the poor description. 

Here's what I am trying to achieve: 

Basically, I have a table which contains unique 5 digit number and total record counts associated with it. Now the 5 digit numbers are not in sequence, however they could have either the first four digits or first three or first two digits in  common. (01521, 01529). Now using this numbers, I want to roll up the date based on each number. So let's say if number 01521 have more than 10 records, however 01529 doesn't have 10 records, I would like to create a new column which will tell me which numbers where grouped together (for ex: 01521-01529). Here's are some sample records: 

Column A  # of Records

01111             111 

01119              6 

Now since the first four digits are common for above numbers, I would like to create a new column with value for each row as '01111-01119'. 

Similarly, here is the second scenario where I have exhaused the first four combinations. So I would go for matching the first three digits : 

Column A # of Records

01255          99

01299           6

 

Again I only need to do a grouping if the number of records are less than 10. 

Here is the code that I have which definitely not the best piece of code: 

 

data want;
merge test1
test1 (firstobs=2 rename=(columna=_columna1 count=_count1 flag=_flag1 ))
test1 (firstobs=3 rename=(columna=_columna2 count=_count2 flag=_flag2 ))
;


if count (# of records) < 10 then do;

 

if substr(columna,1,4) = substr(_columna1,1,4) then do;


if count + _count1 >= 10 then derived_column = strip(column) || '-' || strip(_columna1);

 

else if substr(column,1,4) = substr(_columna2,1,4) then do;
if count + _count1 + _count2 >= 10 then do;
derived_column = strip(column) || '-' || strip(_columna2);
end;


end;

 

The problem with above program is I am not able to derive the new column for records that have value of > 10.

 

 

HTH. Thanks!

 

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 1691 views
  • 0 likes
  • 3 in conversation