BookmarkSubscribeRSS Feed
Pritish
Quartz | Level 8

I am trying to derive a new column (Column B) based on an existing column (Column A) on my data.

 

 

ColumnA CountofA 

12345         15         

12346         20         

12347         6           

12348         20         

12349        4  

12350        20

21100        6

21111         6

21112       4

21299      2

21399      2

 

 

Desired Output  

ColumnA ColumnB

12345        12345       

12346        12346-12347 (This will get merged since the first four letter are the same) 

12347        12346-12347 (This will get merged since the first four letter are the same)  

12348        12348-12349 (This will get merged since the first four letter are the same) 

12349        12348-12349 (This will get merged since the first four letter are the same)    

12350       12350 

21101         21101-21111 (This will get merged since the first 3 digits are the same and their sum total is 12) 

21111         21101-21111 (This will get merged since the first 3 digits are the same and their sum total is 12) 

21112        21112-21399 (This will get merged since the first 2 digits are the same and their sum tota is 😎

21299        21112-21399 (This will get merged since the first 2 digits are the same and their sum tota is 😎

21399        21112-21399 (This will get merged since the first 2 digits are the same and their sum tota is 😎

 

 

The code that I currently involves lot of manual check and so I thought of posting my question over here to get guidance on solving above scenarios. 

 

Thanks in advance

3 REPLIES 3
LinusH
Tourmaline | Level 20

Even with your notes, I have no clue of what the business rule is.

Perhaps you could describe in text your real life scenario, that might help.

Data never sleeps
Ksharp
Super User

I don't know what you are talking about .

This could give you a start.

 

 

data have;
input ColumnA $  CountofA ;
cards;
12345         15         
12346         20         
12347         6           
12348         20         
12349        4  
12350        20
21100        6
21111         6
21112       4
21299      2
21399      2
;
run;
proc sql;
create table want as
 select a.*,b.ColumnA as b_ColumnA ,abs(compare(a.ColumnA ,b.ColumnA )) as distance
  from have as a,have as b
   where a.ColumnA ne b.ColumnA 
    group by a.ColumnA
     having calculated distance=max(calculated distance)
      order by a.ColumnA,b.ColumnA  ;
quit;
Pritish
Quartz | Level 8

Sorry for the poor description. 

Here's what I am trying to achieve: 

Basically, I have a table which contains unique 5 digit number and total record counts associated with it. Now the 5 digit numbers are not in sequence, however they could have either the first four digits or first three or first two digits in  common. (01521, 01529). Now using this numbers, I want to roll up the date based on each number. So let's say if number 01521 have more than 10 records, however 01529 doesn't have 10 records, I would like to create a new column which will tell me which numbers where grouped together (for ex: 01521-01529). Here's are some sample records: 

Column A  # of Records

01111             111 

01119              6 

Now since the first four digits are common for above numbers, I would like to create a new column with value for each row as '01111-01119'. 

Similarly, here is the second scenario where I have exhaused the first four combinations. So I would go for matching the first three digits : 

Column A # of Records

01255          99

01299           6

 

Again I only need to do a grouping if the number of records are less than 10. 

Here is the code that I have which definitely not the best piece of code: 

 

data want;
merge test1
test1 (firstobs=2 rename=(columna=_columna1 count=_count1 flag=_flag1 ))
test1 (firstobs=3 rename=(columna=_columna2 count=_count2 flag=_flag2 ))
;


if count (# of records) < 10 then do;

 

if substr(columna,1,4) = substr(_columna1,1,4) then do;


if count + _count1 >= 10 then derived_column = strip(column) || '-' || strip(_columna1);

 

else if substr(column,1,4) = substr(_columna2,1,4) then do;
if count + _count1 + _count2 >= 10 then do;
derived_column = strip(column) || '-' || strip(_columna2);
end;


end;

 

The problem with above program is I am not able to derive the new column for records that have value of > 10.

 

 

HTH. Thanks!

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 1270 views
  • 0 likes
  • 3 in conversation