BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
blakezen
Obsidian | Level 7

Hi All,

 

I have a dataset and a simple request to combine two specific rows together, as follows:

 

Dataset: Loans_2016

 

Category  Gross   Net     Reg     Loss

home        300        200     50       25

RV            100        80       15       5

TDR          80          75       5        3

HL1           70          61       1       1

Bad_HL1   42         30        5       .      

HL2           88         70       12     6

Bad_HL2  35          28       7      .

 

I just want to add Bad_HL1 into HL1 and add Bad_HL2 into HL2, (and drop those rows so that only home RV TDR HL1 HL2 are displayed.

 

I have done everything else to get my data but I can't figure out how to do this concatenation with the display I want to see in the end result. Could anyone please provide me with any code help you can?

 

Thank you very much!

 

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

This works:

 

proc sql;
create table loans as
select case when upcase(Category) eqt "BAD_" then substr(Category,5) else Category end as Category,
sum(Gross) as Gross, sum(Net) as Net, sum(Reg) as Reg, sum(Loss) as Loss
from loans_2016
group by calculated Category;
select * from loans;
quit;
PG

View solution in original post

15 REPLIES 15
slchen
Lapis Lazuli | Level 10


proc sql;
select distinct prxchange('s/\w+_//',1,category),sum(gross) as gross, sum(Net) as Net,sum(Reg) as Reg,sum(Loss) as Loss

from have

group by prxchange('s/\w+_//',1,category);
quit;

blakezen
Obsidian | Level 7

Hi slchen,

thanks for your reply! after running the code unfortunately, it works great, but I can't get it to apply to my full expanded dataset (below). I see that you used the "_" as the key indicator to do the concatenation but can you advise on how to change your code when the "categoy" is expanded:

Full dataset:

 

Category Gross Net Reg Loss

FM_AR     

FM_IO

HEIL

HEL_AM

HRND

HEL_IO_1

HR Balloon

LR Bast

Other_HEL IO

FM_TRD

HE_TRD

BAD_HEIL

BAD_HELOC_MO

RV

MAR_L

 

These are my full rep categories (gross through loss contain numeric values, so we can assume it's properly imported in SAS). Going off of your code, can you help me ammended such that BAD_HEIL is added into HEIL and Bad_HELOC_MO is added to HR Balloon? Then the rows with those BAD_ categories are eliminated from the dataset.

 

Sorry I did not have this expanded dataset originially, I thought I could build upon yours but I now see it's more complicated than it looks. Thank you very much.

blakezen
Obsidian | Level 7

Thank you for your insight!

PGStats
Opal | Level 21

This works:

 

proc sql;
create table loans as
select case when upcase(Category) eqt "BAD_" then substr(Category,5) else Category end as Category,
sum(Gross) as Gross, sum(Net) as Net, sum(Reg) as Reg, sum(Loss) as Loss
from loans_2016
group by calculated Category;
select * from loans;
quit;
PG
blakezen
Obsidian | Level 7
thanks a lot PG! I'm trying thi snow, but... are you sure this adds Bad_HEIL to HEIL and BAD_HELOC_MO to HR Balloon? It seems like your code only subtracts the "BAD_" rows?
PGStats
Opal | Level 21

Yes, I am sure Smiley Happy

PG
blakezen
Obsidian | Level 7
thanks!! 🙂 one last follow up though and I know this will come across as rather dumb, but... how? there is no reference to the targets "HEIL" or "HR Balloon" that I can spot! What magic is this? 🙂
PGStats
Opal | Level 21

I remove BAD_ from the Category and group on what's left, i.e. one line with the original category name and the other with the left-truncated name.

PG
blakezen
Obsidian | Level 7
awesome very cool; I got the first grouping! but I'm still not getting exactly how HR Balloon and BAD_HELOC_MO are grouped. A little lost on that one since I don't see the left-truncated name tying to HR Balloon, but it works! Maybe the fact that there are 5 rows in the between HR Balloon and BAD_HELOC_MO?

sorry for the many questions... Many thanks!
PGStats
Opal | Level 21

When the Category name does not start wih BAD_, it is kept as is (this is the second clause of the case-expression). The initial order of categories does not matter. SQL does its own sort on the left-truncated names to perform the grouping.

 

The clause group by calculated Category refers to the truncated names, i.e. the result of the case-expression.

PG
blakezen
Obsidian | Level 7

Thank you very much for your many insights and conceptual breakdown PG!

Ksharp
Super User
data have;
input Category $ Gross   Net     Reg     Loss;
cards;
home        300        200     50       25
RV            100        80       15       5
TDR          80          75       5        3
HL1           70          61       1       1
Bad_HL1   42         30        5       .      
HL2           88         70       12     6
Bad_HL2  35          28       7      .
;
run;

data want;
 merge have have(firstobs=2 rename=(
 Category=_C Gross=_G   Net=_N  Reg=_R Loss=_L));
if Category=:'HL' then do;
  Gross=sum(Gross,_G);  Net=sum(Net,_N);    
  Reg=sum(Reg,_R);    Loss=sum(Loss,_L);
end;
if Category=:'Bad_HL' then delete;
drop _:;
run;
blakezen
Obsidian | Level 7

This works great and is straightforward, Thank you Xia!

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Sounds like a simple join to me:

data have;
  input Category $ Gross Net Reg Loss;
datalines;
home        300        200     50       25
RV            100        80       15       5
TDR          80          75       5        3
HL1           70          61       1       1
Bad_HL1   42         30        5       .      
HL2           88         70       12     6
Bad_HL2  35          28       7      .
;
run;

proc sql;
  create table WANT as
  select  A.CATEGORY,
          sum(A.GROSS,B.GROSS) as GROSS,
          sum(A.NET,B.NET) as NET,
          sum(A.REG,B.REG) as REG,
          sum(A.LOSS,B.LOSS) as LOSS
  from    (select * from HAVE where upcase(substr(CATEGORY,1,4)) ne "BAD_") A
  left join (select * from HAVE where upcase(substr(CATEGORY,1,4))="BAD_") B
  on      A.CATEGORY=substr(B.CATEGORY,4);
quit;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 15 replies
  • 2749 views
  • 0 likes
  • 5 in conversation