Solved: Re: Proc SQL: Combine Multiple Row Values into One Column/Row

blakezen · Posted 06-16-2016 02:47 PM

Hi All,

I have a dataset and a simple request to combine two specific rows together, as follows:

Dataset: Loans_2016

Category Gross Net Reg Loss

home 300 200 50 25

RV 100 80 15 5

TDR 80 75 5 3

HL1 70 61 1 1

Bad_HL1 42 30 5 .

HL2 88 70 12 6

Bad_HL2 35 28 7 .

I just want to add Bad_HL1 into HL1 and add Bad_HL2 into HL2, (and drop those rows so that only home RV TDR HL1 HL2 are displayed.

I have done everything else to get my data but I can't figure out how to do this concatenation with the display I want to see in the end result. Could anyone please provide me with any code help you can?

Thank you very much!

PGStats · Posted 06-16-2016 05:06 PM

This works:

proc sql;
create table loans as
select case when upcase(Category) eqt "BAD_" then substr(Category,5) else Category end as Category,
sum(Gross) as Gross, sum(Net) as Net, sum(Reg) as Reg, sum(Loss) as Loss
from loans_2016
group by calculated Category;
select * from loans;
quit;

PG

View solution in original post

slchen · Posted 06-16-2016 03:46 PM

proc sql;
select distinct prxchange('s/\w+_//',1,category),sum(gross) as gross, sum(Net) as Net,sum(Reg) as Reg,sum(Loss) as Loss

from have

group by prxchange('s/\w+_//',1,category);
quit;

blakezen · Posted 06-16-2016 04:03 PM

Hi slchen,

thanks for your reply! after running the code unfortunately, it works great, but I can't get it to apply to my full expanded dataset (below). I see that you used the "_" as the key indicator to do the concatenation but can you advise on how to change your code when the "categoy" is expanded:

Full dataset:

Category Gross Net Reg Loss

FM_AR

FM_IO

HEIL

HEL_AM

HRND

HEL_IO_1

HR Balloon

LR Bast

Other_HEL IO

FM_TRD

HE_TRD

BAD_HEIL

BAD_HELOC_MO

RV

MAR_L

These are my full rep categories (gross through loss contain numeric values, so we can assume it's properly imported in SAS). Going off of your code, can you help me ammended such that BAD_HEIL is added into HEIL and Bad_HELOC_MO is added to HR Balloon? Then the rows with those BAD_ categories are eliminated from the dataset.

Sorry I did not have this expanded dataset originially, I thought I could build upon yours but I now see it's more complicated than it looks. Thank you very much.

blakezen · Posted 06-20-2016 10:23 AM

Thank you for your insight!

PGStats · Posted 06-16-2016 05:06 PM

This works:

proc sql;
create table loans as
select case when upcase(Category) eqt "BAD_" then substr(Category,5) else Category end as Category,
sum(Gross) as Gross, sum(Net) as Net, sum(Reg) as Reg, sum(Loss) as Loss
from loans_2016
group by calculated Category;
select * from loans;
quit;

PG

blakezen · Posted 06-16-2016 05:09 PM

thanks a lot PG! I'm trying thi snow, but... are you sure this adds Bad_HEIL to HEIL and BAD_HELOC_MO to HR Balloon? It seems like your code only subtracts the "BAD_" rows?

PGStats · Posted 06-16-2016 05:11 PM

Yes, I am sure

PG

blakezen · Posted 06-16-2016 05:14 PM

thanks!! 🙂 one last follow up though and I know this will come across as rather dumb, but... how? there is no reference to the targets "HEIL" or "HR Balloon" that I can spot! What magic is this? 🙂

PGStats · Posted 06-16-2016 05:18 PM

I remove BAD_ from the Category and group on what's left, i.e. one line with the original category name and the other with the left-truncated name.

PG

blakezen · Posted 06-16-2016 05:29 PM

awesome very cool; I got the first grouping! but I'm still not getting exactly how HR Balloon and BAD_HELOC_MO are grouped. A little lost on that one since I don't see the left-truncated name tying to HR Balloon, but it works! Maybe the fact that there are 5 rows in the between HR Balloon and BAD_HELOC_MO?

sorry for the many questions... Many thanks!

PGStats · Posted 06-16-2016 05:41 PM

When the Category name does not start wih BAD_, it is kept as is (this is the second clause of the case-expression). The initial order of categories does not matter. SQL does its own sort on the left-truncated names to perform the grouping.

The clause group by calculated Category refers to the truncated names, i.e. the result of the case-expression.

PG

blakezen · Posted 06-20-2016 10:24 AM

Thank you very much for your many insights and conceptual breakdown PG!

Ksharp · Posted 06-16-2016 09:33 PM

data have;
input Category $ Gross   Net     Reg     Loss;
cards;
home        300        200     50       25
RV            100        80       15       5
TDR          80          75       5        3
HL1           70          61       1       1
Bad_HL1   42         30        5       .      
HL2           88         70       12     6
Bad_HL2  35          28       7      .
;
run;

data want;
 merge have have(firstobs=2 rename=(
 Category=_C Gross=_G   Net=_N  Reg=_R Loss=_L));
if Category=:'HL' then do;
  Gross=sum(Gross,_G);  Net=sum(Net,_N);    
  Reg=sum(Reg,_R);    Loss=sum(Loss,_L);
end;
if Category=:'Bad_HL' then delete;
drop _:;
run;

blakezen · Posted 06-20-2016 10:25 AM

This works great and is straightforward, Thank you Xia!

RW9 · Posted 06-17-2016 06:01 AM

Sounds like a simple join to me:

data have;
  input Category $ Gross Net Reg Loss;
datalines;
home        300        200     50       25
RV            100        80       15       5
TDR          80          75       5        3
HL1           70          61       1       1
Bad_HL1   42         30        5       .      
HL2           88         70       12     6
Bad_HL2  35          28       7      .
;
run;

proc sql;
  create table WANT as
  select  A.CATEGORY,
          sum(A.GROSS,B.GROSS) as GROSS,
          sum(A.NET,B.NET) as NET,
          sum(A.REG,B.REG) as REG,
          sum(A.LOSS,B.LOSS) as LOSS
  from    (select * from HAVE where upcase(substr(CATEGORY,1,4)) ne "BAD_") A
  left join (select * from HAVE where upcase(substr(CATEGORY,1,4))="BAD_") B
  on      A.CATEGORY=substr(B.CATEGORY,4);
quit;

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!