BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ssulli11
Calcite | Level 5

Good afternoon All!

 

I am attempting to sum a variable using multiple conditions. See below for the data I have and want.  I want to create two new variables that sum "supply" by "ID" based on "year" and "AorB" group.  I am aware of how to complete this using proc sql when looking at only two other variables (ID and Year), but not sure how to complete this data step when using multiple conditions. Your help and expertise is always appreciated!!

 

Data Have:

 

IDAorBYearSupply
15169A200930
15169A200960
15169A200930
15169A200990
15169A201030
15169A201030
15169B201030
15169A201030
30629A200990
30629A200990
30629A200930
30629B200930
30629B200930
30629A201090
30629A201190

 

Data Want:

IDAorBYearSupplyYear_Supply_AYear_Supply_B
15169A2009302100
15169A2009602100
15169A2009302100
15169A2009902100
15169A2010309030
15169A2010309030
15169B2010309030
15169A2010309030
30629A20099021060
30629A20099021060
30629A20093021060
30629B20093021060
30629B20093021060
30629A201090900
30629A201190900

 

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

Use proc sql, and take advantage of auto remerge:

 

proc sql;
create table want as
select 
    *,
    sum(case when AorB = "A" then Supply else 0 end) as Year_Supply_A,
    sum(case when AorB = "B" then Supply else 0 end) as Year_Supply_B
from have
group by ID, Year;
quit;
PG

View solution in original post

6 REPLIES 6
Reeza
Super User

There's probably a quicker way than this, but this is dynamic until the last step where I'm replacing the zeros. There are ways to do that if necessary. 

 

1. Summarize data

2. Transpose into different columns

3. Merge into main data

4. Replace missing with 0

 

data have;
input ID $ AorB $   Year    Supply;
cards;
15169   A   2009    30
15169   A   2009    60
15169   A   2009    30
15169   A   2009    90
15169   A   2010    30
15169   A   2010    30
15169   B   2010    30
15169   A   2010    30
30629   A   2009    90
30629   A   2009    90
30629   A   2009    30
30629   B   2009    30
30629   B   2009    30
30629   A   2010    90
30629   A   2011    90
;;;;
run;

proc means data=have noprint nway;
class ID Year AorB;
var Supply;
output out=summary sum=Total;
run;

proc transpose data=summary out=summary_wide prefix=Supply;
by ID Year;
ID AorB;
var Total;
run;

proc sql;
create table want as 
select a.*, coalesce(SupplyA, 0) as SupplyA, coalesce(SupplyB, 0) as SupplyB
from have as a
left join summary_wide as b
on a.ID=b.ID and a.Year=b.Year;
quit;

 


@ssulli11 wrote:

Good afternoon All!

 

I am attempting to sum a variable using multiple conditions. See below for the data I have and want.  I want to create two new variables that sum "supply" by "ID" based on "year" and "AorB" group.  I am aware of how to complete this using proc sql when looking at only two other variables (ID and Year), but not sure how to complete this data step when using multiple conditions. Your help and expertise is always appreciated!!

 

Data Have:

 

ID AorB Year Supply
15169 A 2009 30
15169 A 2009 60
15169 A 2009 30
15169 A 2009 90
15169 A 2010 30
15169 A 2010 30
15169 B 2010 30
15169 A 2010 30
30629 A 2009 90
30629 A 2009 90
30629 A 2009 30
30629 B 2009 30
30629 B 2009 30
30629 A 2010 90
30629 A 2011 90

 

Data Want:

ID AorB Year Supply Year_Supply_A Year_Supply_B
15169 A 2009 30 210 0
15169 A 2009 60 210 0
15169 A 2009 30 210 0
15169 A 2009 90 210 0
15169 A 2010 30 90 30
15169 A 2010 30 90 30
15169 B 2010 30 90 30
15169 A 2010 30 90 30
30629 A 2009 90 210 60
30629 A 2009 90 210 60
30629 A 2009 30 210 60
30629 B 2009 30 210 60
30629 B 2009 30 210 60
30629 A 2010 90 90 0
30629 A 2011 90 90 0

 


 

PGStats
Opal | Level 21

Use proc sql, and take advantage of auto remerge:

 

proc sql;
create table want as
select 
    *,
    sum(case when AorB = "A" then Supply else 0 end) as Year_Supply_A,
    sum(case when AorB = "B" then Supply else 0 end) as Year_Supply_B
from have
group by ID, Year;
quit;
PG
mkeintz
PROC Star

If your data are  sorted by ID/YEAR, then you can make a single data step work:

 

data have;
  input ID	AorB :$1.	Year	Supply;
datalines;
15169	A	2009	30
15169	A	2009	60
15169	A	2009	30
15169	A	2009	90
15169	A	2010	30
15169	A	2010	30
15169	B	2010	30
15169	A	2010	30
30629	A	2009	90
30629	A	2009	90
30629	A	2009	30
30629	B	2009	30
30629	B	2009	30
30629	A	2010	90
30629	A	2011	90
run;

data want;
  set have;
  by id year;
  if first.year then call missing(supply_a,supply_b);
  if aorb='A' then supply_a+supply; else
  if aorb='B' then supply_b+supply;
  if last.year;
  supply_a=sum(0,supply_a);
  supply_b=sum(0,supply_b);
  do until (last.year);
    set have;
    by id year;
    output;
  end;
run;

Notice that supply_a and supplly_b are aggregated until encountering the last obs for a given id/year.  The those obs are then re-read and output with the correct supply_a and supply_b values.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
PGStats
Opal | Level 21

I prefer the logically simpler (IMHO):

 

data want;
supply_a = 0;
supply_b = 0;
do until (last.year);
  set have; by id year;
  if aorb='A' then supply_a = supply_a + supply;
  if aorb='B' then supply_b = supply_b + supply;
  end;  
do until (last.year);
  set have; by id year;
  output;
  end;
run;
PG
Ksharp
Super User

Similar with PG's code.

 

data have;
input ID $ AorB $   Year    Supply;
cards;
15169   A   2009    30
15169   A   2009    60
15169   A   2009    30
15169   A   2009    90
15169   A   2010    30
15169   A   2010    30
15169   B   2010    30
15169   A   2010    30
30629   A   2009    90
30629   A   2009    90
30629   A   2009    30
30629   B   2009    30
30629   B   2009    30
30629   A   2010    90
30629   A   2011    90
;;;;
run;
proc sql;
select 
    *,
    coalesce((select sum(Supply) from have where id=a.id and year=a.year and AorB = "A"),0) as Year_Supply_A,
    coalesce((select sum(Supply) from have where id=a.id and year=a.year and AorB = "B"),0) as Year_Supply_B
from have as a;
quit;
ssulli11
Calcite | Level 5

Thank you to everyone who responded and helped!!  

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 3304 views
  • 2 likes
  • 5 in conversation