Hi. I’m having trouble geting some group by processing syntax correct.
I’m trying to calculate two variables one for first week revenue and one for Last week revenue. I'm not getting the correct looking output.
Here is what I have for my program.
proc sort data= Campaigns_all;
by type Campaign_Name week_of;
run;
data test (keep=type Campaign_Name start_dc end_dc first_week_rev last_week_rev );
set Campaigns_all;
by type Campaign_Name ;
if first.Campaign_Name then first_week_rev = Revenue;
if last.Campaign_Name then last_week_rev = Revenue;
output ;
run;
I start out with a data set which looks like this.
Type | campaign_name | start_dc | end_dc | week_of | Revenue | visits | Orders | Units |
Desktop | 1 Piece | 15-Sep-15 | 19-Oct-15 | 6-Oct-15 | $5,461.38 | 662 | 33 | 197 |
Desktop | 1 Piece | 15-Sep-15 | 19-Oct-15 | 13-Oct-15 | $3,712.08 | 293 | 10 | 101 |
Desktop | Get Dressed Up | 4-Aug-15 | 19-Oct-15 | 6-Oct-15 | $1,481.36 | 346 | 12 | 44 |
Desktop | Get Dressed Up | 4-Aug-15 | 19-Oct-15 | 13-Oct-15 | $1,898.78 | 149 | 15 | 85 |
After I run my I get this. Notice the null values in First. last
Type | campaign_name | start_dc | end_dc | first_week_rev | last_week_rev |
Desktop | 1 Piece | 15-Sep-15 | 19-Oct-15 | 5461.38 | . |
Desktop | 1 Piece | 15-Sep-15 | 19-Oct-15 | . | 3712.08 |
Desktop | Get Dressed Up | 4-Aug-15 | 19-Oct-15 | 1481.36 | . |
Desktop | Get Dressed Up | 4-Aug-15 | 19-Oct-15 | . | 1898.78 |
What I want one row for each campain.
Type | campaign_name | start_dc | end_dc | first_week_rev | last_week_rev |
Desktop | 1 Piece | 15-Sep-15 | 19-Oct-15 | 5461.38 | 3712.08 |
Desktop | Get Dressed Up | 4-Aug-15 | 19-Oct-15 | 1481.36 | 1898.78 |
Try this:
proc sort data= Campaigns_all;
by type Campaign_Name week_of;
run;
data test (keep=type Campaign_Name start_dc end_dc first_week_rev last_week_rev );
retain first_week_rev last_week_rev;
set Campaigns_all;
by type Campaign_Name ;
if first.Campaign_Name then first_week_rev = Revenue;
if last.Campaign_Name then do;
last_week_rev = Revenue;
output ;
end;
run;
CatCol
You can get the results using PROC SQL or PROC SUMMARY (and probably other procedures as well)
proc summary nway data=whatever;
var first_week_rev last_week_rev;
class type campaign_name start_dc end_dc;
output out=_final_ sum=;
run;
Try this:
proc sort data= Campaigns_all;
by type Campaign_Name week_of;
run;
data test (keep=type Campaign_Name start_dc end_dc first_week_rev last_week_rev );
retain first_week_rev last_week_rev;
set Campaigns_all;
by type Campaign_Name ;
if first.Campaign_Name then first_week_rev = Revenue;
if last.Campaign_Name then do;
last_week_rev = Revenue;
output ;
end;
run;
CatCol
You'll want to retain the value of first_week_rev throughout each by group. You'll also only want to output when at the end of the Campaign_Name by group. Something like below.
data test (keep=type Campaign_Name start_dc end_dc first_week_rev last_week_rev ); set Campaigns_all; by type Campaign_Name ; retain first_week_rev; if first.Campaign_Name then first_week_rev = Revenue; if last.Campaign_Name then do; last_week_rev = Revenue; output; end; run;
Similarly, you could dynamically create the start_dc and end_dc columns in this same data step (assuming the week_of variable determines those values)
data test (keep=type Campaign_Name start_dc end_dc first_week_rev last_week_rev ); set Campaigns_all(drop=start_dc end_dc); by type Campaign_Name ; retain first_week_rev start_dc; if first.Campaign_Name then do; first_week_rev = Revenue; start_dc = week_of end; if last.Campaign_Name then do; last_week_rev = Revenue; end_dc = week_of output; end; format start_dc end_dc mmddyyd8.; run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.