Good evening SAS community.
I am very new to SAS and would love some help on developing some code. I'm sure this might be very simple for anyone with experience and I thank everyone in advance for sharing the knowledge.
I have a file that is broken out by company, year, country, product code and total sales. I created a very simple dummy file of what my real data looks like below. In short, I am trying to summarize the file so I get output for each company by year that shows me the total total sales that were in the US by company and that were outside of the US by company.
I would love the output to look like something like this:
Company | Year | Domestic - code1 | Domestic - code2 | Domestic - Code 3 | Foreign - Code 1 | Foreign - Code 2 | Foreign - Code 3 |
amazon | 2003 | 4053 | 39438 | 3438 | 39492 | 8382 | 83292 |
Here is what my data looks like. I included it in excel too in case that is more helpful!
Company | Year | Country | Code | Total |
Amazon | 2004 | USA | 1 | 142 |
Amazon | 2003 | USA | 2 | 172 |
Amazon | 2004 | USA | 2 | 178 |
Amazon | 2006 | USA | 1 | 158 |
Amazon | 2003 | USA | 3 | 177 |
Amazon | 2005 | Argentina | 4 | 146 |
Amazon | 2004 | Cuba | 3 | 182 |
Amazon | 2005 | England | 2 | 188 |
Amazon | 2006 | France | 1 | 110 |
Amazon | 2005 | Canada | 4 | 176 |
Amazon | 2005 | Mexico | 3 | 113 |
Sony | 2003 | USA | 2 | 184 |
Sony | 2003 | USA | 1 | 148 |
Sony | 2003 | USA | 3 | 177 |
Sony | 2005 | USA | 2 | 178 |
Sony | 2006 | USA | 1 | 184 |
Sony | 2005 | Argentina | 2 | 122 |
Sony | 2006 | Cuba | 4 | 129 |
Sony | 2006 | England | 2 | 109 |
Sony | 2003 | France | 3 | 103 |
Sony | 2004 | Canada | 3 | 158 |
Sony | 2005 | Mexico | 1 | 187 |
@anweinbe wrote:
I would like them to be easily understood. Exact names do not matter in most cases. It would be great for example if the company name did say “company”.
It looks like you should do this via PROC SUMMARY and PROC TRANSPOSE
proc summary data=have2 nway;
class company year type;
var total;
output out=sums sum=;
run;
proc transpose data=sums out=want;
by company year;
var total;
id type;
run;
proc transpose will do that trick.
https://stats.idre.ucla.edu/sas/modules/how-to-reshape-data-long-to-wide-using-proc-transpose/
At any rate, you'll need 2 passes through the data regardless of the method you choose. Below, the first pass is used to find the distinct names of the output summary variables for each [Company,Year]; then the second pass is used to aggregate directly into the output format you need.
data have ;
input Company :$16. Year Country:$16. Code Total ;
cards ;
Amazon 2004 USA 1 100
Amazon 2004 USA 1 142
Amazon 2003 USA 2 172
Amazon 2004 USA 2 178
Amazon 2006 USA 1 158
Amazon 2003 USA 3 177
Amazon 2005 Argentina 4 146
Amazon 2004 Cuba 3 182
Amazon 2005 England 2 188
Amazon 2006 France 1 110
Amazon 2005 Canada 4 176
Amazon 2005 Mexico 3 113
Sony 2003 USA 2 184
Sony 2003 USA 1 148
Sony 2003 USA 3 177
Sony 2005 USA 2 178
Sony 2006 USA 1 184
Sony 2005 Argentina 2 122
Sony 2006 Cuba 4 129
Sony 2006 England 2 109
Sony 2003 France 3 103
Sony 2004 Canada 3 158
Sony 2005 Mexico 1 187
;
run ;
%let dfc_expr = catx ("_", ifc (country="USA", "Domestic", "Foreign"), cats ("Code", code)) ;
proc sql noprint ;
select distinct &dfc_expr into :dfc separated by " " from have ;
quit ;
data _null_ ;
if _n_ = 1 then do ;
array ss &dfc ;
dcl hash h (ordered:"a") ;
h.definekey ("company", "year") ;
h.definedata ("company", "year") ;
do over ss ;
h.definedata (vname (ss)) ;
end ;
h.definedone() ;
end ;
set have end = end ;
if h.find() ne 0 then call missing (of ss[*]) ;
do over ss ;
if vname (ss) ne &dfc_expr then continue ;
ss ++ total ;
leave ;
end ;
h.replace() ;
if end then h.output (dataset:"want") ;
run ;
Note that the names of the summary variables are created in the pattern of Domestic_Code1, Foreign_Code2, and so forth. You can change it to your liking by changing the expression assigned to the macro variable DFC_EXPR. Also note that if some combination of Company, Year, and DFC is not in the input data, you'll get a missing value in that cell. If you want it to be a zero, change the code line with CALL MISSING to:
if h.find() ne 0 then do over ss ;
ss = 0 ;
end ;
Kind regards
Paul D.
/* UNTESTED CODE */
data have2;
set have;
length type $ 24;
if country='USA' then type=cats('Domestic-',code);
else type=cats('Foreign-',code);
run;
proc report data=have2;
columns company year type,total;
define company/group;
define year/group;
define type/across;
define total/analysis sum;
run;
Paige,
Thank you kindly! This appeared to do the trick.
One followup.... is there a way to take the company name and fill it for each record so I can see "amazon" next to each firm year?
type Domestic-1 Domestic-2 Domestic-3 Foreign-1 Foreign-2 Foreign-3 Foreign-4Company Year Total Total Total Total Total Total Total
Amazon | 2003 | . | 168 | . | . | . | . | 163 |
2004 | 194 | 164 | 165 | 158 | . | 332 | 100 | |
2006 | 166 | . | . | . | 145 | . | . |
@anweinbe wrote:
One followup.... is there a way to take the company name and fill it for each record so I can see "amazon" next to each firm year?
Not easily, as far as I know. Probably would involve PROC SUMMARY and PROC TRANSPOSE.
Do you want an output data set, or a printed/viewed report?
Ideally I plan to have it in an output file that can be brought into SPSS
@anweinbe wrote:
Ideally I plan to have it in an output file that can be brought into SPSS
I don't know what this means, your choices are to save the results in a SAS data set, or some printed/saved output file. Please specify one of the two.
A SAS dataset will be perfect 🙂
@anweinbe wrote:
A SAS dataset will be perfect 🙂
Do the variable names matter? What should they be?
@anweinbe wrote:
I would like them to be easily understood. Exact names do not matter in most cases. It would be great for example if the company name did say “company”.
It looks like you should do this via PROC SUMMARY and PROC TRANSPOSE
proc summary data=have2 nway;
class company year type;
var total;
output out=sums sum=;
run;
proc transpose data=sums out=want;
by company year;
var total;
id type;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.