BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
anweinbe
Quartz | Level 8

Good evening SAS community.

 

I am very new to SAS and would love some help on developing some code. I'm sure this might be very simple for anyone with experience and I thank everyone in advance for sharing the knowledge.

 

I have a file that is broken out by company, year, country, product code and total sales. I created a very simple dummy file of what my real data looks like below. In short, I am trying to summarize the file so I get output for each company by year that shows me the total total sales that were in the US by company and that were outside of the US by company.

 

I would love the output to look like something like this:

 

CompanyYearDomestic - code1Domestic - code2Domestic - Code 3Foreign - Code 1Foreign - Code 2Foreign - Code 3
amazon2003405339438343839492838283292

 

 

Here is what my data looks like. I included it in excel too in case that is more helpful!

CompanyYearCountryCodeTotal
Amazon2004USA1142
Amazon2003USA2172
Amazon2004USA2178
Amazon2006USA1158
Amazon2003USA3177
Amazon2005Argentina4146
Amazon2004Cuba3182
Amazon2005England2188
Amazon2006France1110
Amazon2005Canada4176
Amazon2005Mexico3113
Sony2003USA2184
Sony2003USA1148
Sony2003USA3177
Sony2005USA2178
Sony2006USA1184
Sony2005Argentina2122
Sony2006Cuba4129
Sony2006England2109
Sony2003France3103
Sony2004Canada3158
Sony2005Mexico1187
1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26

@anweinbe wrote:
I would like them to be easily understood. Exact names do not matter in most cases. It would be great for example if the company name did say “company”.

It looks like you should do this via PROC SUMMARY and PROC TRANSPOSE 

 

proc summary data=have2 nway;
    class company year type;
    var total;
    output out=sums sum=;
run;

proc transpose data=sums out=want;
   by company year;
   var total;
   id type;
run;
--
Paige Miller

View solution in original post

11 REPLIES 11
hashman
Ammonite | Level 13

@anweinbe:

At any rate, you'll need 2 passes through the data regardless of the method you choose. Below, the first pass is used to find the distinct names of the output summary variables for each [Company,Year]; then the second pass is used to aggregate directly into the output format you need. 

data have ;                                                                                                                             
  input Company :$16. Year Country:$16. Code Total ;                                                                                    
  cards ;                                                                                                                               
Amazon  2004  USA        1  100                                                                                                         
Amazon  2004  USA        1  142                                                                                                         
Amazon  2003  USA        2  172                                                                                                         
Amazon  2004  USA        2  178                                                                                                         
Amazon  2006  USA        1  158                                                                                                         
Amazon  2003  USA        3  177                                                                                                         
Amazon  2005  Argentina  4  146                                                                                                         
Amazon  2004  Cuba       3  182                                                                                                         
Amazon  2005  England    2  188                                                                                                         
Amazon  2006  France     1  110                                                                                                         
Amazon  2005  Canada     4  176                                                                                                         
Amazon  2005  Mexico     3  113                                                                                                         
Sony    2003  USA        2  184                                                                                                         
Sony    2003  USA        1  148                                                                                                         
Sony    2003  USA        3  177                                                                                                         
Sony    2005  USA        2  178                                                                                                         
Sony    2006  USA        1  184                                                                                                         
Sony    2005  Argentina  2  122                                                                                                         
Sony    2006  Cuba       4  129                                                                                                         
Sony    2006  England    2  109                                                                                                         
Sony    2003  France     3  103                                                                                                         
Sony    2004  Canada     3  158                                                                                                         
Sony    2005  Mexico     1  187                                                                                                         
;                                                                                                                                       
run ;                                                                                                                                   
                                                                                                                                        
%let dfc_expr = catx ("_", ifc (country="USA", "Domestic", "Foreign"), cats ("Code", code)) ;                                           
                                                                                                                                        
proc sql noprint ;                                                                                                                      
  select distinct &dfc_expr into :dfc separated by " " from have ;                                                                      
quit ;                                                                                                                                  
                                                                                                                                        
data _null_ ;                                                                                                                           
  if _n_ = 1 then do ;                                                                                                                  
    array ss &dfc ;                                                                                                                     
    dcl hash h (ordered:"a") ;                                                                                                          
    h.definekey  ("company", "year") ;                                                                                                  
    h.definedata ("company", "year") ;                                                                                                  
    do over ss ;                                                                                                                        
      h.definedata (vname (ss)) ;                                                                                                       
    end ;                                                                                                                               
    h.definedone() ;                                                                                                                    
  end ;                                                                                                                                 
  set have end = end ;                                                                                                                  
  if h.find() ne 0 then call missing (of ss[*]) ;                                                                                       
  do over ss ;                                                                                                                          
    if vname (ss) ne &dfc_expr then continue ;                                                                                          
    ss ++ total ;                                                                                                                       
    leave ;                                                                                                                             
  end ;                                                                                                                                 
  h.replace() ;                                                                                                                         
  if end then h.output (dataset:"want") ;                                                                                               
run ;                                              

Note that the names of the summary variables are created in the pattern of Domestic_Code1, Foreign_Code2, and so forth. You can change it to your liking by changing the expression assigned to the macro variable DFC_EXPR. Also note that if some combination of Company, Year, and DFC is not in the input data, you'll get a missing value in that cell. If you want it to be a zero, change the code line with CALL MISSING to:

  if h.find() ne 0 then do over ss ;                                                                                                    
    ss = 0 ;                                                                                                                            
  end ;     

Kind regards

Paul D.   

PaigeMiller
Diamond | Level 26
/* UNTESTED CODE */

data have2;
    set have;
    length type $ 24;
    if country='USA' then type=cats('Domestic-',code);
    else type=cats('Foreign-',code);
run;
proc report data=have2;
    columns company year type,total;
    define company/group;
    define year/group;
    define type/across;
    define total/analysis sum;
run;
--
Paige Miller
anweinbe
Quartz | Level 8

Paige,

 

Thank you kindly! This appeared to do the trick.

 

One followup.... is there a way to take the company name and fill it for each record so I can see "amazon" next to each firm year?

 

 

  type  Domestic-1 Domestic-2 Domestic-3 Foreign-1 Foreign-2 Foreign-3 Foreign-4Company Year Total Total Total Total Total Total Total

Amazon2003.168....163
 2004194164165158.332100
 2006166...145..
PaigeMiller
Diamond | Level 26

@anweinbe wrote:

One followup.... is there a way to take the company name and fill it for each record so I can see "amazon" next to each firm year?


Not easily, as far as I know. Probably would involve PROC SUMMARY and PROC TRANSPOSE.

 

Do you want an output data set, or a printed/viewed report?

--
Paige Miller
anweinbe
Quartz | Level 8

Ideally I plan to have it in an output file that can be brought into SPSS

PaigeMiller
Diamond | Level 26

@anweinbe wrote:

Ideally I plan to have it in an output file that can be brought into SPSS


I don't know what this means, your choices are to save the results in a SAS data set, or some printed/saved output file. Please specify one of the two.

--
Paige Miller
anweinbe
Quartz | Level 8

A SAS dataset will be perfect 🙂 

PaigeMiller
Diamond | Level 26

@anweinbe wrote:

A SAS dataset will be perfect 🙂 


Do the variable names matter? What should they be?

--
Paige Miller
anweinbe
Quartz | Level 8
I would like them to be easily understood. Exact names do not matter in most cases. It would be great for example if the company name did say “company”.
PaigeMiller
Diamond | Level 26

@anweinbe wrote:
I would like them to be easily understood. Exact names do not matter in most cases. It would be great for example if the company name did say “company”.

It looks like you should do this via PROC SUMMARY and PROC TRANSPOSE 

 

proc summary data=have2 nway;
    class company year type;
    var total;
    output out=sums sum=;
run;

proc transpose data=sums out=want;
   by company year;
   var total;
   id type;
run;
--
Paige Miller

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 830 views
  • 2 likes
  • 4 in conversation