BookmarkSubscribeRSS Feed
ellenqi
Calcite | Level 5

hi everyone,

here is my data

id   var1 var2    newvar

1     23    1      

2     12     1 

3     14     2   

4     33     1   

5    18    3

6     20   2

7    45    3  

...

...

Var1 is a continuous variable, and Var2 is a categorical variable (3 categories). For 1st category (var2=1), the newvar= mean of var 1 among the people in the 1st category. For the 2nd category (var2=2), the newvar=mean of var2 among the people in the 2nd category. For the 3rd category, the I just want to newvar=var1; Therefore the newvar will be assigned with the same value for 1st category, and with another same value for 2nd category.

I know I can do this by running proc means to calculate the means of var1 for two different categorical var2. But I'd like to do this directly, without running proc means beforehand, and without rounding the means and assigning it by myself.

 

Thank you! 

 

3 REPLIES 3
Reeza
Super User

Are you familiar with DoW loops? That's primarily the way to handle this without PROC MEANs, but it's definitely an advanced topic. 

Why is there the limitation on PROC MEANS?

What about SQL?

 


@ellenqi wrote:

hi everyone,

here is my data

id   var1 var2    newvar

1     23    1      

2     12     1 

3     14     2   

4     33     1   

5    18    3

6     20   2

7    45    3  

...

...

Var1 is a continuous variable, and Var2 is a categorical variable (3 categories). For 1st category (var2=1), the newvar= mean of var 1 among the people in the 1st category. For the 2nd category (var2=2), the newvar=mean of var2 among the people in the 2nd category. For the 3rd category, the I just want to newvar=var1; Therefore the newvar will be assigned with the same value for 1st category, and with another same value for 2nd category.

I know I can do this by running proc means to calculate the means of var1 for two different categorical var2. But I'd like to do this directly, without running proc means beforehand, and without rounding the means and assigning it by myself.

 

Thank you! 

 


 

Kurt_Bremser
Super User

In the second category, the mean of var2 can only be 2 by definition (after conversion to numeric). As I don't think this is what you want, you should review your question.


@ellenqi wrote:

hi everyone,

here is my data

id   var1 var2    newvar

1     23    1      

2     12     1 

3     14     2   

4     33     1   

5    18    3

6     20   2

7    45    3  

...

...

Var1 is a continuous variable, and Var2 is a categorical variable (3 categories). For 1st category (var2=1), the newvar= mean of var 1 among the people in the 1st category. For the 2nd category (var2=2), the newvar=mean of var2 among the people in the 2nd category. For the 3rd category, the I just want to newvar=var1; Therefore the newvar will be assigned with the same value for 1st category, and with another same value for 2nd category.

I know I can do this by running proc means to calculate the means of var1 for two different categorical var2. But I'd like to do this directly, without running proc means beforehand, and without rounding the means and assigning it by myself.

 

Thank you! 

 


 

hashman
Ammonite | Level 13

@Reeza is right that this is a typical job for the DoW loop, however the latter requires explicit sorting by VAR2. If you don't mind sorting, then:

data have ;                           
  input id var1 var2 ;                
  cards ;                             
1  23  1                              
2  12  1                              
3  14  2                              
4  33  1                              
5  18  3                              
6  20  2                              
7  45  3                              
run ;                                 
                                      
proc sort data = have out = havesort ;
  by var2 id ;                        
run ;                                 
                                      
data want_dow (drop = _:) ;           
  do _n_ = 1 by 1 until (last.var2) ; 
    set havesort (keep = var:) ;      
    by var2 ;                         
    _s = sum (_s, var1) ;             
  end ;                               
  newvar = divide (_s, _n_) ;         
  do _n_ = 1 to _n_ ;                 
    set havesort ;                    
    output ;                          
  end ;                               
run ;                                 

If you want to preserve the original data order without sorting forth and back, you can first aggregate the sums and counts using the hash object, then look it up for each record from  HAVE and compute the means:

data want_hash (drop = _:) ;                       
  if _n_ = 1 then do ;                             
    dcl hash h () ;                                
    h.definekey ("var2") ;                         
    h.definedata ("_s", "_q") ;                    
    h.definedone () ;                              
    do until (z) ;                                 
      set have (keep = var:) end = z ;             
      if h.find() ne 0 then call missing (_s, _q) ;
      _s = sum (_s, var1) ;                        
      _q = sum (_q, 1) ;                           
      h.replace() ;                                
    end ;                                          
  end ;                                            
  set have ;                                       
  h.find() ;                                       
  newvar = divide (_s, _q) ;                       
run ;                                              

However, perhaps the simplest way (also suggested by @Reeza) is just to use the self-merging property of SQL:

proc sql ;                                   
  create table want_sql as                   
  select id, var2, var1, avg (var1) as newvar
  from   have                                
  group  var2                                
  order  var2, id                            
  ;                                          
quit ;                                       

Kind regards

Paul D.    

 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 466 views
  • 0 likes
  • 4 in conversation