BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
michokwu
Quartz | Level 8

Hello,

I would like to link and aggregate records (over 10 million) in a dataset to another dataset. Is there a way to achieve this in one task? Thank you

 

Data 1 - what I have

CityCity_CodeItem 1Item 2Item 3Item 4
City A12300100110203040
City B12300100220304050
City C12300200130405060
City D12300200240506070
City E12300300150607080
City F12300300260708090
City G123004001708090100
City H1230040028090100110

 

Data 2 - what I want (the region code forms the beginning part of the city code)

RegionRegion_CodeItem 1Item 2Item 3Item 4
North12300130507090
South1230027090110130
East123003110130150170
West123004150170190210
1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User
You can do it quite easily in two steps.

This assumes that Region_Code is character.

proc means data=have SUM stackODS;
class city_code;
format city_code 6.; *uses only 6 characters for grouping, very important here;
var item1-item4;
ods output summary = want1;
output out=want2 sum= / autoname;
run;

Now apply the Region name to the data.

Or you can do it via one SQL pass but it's more complicated IMO.

proc sql;
create table want3 as
select substr(region_code, 1, 6) as region_code, put(region_code, $region_fmt.) as region, sum(item1) as item1, sum(item2) as item2, sum(item3) as item3, sum(item4) as item4
from have
group by calculated region_code;
quit;

View solution in original post

7 REPLIES 7
Reeza
Super User
You can do it quite easily in two steps.

This assumes that Region_Code is character.

proc means data=have SUM stackODS;
class city_code;
format city_code 6.; *uses only 6 characters for grouping, very important here;
var item1-item4;
ods output summary = want1;
output out=want2 sum= / autoname;
run;

Now apply the Region name to the data.

Or you can do it via one SQL pass but it's more complicated IMO.

proc sql;
create table want3 as
select substr(region_code, 1, 6) as region_code, put(region_code, $region_fmt.) as region, sum(item1) as item1, sum(item2) as item2, sum(item3) as item3, sum(item4) as item4
from have
group by calculated region_code;
quit;
michokwu
Quartz | Level 8

Thank you Reeza.

Please can you clarify 'Now apply the Region name to the data'

 

 

Reeza
Super User
If you run the PROC MEANS it will combine up the region_codes, ie 123001 but it doesn't know that that region_code means East. I suggest adding another format for that but you'll still need to apply it in a second step with the PROC MEANS approach but its pretty simple IMO.
michokwu
Quartz | Level 8

can't seem to format the city code, even though the format contains fewer characters, when run the program it retains the region code. The codes are numeric

michokwu
Quartz | Level 8

Thanks Reeza,

 

I also had to transpose the output into the format I preferred.

 

Reeza
Super User
You may not want the STACKODS option on PROC MEANS then, that changes how the output is structured.
hashman
Ammonite | Level 13

@michokwu:

 

SQL is perhaps the best (or at least simplest) option:

data city ;                                                                                                                                                                                                                                                     
  input @1 City $6. City_Code:$9. Item1-item4 ;                                                                                                                                                                                                                 
  cards ;                                                                                                                                                                                                                                                       
City A  123001001  10  20   30   40                                                                                                                                                                                                                             
City B  123001002  20  30   40   50                                                                                                                                                                                                                             
City C  123002001  30  40   50   60                                                                                                                                                                                                                             
City D  123002002  40  50   60   70                                                                                                                                                                                                                             
City E  123003001  50  60   70   80                                                                                                                                                                                                                             
City F  123003002  60  70   80   90                                                                                                                                                                                                                             
City G  123004001  70  80   90  100                                                                                                                                                                                                                             
City H  123004002  80  90  100  110                                                                                                                                                                                                                             
;                                                                                                                                                                                                                                                               
                                                                                                                                                                                                                                                                
data region ;                                                                                                                                                                                                                                                   
  input Region $ Region_Code:$6. ;                                                                                                                                                                                                                              
  cards ;                                                                                                                                                                                                                                                       
North  123001                                                                                                                                                                                                                                                   
South  123002                                                                                                                                                                                                                                                   
East   123003                                                                                                                                                                                                                                                   
West   123004                                                                                                                                                                                                                                                   
;                                                                                                                                                                                                                                                               
run ;                                                                                                                                                                                                                                                           
                                                                                                                                                                                                                                                                
proc sql ;                                                                                                                                                                                                                                                      
  create table want as                                                                                                                                                                                                                                          
  select region                                                                                                                                                                                                                                                 
       , region_code                                                                                                                                                                                                                                            
       , sum (item1) as item1                                                                                                                                                                                                                                   
       , sum (item2) as item2                                                                                                                                                                                                                                   
       , sum (item3) as item3                                                                                                                                                                                                                                   
       , sum (item4) as item4                                                                                                                                                                                                                                   
  from   city, region                                                                                                                                                                                                                                           
  where  region_code = put (city_code, $6.)                                                                                                                                                                                                                     
  group  1, 2                                                                                                                                                                                                                                                   
  order  2                                                                                                                                                                                                                                                      
  ;                                                                                                                                                                                                                                                             
quit ;                   

If you loathe to list all the items as above, especially if there are way many than 4, heed what @Reeza has said. It can be done in a single DATA step, too (i.e. without listing all the items), using the hash object; but if you see the requisite code, you'll sure appreciate the simplicity of SQL:

data _null_ ;                                                                                                                                                                                                                                                   
  if _n_ = 1 then do ;                                                                                                                                                                                                                                          
    if 0 then set region city ;                                                                                                                                                                                                                                 
    array it item: ;                                                                                                                                                                                                                                            
    dcl hash h (ordered:"a") ;                                                                                                                                                                                                                                  
    h.definekey ("region_code") ;                                                                                                                                                                                                                               
    h.definedata ("region", "region_code") ;                                                                                                                                                                                                                    
    do over it ;                                                                                                                                                                                                                                                
      h.definedata (vname (it)) ;                                                                                                                                                                                                                               
    end ;                                                                                                                                                                                                                                                       
    h.definedone() ;                                                                                                                                                                                                                                            
    do until (lr) ;                                                                                                                                                                                                                                             
      set region end = lr ;                                                                                                                                                                                                                                     
      h.add() ;                                                                                                                                                                                                                                                 
    end ;                                                                                                                                                                                                                                                       
  end ;                                                                                                                                                                                                                                                         
  set city (rename=(item1-item4=_it1-_it4)) end = lc ;                                                                                                                                                                                                          
  array _it _it: ;                                                                                                                                                                                                                                              
  region_code = put (city_code, $6.) ;                                                                                                                                                                                                                          
  if h.find() ne 0 then call missing (of item:) ;                                                                                                                                                                                                               
  do over it ;                                                                                                                                                                                                                                                  
    it + _it ;                                                                                                                                                                                                                                                  
  end ;                                                                                                                                                                                                                                                         
  h.replace() ;                                                                                                                                                                                                                                                 
  if lc then h.output (dataset:"want") ;                                                                                                                                                                                                                        
run ;                  

OTOH, instead of listing all the sum(item1) as item1, ... in the SQL query, you can auto-construct a macro variable containing the necessary text:

data _null_ ;                                                                                                                                                                                                                                                   
  length sumit $ 32767 ;                                                                                                                                                                                                                                        
  do i = 1 to 4 ;                                                                                                                                                                                                                                               
    sumit = catx (",", sumit, cats ("sum(item", i, ") as item", i)) ;                                                                                                                                                                                           
  end ;                                                                                                                                                                                                                                                         
  call symputx ("sumit", sumit) ;                                                                                                                                                                                                                               
run ;                                                                                                                                                                                                                                                           
                                                                                                                                                                                                                                                                
proc sql ;                                                                                                                                                                                                                                                      
  create table want as                                                                                                                                                                                                                                          
  select region                                                                                                                                                                                                                                                 
       , region_code                                                                                                                                                                                                                                            
       , &sumit                                                                                                                                                                                                                                                 
  from   city, region                                                                                                                                                                                                                                           
  where  region_code = put (city_code, $6.)                                                                                                                                                                                                                     
  group  1, 2                                                                                                                                                                                                                                                   
  order  2                                                                                                                                                                                                                                                      
  ;                                                                                                                                                                                                                                                             
quit ;             

Kind regards

Paul D.  

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1597 views
  • 0 likes
  • 3 in conversation