BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
michokwu
Quartz | Level 8

Hello,

I would like to link and aggregate records (over 10 million) in a dataset to another dataset. Is there a way to achieve this in one task? Thank you

 

Data 1 - what I have

CityCity_CodeItem 1Item 2Item 3Item 4
City A12300100110203040
City B12300100220304050
City C12300200130405060
City D12300200240506070
City E12300300150607080
City F12300300260708090
City G123004001708090100
City H1230040028090100110

 

Data 2 - what I want (the region code forms the beginning part of the city code)

RegionRegion_CodeItem 1Item 2Item 3Item 4
North12300130507090
South1230027090110130
East123003110130150170
West123004150170190210
1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User
You can do it quite easily in two steps.

This assumes that Region_Code is character.

proc means data=have SUM stackODS;
class city_code;
format city_code 6.; *uses only 6 characters for grouping, very important here;
var item1-item4;
ods output summary = want1;
output out=want2 sum= / autoname;
run;

Now apply the Region name to the data.

Or you can do it via one SQL pass but it's more complicated IMO.

proc sql;
create table want3 as
select substr(region_code, 1, 6) as region_code, put(region_code, $region_fmt.) as region, sum(item1) as item1, sum(item2) as item2, sum(item3) as item3, sum(item4) as item4
from have
group by calculated region_code;
quit;

View solution in original post

7 REPLIES 7
Reeza
Super User
You can do it quite easily in two steps.

This assumes that Region_Code is character.

proc means data=have SUM stackODS;
class city_code;
format city_code 6.; *uses only 6 characters for grouping, very important here;
var item1-item4;
ods output summary = want1;
output out=want2 sum= / autoname;
run;

Now apply the Region name to the data.

Or you can do it via one SQL pass but it's more complicated IMO.

proc sql;
create table want3 as
select substr(region_code, 1, 6) as region_code, put(region_code, $region_fmt.) as region, sum(item1) as item1, sum(item2) as item2, sum(item3) as item3, sum(item4) as item4
from have
group by calculated region_code;
quit;
michokwu
Quartz | Level 8

Thank you Reeza.

Please can you clarify 'Now apply the Region name to the data'

 

 

Reeza
Super User
If you run the PROC MEANS it will combine up the region_codes, ie 123001 but it doesn't know that that region_code means East. I suggest adding another format for that but you'll still need to apply it in a second step with the PROC MEANS approach but its pretty simple IMO.
michokwu
Quartz | Level 8

can't seem to format the city code, even though the format contains fewer characters, when run the program it retains the region code. The codes are numeric

michokwu
Quartz | Level 8

Thanks Reeza,

 

I also had to transpose the output into the format I preferred.

 

Reeza
Super User
You may not want the STACKODS option on PROC MEANS then, that changes how the output is structured.
hashman
Ammonite | Level 13

@michokwu:

 

SQL is perhaps the best (or at least simplest) option:

data city ;                                                                                                                                                                                                                                                     
  input @1 City $6. City_Code:$9. Item1-item4 ;                                                                                                                                                                                                                 
  cards ;                                                                                                                                                                                                                                                       
City A  123001001  10  20   30   40                                                                                                                                                                                                                             
City B  123001002  20  30   40   50                                                                                                                                                                                                                             
City C  123002001  30  40   50   60                                                                                                                                                                                                                             
City D  123002002  40  50   60   70                                                                                                                                                                                                                             
City E  123003001  50  60   70   80                                                                                                                                                                                                                             
City F  123003002  60  70   80   90                                                                                                                                                                                                                             
City G  123004001  70  80   90  100                                                                                                                                                                                                                             
City H  123004002  80  90  100  110                                                                                                                                                                                                                             
;                                                                                                                                                                                                                                                               
                                                                                                                                                                                                                                                                
data region ;                                                                                                                                                                                                                                                   
  input Region $ Region_Code:$6. ;                                                                                                                                                                                                                              
  cards ;                                                                                                                                                                                                                                                       
North  123001                                                                                                                                                                                                                                                   
South  123002                                                                                                                                                                                                                                                   
East   123003                                                                                                                                                                                                                                                   
West   123004                                                                                                                                                                                                                                                   
;                                                                                                                                                                                                                                                               
run ;                                                                                                                                                                                                                                                           
                                                                                                                                                                                                                                                                
proc sql ;                                                                                                                                                                                                                                                      
  create table want as                                                                                                                                                                                                                                          
  select region                                                                                                                                                                                                                                                 
       , region_code                                                                                                                                                                                                                                            
       , sum (item1) as item1                                                                                                                                                                                                                                   
       , sum (item2) as item2                                                                                                                                                                                                                                   
       , sum (item3) as item3                                                                                                                                                                                                                                   
       , sum (item4) as item4                                                                                                                                                                                                                                   
  from   city, region                                                                                                                                                                                                                                           
  where  region_code = put (city_code, $6.)                                                                                                                                                                                                                     
  group  1, 2                                                                                                                                                                                                                                                   
  order  2                                                                                                                                                                                                                                                      
  ;                                                                                                                                                                                                                                                             
quit ;                   

If you loathe to list all the items as above, especially if there are way many than 4, heed what @Reeza has said. It can be done in a single DATA step, too (i.e. without listing all the items), using the hash object; but if you see the requisite code, you'll sure appreciate the simplicity of SQL:

data _null_ ;                                                                                                                                                                                                                                                   
  if _n_ = 1 then do ;                                                                                                                                                                                                                                          
    if 0 then set region city ;                                                                                                                                                                                                                                 
    array it item: ;                                                                                                                                                                                                                                            
    dcl hash h (ordered:"a") ;                                                                                                                                                                                                                                  
    h.definekey ("region_code") ;                                                                                                                                                                                                                               
    h.definedata ("region", "region_code") ;                                                                                                                                                                                                                    
    do over it ;                                                                                                                                                                                                                                                
      h.definedata (vname (it)) ;                                                                                                                                                                                                                               
    end ;                                                                                                                                                                                                                                                       
    h.definedone() ;                                                                                                                                                                                                                                            
    do until (lr) ;                                                                                                                                                                                                                                             
      set region end = lr ;                                                                                                                                                                                                                                     
      h.add() ;                                                                                                                                                                                                                                                 
    end ;                                                                                                                                                                                                                                                       
  end ;                                                                                                                                                                                                                                                         
  set city (rename=(item1-item4=_it1-_it4)) end = lc ;                                                                                                                                                                                                          
  array _it _it: ;                                                                                                                                                                                                                                              
  region_code = put (city_code, $6.) ;                                                                                                                                                                                                                          
  if h.find() ne 0 then call missing (of item:) ;                                                                                                                                                                                                               
  do over it ;                                                                                                                                                                                                                                                  
    it + _it ;                                                                                                                                                                                                                                                  
  end ;                                                                                                                                                                                                                                                         
  h.replace() ;                                                                                                                                                                                                                                                 
  if lc then h.output (dataset:"want") ;                                                                                                                                                                                                                        
run ;                  

OTOH, instead of listing all the sum(item1) as item1, ... in the SQL query, you can auto-construct a macro variable containing the necessary text:

data _null_ ;                                                                                                                                                                                                                                                   
  length sumit $ 32767 ;                                                                                                                                                                                                                                        
  do i = 1 to 4 ;                                                                                                                                                                                                                                               
    sumit = catx (",", sumit, cats ("sum(item", i, ") as item", i)) ;                                                                                                                                                                                           
  end ;                                                                                                                                                                                                                                                         
  call symputx ("sumit", sumit) ;                                                                                                                                                                                                                               
run ;                                                                                                                                                                                                                                                           
                                                                                                                                                                                                                                                                
proc sql ;                                                                                                                                                                                                                                                      
  create table want as                                                                                                                                                                                                                                          
  select region                                                                                                                                                                                                                                                 
       , region_code                                                                                                                                                                                                                                            
       , &sumit                                                                                                                                                                                                                                                 
  from   city, region                                                                                                                                                                                                                                           
  where  region_code = put (city_code, $6.)                                                                                                                                                                                                                     
  group  1, 2                                                                                                                                                                                                                                                   
  order  2                                                                                                                                                                                                                                                      
  ;                                                                                                                                                                                                                                                             
quit ;             

Kind regards

Paul D.  

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1080 views
  • 0 likes
  • 3 in conversation