BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
geneshackman
Pyrite | Level 9

I have a data set that looks like this, with data for different years in different columns

county age data2015 data2016 data2017
1 1 89 17 79
1 2 6 76 49
1 3 87 5 25
1 4 45 17 1
1 5 78 59 45
2 1 35 4 20
2 2 51 95 46
2 3 76 24 8
2 4 60 66 30
2 5 54 28 41

 

I would like to change this so each year has it's own set of rows, but keeping their county and age, so it would look like this. Is there an easy way to do this?  I could read in the data set three times, once for each year, and append each time. But I just wanted to ask if there was an easier way.

 

county age year data
1 1 2015 89
1 2 2015 6
1 3 2015 87
1 4 2015 45
1 5 2015 78
2 1 2015 35
2 2 2015 51
2 3 2015 76
2 4 2015 60
2 5 2015 54
1 1 2016 17
1 2 2016 76
1 3 2016 5
1 4 2016 17
1 5 2016 59
2 1 2016 4
2 2 2016 95
2 3 2016 24
2 4 2016 66
2 5 2016 28
1 1 2017 79
1 2 2017 49
1 3 2017 25
1 4 2017 1
1 5 2017 45
2 1 2017 20
2 2 2017 46
2 3 2017 8
2 4 2017 30
2 5 2017 41

 

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

Start with a transpose:

proc transpose data=have out=trans;
by county age;
var data:;
run;

Then transform the result

data want;
set trans;
year = input(substr(_name_,4),4.);
rename col1=data;
drop _name_;
run;

and then you only need to sort according to your needs.

View solution in original post

12 REPLIES 12
Kurt_Bremser
Super User

Start with a transpose:

proc transpose data=have out=trans;
by county age;
var data:;
run;

Then transform the result

data want;
set trans;
year = input(substr(_name_,4),4.);
rename col1=data;
drop _name_;
run;

and then you only need to sort according to your needs.

geneshackman
Pyrite | Level 9

Kurt, just to let you know, your transpose then transform works. Thanks!

novinosrin
Tourmaline | Level 20

Hello @geneshackman 

 

If you are comfortable using Hashes, It's rather straight forward

 


data have;
input county	age	data2015	data2016	data2017;
cards;
1	1	89	17	79
1	2	6	76	49
1	3	87	5	25
1	4	45	17	1
1	5	78	59	45
2	1	35	4	20
2	2	51	95	46
2	3	76	24	8
2	4	60	66	30
2	5	54	28	41
;

data _null_;
if _n_=1 then do;
   dcl hash H (ordered: "A") ;
   h.definekey  ("year","county","age") ;
   h.definedata ("county","age","year", "data") ;
   h.definedone () ;
end;
set have end=lr;
array d data2015-data2017;
do i=1 to dim(d);
data=d(i);
year=compress(vname(d(i)),,'kd');
rc=h.add();
end;
if lr then h.output(dataset:'want');
run;

NOTE: The solution assumes memory is not a constraint

hashman
Ammonite | Level 13

@novinosrin :

Your code is memory-bound only because you make no assumption that the input file is sorted  - which is an advantage if memory is plentiful. Note that the TRANSPOSE solution by @Kurt_Bremser does rely on the sorted order. If your hash code took advantage of it as well, the hash table memory footprint would be dictated only by the size of the largest group of records by COUNTY. For example:

data have ;                                    
  input county age data2015 data2016 data2017 ;
  cards ;                                      
1      1      89      17      79               
1      2      6       76      49               
1      3      87       5      25               
1      4      45      17       1               
1      5      78      59      45               
2      1      35       4      20               
2      2      51      95      46               
2      3      76      24       8               
2      4      60      66      30               
2      5      54      28      41               
;                                              
run ;                                          

data want (keep = county age year data) ;             
  if _n_ = 1 then do ;                                
    dcl hash h (multidata:"Y", ordered:"A") ;         
    h.definekey  ("year") ;                           
    h.definedata ("year", "age", "data") ;            
    h.definedone () ;                                 
    dcl hiter hi ("h") ;                              
  end ;                                               
  do until (last.county) ;                            
    set have ;                                        
    by county ;                                       
    array dd data: ;                                  
    do over dd ;                                      
      year = input (compress (vname (dd),,"kd"), 4.) ;
      data = dd ;                                     
      h.add() ;                                       
    end ;                                             
  end ;                                               
  do while (hi.next() = 0) ;                          
    output ;                                          
  end ;                                               
  h.clear() ;                                         
run ;                                                 

It's instructive to see how the task could be approached in the same vein before the advent of the hash object, i.e. using arrays. Obviously, in this case, the array serving in lieu of the hash table should be either sized using a preliminary pass through the data or just made "large enough" (which is what's done below):

data want (keep = county age year data) ;           
  do j = 1 by 1 until (last.county) ;               
    set have ;                                      
    by county ;                                     
    array dd (i) data: ;                            
    array tt [0:999, 9999] _temporary_ ;            
    do over dd ;                                    
      tt [0,j] = age ;                              
      tt [i,j] = dd ;                               
    end ;                                           
  end ;                                             
  do over dd ;                                      
    year = input (compress (vname (dd),,"kd"), 4.) ;
    do k = 1 to j ;                                 
      age  = tt [0,k] ;                             
      data = tt [i,k] ;                             
      output ;                                      
    end ;                                           
  end ;                                             
run ;                                               

With a more dynamic approach, i.e. with pre-sizing the array bounds using a preliminary pass through HAVE, it could look thusly:

/* size up array TT */                              
data _null_ ;                                       
  do j = 1 by 1 until (last.county) ;               
    set have end = z ;                              
    by county ;                                     
  end ;                                             
  retain jmax ;                                     
  jmax = jmax max j ;                               
  array dd data: ;                                  
  if z ;                                            
  call symputx ("i", dim (dd)) ;                    
  call symputx ("j", jmax    ) ;                    
run ;                                               
/* execute */                                       
data want (keep = county age year data) ;           
  do j = 1 by 1 until (last.county) ;               
    set have ;                                      
    by county ;                                     
    array dd (i) data: ;                            
    array tt [0:&i, &j] _temporary_ ;               
    do over dd ;                                    
      tt [0,j] = age ;                              
      tt [i,j] = dd ;                               
    end ;                                           
  end ;                                             
  do over dd ;                                      
    year = input (compress (vname (dd),,"kd"), 4.) ;
    do k = 1 to j ;                                 
      age  = tt [0,k] ;                             
      data = tt [i,k] ;                             
      output ;                                      
    end ;                                           
  end ;                                             
run ;                                               

Kind regards

Paul D.    

hashman
Ammonite | Level 13

@geneshackman :

I much prefer the offers by @Kurt_Bremser and @novinosrin because they are dynamic, i.e. not hard coded. To better appreciate their beauty, you can compare them with a sledge-hammer hard-coded frontal attack:

data have ;                                     
  input county age data2015 data2016 data2017 ; 
  cards ;                                       
1      1      89      17      79                
1      2      6       76      49                
1      3      87       5      25                
1      4      45      17       1                
1      5      78      59      45                
2      1      35       4      20                
2      2      51      95      46                
2      3      76      24       8                
2      4      60      66      30                
2      5      54      28      41                
;                                               
run ;                                           
                                                
data want (keep = county age year data) ;       
  set have (keep = county age data2015 in = d15)
      have (keep = county age data2016 in = d16)
      have (keep = county age data2017)         
  ;                                             
  if d15 then do ;                              
    year =     2015 ;                           
    data = data2015 ;                           
  end ;                                         
  else if d16 then do ;                         
    year =     2016 ;                           
    data = data2016 ;                           
  end ;                                         
  else do ;                                     
    year =     2017 ;                           
    data = data2017 ;                           
  end ;                                         
run ;                                           

As you can see, if you added another piece of data (related to another year), for example, data2018, you'd need to change the code to accommodate it. Also, it requires as many passes through the data as you have data years. The dynamic approaches are devoid of these drawbacks.

 

Kind regards

Paul D. 

novinosrin
Tourmaline | Level 20

Hello @geneshackman , I may take this opportunity to recommend some books that are priceless to get users to speed in their learning. 

 

1. Book: Data Management Solutions Using SAS® Hash Table Operations: A Business Intelligence Case Study

 

The best ever.  This link has some preview comments -https://communities.sas.com/t5/Community-Matters/Book-Data-Management-Solutions-Using-SAS-Hash-Table...

 

2. For Other approaches, I prefer Ron Cody(quite wholesome)

 

HTH

 

A sincere request to SAS press @ChrisHemedinger  and @ShelleySessoms . I would like to bring to your attention almost everyone whom I met at Citizens seem to be totally unaware of the variety of books(individual topics level by granulairty) offered by SAS press and its details. As a matter of fact, the kind of learning I had at DePaul and the advancement does not parallel but i get paid so I can't complain. lol  Consequently, they seem to shy away from application of advance techniques

While my fellow colleagues have observed my excitement to get them to notice the wealth of knowledge they can get from the books, I would like to ask whether SAS press makes their customers aware of what is available coz it seems the contrary at least here at my CT office.  Thank you for your attention. 

 

 

 

 

 

ShelleySessoms
Community Manager

Hi @novinosrin,

 

I will pass your message along to the SAS Press folks, and provide your email address should they want to discuss ideas with you for your colleagues at Citizens.

 

Thanks for being such a supporter of SAS Press books.


Best,

Shelley 

It's time to register for SAS Innovate! Join your SAS user peers in Las Vegas on April 16-19 2024.
art297
Opal | Level 21

An even more general solution that requires less code is the %untranspose macro that you can download from: https://github.com/gerhard1050/Untranspose-a-Wide-File

 

Using that macro the only code you'd need would be:

%untranspose(data=have, out=want,by=county age, var=data,id=year)

 

Art, CEO, AnalystFinder.com

 

ballardw
Super User

Perhaps read the data differently to begin with:

data have ;                                     
  input county age @ ; 
  do year = 2015 to 2017;
    input data @;
    output;
  end;
  input;
  cards ;                                       
1      1      89      17      79                
1      2      6       76      49                
1      3      87       5      25                
1      4      45      17       1                
1      5      78      59      45                
2      1      35       4      20                
2      2      51      95      46                
2      3      76      24       8                
2      4      60      66      30                
2      5      54      28      41                
;                                               
run ;   

Proc sort to get the order you want if this doesn't work.

geneshackman
Pyrite | Level 9

Hi Kurt,

Thanks for the suggestion. Question about the first part

 

proc transpose data=have out=trans;
by county age;
var data:;
run;

 

Where does the var "data" come from? I guess it's the first part of the data2015, data2016, etc? But I get "variable data not found". Does it have to be certain number of letters long, or do the "data2015", "data2016" all have to have exactly the same var name formats?

 

Thanks

 

Tom
Super User Tom
Super User

If your error messages says that variable DATA was not found then most likely you did not include the colon in the VAR statement.

var data: ;
geneshackman
Pyrite | Level 9

"If your error messages says that variable DATA was not found then most likely you did not include the colon in the VAR statement."

Correct! Oops, early morning, I wasn't looking closely and thought it was a double ;; that I accidentally added. I added the colon and it worked. Thanks.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 1058 views
  • 5 likes
  • 8 in conversation