BookmarkSubscribeRSS Feed
sassy_lm
Fluorite | Level 6

Hi everyone,

 

I am wondering how I can transpose the data below which currently has multiple rows per subject (Table 1). I would like to create a new dataset with one row per subject (Table 2). The variables with multiple obs are associated with ID and BID. Thank you in advance! - Liz

 

Table 1: 

IDBIDDx_yeardx_codeshort_description
1250200012345HTN
1250200012344obesity
1250200012333vomiting
2870200212223TB
2870200214322anemia
2870200312355tobacco use
2870200314325infection

 

Table 2:

IDBIDDx_year1Dx_year2Dx_year3Dx_year4dx_code1dx_code2dx_code3dx_code4short_description1short_description2short_description3short_description4
1250200020002000 123451234412333 HTNobesityvomiting 
2870200220022003200312223143221235514325TBanemiatobacco useinfection
5 REPLIES 5
art297
Opal | Level 21

Are BID, Dx_year and dx_code all numeric variables, or are some character variables?

 

Art, CEO, AnalystFinder.com

 

     
sassy_lm
Fluorite | Level 6
BID and Dx_year are numeric variables. However, the variable dx_code is a character variable.

Thanks for asking! Happy to answer any other questions.

- Liz
hashman
Ammonite | Level 13

@sassy_lm:

  1. Read the file to find out what it the largest [id,bid] group on file
  2. Use the result to size up arrays and create the requisite number of array variables for the output
  3. Read the file again by [id,bid] (it's assumed sorted accordingly) and populate the arrays using the record number in each BY group as an index, starting for each BY group at 1

In other (i.e. SAS) words:

data have ;                                                                           
  input ID BID Dx_year dx_code $ short_description & :$12. ;                          
  cards ;                                                                             
1  250  2000  12345  HTN                                                              
1  250  2000  12344  obesity                                                          
1  250  2000  12333  vomiting                                                         
2  870  2002  12223  TB                                                               
2  870  2002  14322  anemia                                                           
2  870  2003  12355  tobacco use                                                      
2  870  2003  14325  infection                                                        
;                                                                                     
run ;                                                                                 
                                                                                      
proc sql noprint ;                                                                    
  select cats (max (g)) into :g from (select count (*) as g from have group id, bid) ;
quit ;                                                                                
                                                                                      
%put &=g ;                                                                            
                                                                                      
data want (drop = dx_year dx_code short_description) ;                                
  do _i_ = 1 by 1 until (last.bid) ;                                                  
    set have ;                                                                        
    by id bid ;                                                                       
    array dxy     dx_year1-dx_year&g ;                                                
    array dxc $ 8 dx_code1-dx_code&g ;                                                
    array sd  $12 short_description1-short_description&g ;                            
    dxy = dx_year ;                                                                   
    dxc = dx_code ;                                                                   
    sd  = short_description ;                                                         
  end ;                                                                               
run ;                                                                                 

Note:

  • The implicit OUTPUT before RUN writes out the array variables populated through the iterations of the DoW-loop as one record.
  • Then program control is passed back to the top of the step and the array variables are auto-populated with missing values since they are not retained. 
  • You can kill the %PUT statement. It's just FYI.

Kind regards

Paul D.

 

art297
Opal | Level 21

My suggestion would be to use the macro a group of us wrote and presented at SAS Global Forum a couple of years ago.

 

One way to do that would be to run the following code:

 

filename ut url 'https://raw.githubusercontent.com/art297/transpose/master/transpose.sas';
%include ut ;

%transpose(data=have, out=want, by=ID BID, Guessingrows=1000,
  var=dx_year dx_code short_description)

If you want to see or download the actual macro and/or the paper, you can find it at: https://github.com/art297/transpose

 

Art, CEO, AnalystFinder.com

 

ballardw
Super User

And how is that data set to be used?

 

 

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 3848 views
  • 1 like
  • 4 in conversation