BookmarkSubscribeRSS Feed
sassy_lm
Fluorite | Level 6

Hi everyone,

 

I am wondering how I can transpose the data below which currently has multiple rows per subject (Table 1). I would like to create a new dataset with one row per subject (Table 2). The variables with multiple obs are associated with ID and BID. Thank you in advance! - Liz

 

Table 1: 

IDBIDDx_yeardx_codeshort_description
1250200012345HTN
1250200012344obesity
1250200012333vomiting
2870200212223TB
2870200214322anemia
2870200312355tobacco use
2870200314325infection

 

Table 2:

IDBIDDx_year1Dx_year2Dx_year3Dx_year4dx_code1dx_code2dx_code3dx_code4short_description1short_description2short_description3short_description4
1250200020002000 123451234412333 HTNobesityvomiting 
2870200220022003200312223143221235514325TBanemiatobacco useinfection
5 REPLIES 5
art297
Opal | Level 21

Are BID, Dx_year and dx_code all numeric variables, or are some character variables?

 

Art, CEO, AnalystFinder.com

 

     
sassy_lm
Fluorite | Level 6
BID and Dx_year are numeric variables. However, the variable dx_code is a character variable.

Thanks for asking! Happy to answer any other questions.

- Liz
hashman
Ammonite | Level 13

@sassy_lm:

  1. Read the file to find out what it the largest [id,bid] group on file
  2. Use the result to size up arrays and create the requisite number of array variables for the output
  3. Read the file again by [id,bid] (it's assumed sorted accordingly) and populate the arrays using the record number in each BY group as an index, starting for each BY group at 1

In other (i.e. SAS) words:

data have ;                                                                           
  input ID BID Dx_year dx_code $ short_description & :$12. ;                          
  cards ;                                                                             
1  250  2000  12345  HTN                                                              
1  250  2000  12344  obesity                                                          
1  250  2000  12333  vomiting                                                         
2  870  2002  12223  TB                                                               
2  870  2002  14322  anemia                                                           
2  870  2003  12355  tobacco use                                                      
2  870  2003  14325  infection                                                        
;                                                                                     
run ;                                                                                 
                                                                                      
proc sql noprint ;                                                                    
  select cats (max (g)) into :g from (select count (*) as g from have group id, bid) ;
quit ;                                                                                
                                                                                      
%put &=g ;                                                                            
                                                                                      
data want (drop = dx_year dx_code short_description) ;                                
  do _i_ = 1 by 1 until (last.bid) ;                                                  
    set have ;                                                                        
    by id bid ;                                                                       
    array dxy     dx_year1-dx_year&g ;                                                
    array dxc $ 8 dx_code1-dx_code&g ;                                                
    array sd  $12 short_description1-short_description&g ;                            
    dxy = dx_year ;                                                                   
    dxc = dx_code ;                                                                   
    sd  = short_description ;                                                         
  end ;                                                                               
run ;                                                                                 

Note:

  • The implicit OUTPUT before RUN writes out the array variables populated through the iterations of the DoW-loop as one record.
  • Then program control is passed back to the top of the step and the array variables are auto-populated with missing values since they are not retained. 
  • You can kill the %PUT statement. It's just FYI.

Kind regards

Paul D.

 

art297
Opal | Level 21

My suggestion would be to use the macro a group of us wrote and presented at SAS Global Forum a couple of years ago.

 

One way to do that would be to run the following code:

 

filename ut url 'https://raw.githubusercontent.com/art297/transpose/master/transpose.sas';
%include ut ;

%transpose(data=have, out=want, by=ID BID, Guessingrows=1000,
  var=dx_year dx_code short_description)

If you want to see or download the actual macro and/or the paper, you can find it at: https://github.com/art297/transpose

 

Art, CEO, AnalystFinder.com

 

ballardw
Super User

And how is that data set to be used?

 

 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 3009 views
  • 1 like
  • 4 in conversation