Data Transformation - long to wide - multiple ID variables

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 7
Accepted Solution

Data Transformation - long to wide - multiple ID variables

Good Afternoon,

 

I'm looking to transform a large data set from long to wide by ID.  I've pasted an example of what the data looks like currently.  I'd like to have a column for ID, RX (response from Result_char), RX_12MO (response from result_num), RX_TIME (response from result_num), RX_COND (response from result_char).  PROC TRANSPOSE does not allow me to keep multiple response.  Would I need several DO loops linking via Loop_index?

IDVariableResult_charResult_numLoop_item_IDLoop_indexEvent
333rx_12mo_aaNULL1MED01037
333rx_12mo_abNULL1MED01137
333rx_12mo_acNULL1MED01237
333rx_12mo_adNULL1MED01337
333rx_12mo_aeNULL2MED01437
333rx_time_aaNULL3MED01037
333rx_time_abNULL3MED01137
333rx_time_acNULL3MED01237
333rx_time_adNULL3MED01337
333rx_time_aeNULL1MED01437
333rxdrugXNULLMED01037
333rxdrugyNULLMED01137
333rxdrugXNULLMED01237
333rxdrugzNULLMED01337
333rxdrugXNULLMED01437
333rx_cond_aaasthmaNULLMED01037
333rx_cond_abasthmaNULLMED01137
333rx_cond_acasthmaNULLMED01237
333rx_cond_adallergiesNULLMED01337
333rx_cond_aeasthmaNULLMED01437

 

I'd like the results to be presented like this:

IDRXRX_12MORX_TIMERX_CONDEVENT
333drugX13asthma37
333drugY13asthma37

 

Any help would be very much appreciated!!

 

Thank you!


Accepted Solutions
Solution
‎03-06-2017 03:33 PM
Grand Advisor
Posts: 10,193

Re: Data Transformation - long to wide - multiple ID variables

This may get you started. I wouldn't want to go through too many additional variables though. I would seriously discuss this data layout with the source as the only thing I've seen this problematic before is the old Lotus 123 PRN file layout.

 

proc sql;
   create table work.want as
   select  a.id,c.rx,a.Rx_12_mo,b.Rx_time,d.rx_cond
   ,a.event 
   from 
      (select id, result_num as Rx_12_mo,event,Loop_index from work.example where index(variable,"rx_12mo")=1) as a
      left join
      (select id, result_num as Rx_time,Loop_index from work.example where index(variable,"rx_time")=1) as b
      on a.id=b.id and a.Loop_index=b.Loop_index
      left join 
      (select id, Result_char as Rx,Loop_index from work.example where variable='rx') as c
      on a.id=c.id and a.Loop_index=c.loop_index
      left join
      (select id, Result_char as Rx_Cond,Loop_index from work.example where index(variable,"rx_cond")=1) as d
      on a.id=d.id and a.Loop_index=d.Loop_index
   ;
quit;

View solution in original post


All Replies
Grand Advisor
Posts: 10,193

Re: Data Transformation - long to wide - multiple ID variables

You have RX_12MO_AA to RX_12MO_AE and similar for RX_time. How do we know which is associated with drugx or drugy? How do we get 1 for the value of the RX_12MO since there are 1 and 2 in the "result_num" values, and similar for 3 for RX_time since it appears we have a choice between 1,2,3; Rx_cond has choices as shown between asthma and allergies so how do we know that the result should be ashtma?

Occasional Contributor
Posts: 7

Re: Data Transformation - long to wide - multiple ID variables

Hi ballardW,

My apologies I left out a column.  The responses for RX:  are linked through the loop_index.

 

IDRXRX_12MORX_TIMERX_CONDEVENTLoop_index
333drugX13asthma370
333drugy13asthma371
Grand Advisor
Posts: 10,193

Re: Data Transformation - long to wide - multiple ID variables

Can you provide an example starting data set? Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.

 

I get concerned about conversions as SAS normally  doesn't use NULL and I'm not sure what type any of your variables start as even though they look numeric and whether the results are numeric or character.

 

Also, since one solution I have in mind could involve arrays what is the maximum number of "loop index" values for all of the IDs involved.

Occasional Contributor
Posts: 7

Re: Data Transformation - long to wide - multiple ID variables

data WORK.EXAMPLE;
  infile datalines dsd truncover;
  input ID:BEST12. Variable:$10. Result_char:$9. Result_num:$4. Loop_item_ID:$5. Loop_index:BEST12. Event:BEST12.;
datalines4;
333,rx_12mo_aa,NULL,1,MED01,0,37
333,rx_12mo_ab,NULL,1,MED01,1,37
333,rx_12mo_ac,NULL,1,MED01,2,37
333,rx_12mo_ad,NULL,1,MED01,3,37
333,rx_12mo_ae,NULL,2,MED01,4,37
333,rx_time_aa,NULL,3,MED01,0,37
333,rx_time_ab,NULL,3,MED01,1,37
333,rx_time_ac,NULL,3,MED01,2,37
333,rx_time_ad,NULL,3,MED01,3,37
333,rx_time_ae,NULL,1,MED01,4,37
333,rx,drugX,NULL,MED01,0,37
333,rx,drugy,NULL,MED01,1,37
333,rx,drugX,NULL,MED01,2,37
333,rx,drugz,NULL,MED01,3,37
333,rx,drugX,NULL,MED01,4,37
333,rx_cond_aa,asthma,NULL,MED01,0,37
333,rx_cond_ab,asthma,NULL,MED01,1,37
333,rx_cond_ac,asthma,NULL,MED01,2,37
333,rx_cond_ad,allergies,NULL,MED01,3,37
333,rx_cond_ae,asthma,NULL,MED01,4,37
;;;;

The maximum number of Loop_index is six (values 0 to 5).  The ID and Event are numeric variables - everything else is a character variable.  

Attachment
Solution
‎03-06-2017 03:33 PM
Grand Advisor
Posts: 10,193

Re: Data Transformation - long to wide - multiple ID variables

This may get you started. I wouldn't want to go through too many additional variables though. I would seriously discuss this data layout with the source as the only thing I've seen this problematic before is the old Lotus 123 PRN file layout.

 

proc sql;
   create table work.want as
   select  a.id,c.rx,a.Rx_12_mo,b.Rx_time,d.rx_cond
   ,a.event 
   from 
      (select id, result_num as Rx_12_mo,event,Loop_index from work.example where index(variable,"rx_12mo")=1) as a
      left join
      (select id, result_num as Rx_time,Loop_index from work.example where index(variable,"rx_time")=1) as b
      on a.id=b.id and a.Loop_index=b.Loop_index
      left join 
      (select id, Result_char as Rx,Loop_index from work.example where variable='rx') as c
      on a.id=c.id and a.Loop_index=c.loop_index
      left join
      (select id, Result_char as Rx_Cond,Loop_index from work.example where index(variable,"rx_cond")=1) as d
      on a.id=d.id and a.Loop_index=d.Loop_index
   ;
quit;
Occasional Contributor
Posts: 7

Re: Data Transformation - long to wide - multiple ID variables

Thank you ballardw!  This worked - only issue I had was that my dataset is ~3000 observations and this program created some duplicate observations in the new table.  So I used this code to remove duplicate values:

PROC SQL;
CREATE TABLE work.new_table as
SELECT
DISTINCT *
FROM work.old_table;
QUIT;
Grand Advisor
Posts: 10,193

Re: Data Transformation - long to wide - multiple ID variables


glcoolj12 wrote:

Thank you ballardw!  This worked - only issue I had was that my dataset is ~3000 observations and this program created some duplicate observations in the new table.  So I used this code to remove duplicate values:


 

I would go back to the data to confirm that the layout is as you expected. It may be that for some ID values you have MULTIPLE sets of responses in terms of "EVENT". Since you didn't show any multiple event data I did not consider that in a solution. It is very likely that all of the JOIN ON conditions need to have the event values match, which would mean adding them in the select clauses as well as the condition.

Occasional Contributor
Posts: 7

Re: Data Transformation - long to wide - multiple ID variables

Yes, that worked, thank you! I appreciate all of your help.
☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 190 views
  • 2 likes
  • 2 in conversation