Solved: Re: Data Transformation - long to wide - multiple ID variables

glcoolj12 · Posted 03-03-2017 12:05 PM

Good Afternoon,

I'm looking to transform a large data set from long to wide by ID. I've pasted an example of what the data looks like currently. I'd like to have a column for ID, RX (response from Result_char), RX_12MO (response from result_num), RX_TIME (response from result_num), RX_COND (response from result_char). PROC TRANSPOSE does not allow me to keep multiple response. Would I need several DO loops linking via Loop_index?

ID	Variable	Result_char	Result_num	Loop_item_ID	Loop_index	Event
333	rx_12mo_aa	NULL	1	MED01	0	37
333	rx_12mo_ab	NULL	1	MED01	1	37
333	rx_12mo_ac	NULL	1	MED01	2	37
333	rx_12mo_ad	NULL	1	MED01	3	37
333	rx_12mo_ae	NULL	2	MED01	4	37
333	rx_time_aa	NULL	3	MED01	0	37
333	rx_time_ab	NULL	3	MED01	1	37
333	rx_time_ac	NULL	3	MED01	2	37
333	rx_time_ad	NULL	3	MED01	3	37
333	rx_time_ae	NULL	1	MED01	4	37
333	rx	drugX	NULL	MED01	0	37
333	rx	drugy	NULL	MED01	1	37
333	rx	drugX	NULL	MED01	2	37
333	rx	drugz	NULL	MED01	3	37
333	rx	drugX	NULL	MED01	4	37
333	rx_cond_aa	asthma	NULL	MED01	0	37
333	rx_cond_ab	asthma	NULL	MED01	1	37
333	rx_cond_ac	asthma	NULL	MED01	2	37
333	rx_cond_ad	allergies	NULL	MED01	3	37
333	rx_cond_ae	asthma	NULL	MED01	4	37

I'd like the results to be presented like this:

ID	RX	RX_12MO	RX_TIME	RX_COND	EVENT
333	drugX	1	3	asthma	37
333	drugY	1	3	asthma	37

Any help would be very much appreciated!!

Thank you!

ballardw · Posted 03-03-2017 04:37 PM

This may get you started. I wouldn't want to go through too many additional variables though. I would seriously discuss this data layout with the source as the only thing I've seen this problematic before is the old Lotus 123 PRN file layout.

proc sql;
   create table work.want as
   select  a.id,c.rx,a.Rx_12_mo,b.Rx_time,d.rx_cond
   ,a.event 
   from 
      (select id, result_num as Rx_12_mo,event,Loop_index from work.example where index(variable,"rx_12mo")=1) as a
      left join
      (select id, result_num as Rx_time,Loop_index from work.example where index(variable,"rx_time")=1) as b
      on a.id=b.id and a.Loop_index=b.Loop_index
      left join 
      (select id, Result_char as Rx,Loop_index from work.example where variable='rx') as c
      on a.id=c.id and a.Loop_index=c.loop_index
      left join
      (select id, Result_char as Rx_Cond,Loop_index from work.example where index(variable,"rx_cond")=1) as d
      on a.id=d.id and a.Loop_index=d.Loop_index
   ;
quit;

View solution in original post

ballardw · Posted 03-03-2017 12:43 PM

You have RX_12MO_AA to RX_12MO_AE and similar for RX_time. How do we know which is associated with drugx or drugy? How do we get 1 for the value of the RX_12MO since there are 1 and 2 in the "result_num" values, and similar for 3 for RX_time since it appears we have a choice between 1,2,3; Rx_cond has choices as shown between asthma and allergies so how do we know that the result should be ashtma?

glcoolj12 · Posted 03-03-2017 12:50 PM

Hi ballardW,

My apologies I left out a column. The responses for RX: are linked through the loop_index.

ID	RX	RX_12MO	RX_TIME	RX_COND	EVENT	Loop_index
333	drugX	1	3	asthma	37	0
333	drugy	1	3	asthma	37	1

ballardw · Posted 03-03-2017 01:57 PM

Can you provide an example starting data set? Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.

I get concerned about conversions as SAS normally doesn't use NULL and I'm not sure what type any of your variables start as even though they look numeric and whether the results are numeric or character.

Also, since one solution I have in mind could involve arrays what is the maximum number of "loop index" values for all of the IDs involved.

glcoolj12 · Posted 03-03-2017 03:19 PM

data WORK.EXAMPLE;
  infile datalines dsd truncover;
  input ID:BEST12. Variable:$10. Result_char:$9. Result_num:$4. Loop_item_ID:$5. Loop_index:BEST12. Event:BEST12.;
datalines4;
333,rx_12mo_aa,NULL,1,MED01,0,37
333,rx_12mo_ab,NULL,1,MED01,1,37
333,rx_12mo_ac,NULL,1,MED01,2,37
333,rx_12mo_ad,NULL,1,MED01,3,37
333,rx_12mo_ae,NULL,2,MED01,4,37
333,rx_time_aa,NULL,3,MED01,0,37
333,rx_time_ab,NULL,3,MED01,1,37
333,rx_time_ac,NULL,3,MED01,2,37
333,rx_time_ad,NULL,3,MED01,3,37
333,rx_time_ae,NULL,1,MED01,4,37
333,rx,drugX,NULL,MED01,0,37
333,rx,drugy,NULL,MED01,1,37
333,rx,drugX,NULL,MED01,2,37
333,rx,drugz,NULL,MED01,3,37
333,rx,drugX,NULL,MED01,4,37
333,rx_cond_aa,asthma,NULL,MED01,0,37
333,rx_cond_ab,asthma,NULL,MED01,1,37
333,rx_cond_ac,asthma,NULL,MED01,2,37
333,rx_cond_ad,allergies,NULL,MED01,3,37
333,rx_cond_ae,asthma,NULL,MED01,4,37
;;;;

The maximum number of Loop_index is six (values 0 to 5). The ID and Event are numeric variables - everything else is a character variable.

ballardw · Posted 03-03-2017 04:37 PM

This may get you started. I wouldn't want to go through too many additional variables though. I would seriously discuss this data layout with the source as the only thing I've seen this problematic before is the old Lotus 123 PRN file layout.

proc sql;
   create table work.want as
   select  a.id,c.rx,a.Rx_12_mo,b.Rx_time,d.rx_cond
   ,a.event 
   from 
      (select id, result_num as Rx_12_mo,event,Loop_index from work.example where index(variable,"rx_12mo")=1) as a
      left join
      (select id, result_num as Rx_time,Loop_index from work.example where index(variable,"rx_time")=1) as b
      on a.id=b.id and a.Loop_index=b.Loop_index
      left join 
      (select id, Result_char as Rx,Loop_index from work.example where variable='rx') as c
      on a.id=c.id and a.Loop_index=c.loop_index
      left join
      (select id, Result_char as Rx_Cond,Loop_index from work.example where index(variable,"rx_cond")=1) as d
      on a.id=d.id and a.Loop_index=d.Loop_index
   ;
quit;

glcoolj12 · Posted 03-06-2017 03:15 PM

Thank you ballardw! This worked - only issue I had was that my dataset is ~3000 observations and this program created some duplicate observations in the new table. So I used this code to remove duplicate values:

PROC SQL;
CREATE TABLE work.new_table as
SELECT
DISTINCT *
FROM work.old_table;
QUIT;

ballardw · Posted 03-06-2017 04:04 PM

@glcoolj12 wrote:

Thank you ballardw! This worked - only issue I had was that my dataset is ~3000 observations and this program created some duplicate observations in the new table. So I used this code to remove duplicate values:

I would go back to the data to confirm that the layout is as you expected. It may be that for some ID values you have MULTIPLE sets of responses in terms of "EVENT". Since you didn't show any multiple event data I did not consider that in a solution. It is very likely that all of the JOIN ON conditions need to have the event values match, which would mean adding them in the select clauses as well as the condition.

glcoolj12 · Posted 03-07-2017 02:13 PM

Yes, that worked, thank you! I appreciate all of your help.

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away