BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
glcoolj12
Obsidian | Level 7

Good Afternoon,

 

I'm looking to transform a large data set from long to wide by ID.  I've pasted an example of what the data looks like currently.  I'd like to have a column for ID, RX (response from Result_char), RX_12MO (response from result_num), RX_TIME (response from result_num), RX_COND (response from result_char).  PROC TRANSPOSE does not allow me to keep multiple response.  Would I need several DO loops linking via Loop_index?

IDVariableResult_charResult_numLoop_item_IDLoop_indexEvent
333rx_12mo_aaNULL1MED01037
333rx_12mo_abNULL1MED01137
333rx_12mo_acNULL1MED01237
333rx_12mo_adNULL1MED01337
333rx_12mo_aeNULL2MED01437
333rx_time_aaNULL3MED01037
333rx_time_abNULL3MED01137
333rx_time_acNULL3MED01237
333rx_time_adNULL3MED01337
333rx_time_aeNULL1MED01437
333rxdrugXNULLMED01037
333rxdrugyNULLMED01137
333rxdrugXNULLMED01237
333rxdrugzNULLMED01337
333rxdrugXNULLMED01437
333rx_cond_aaasthmaNULLMED01037
333rx_cond_abasthmaNULLMED01137
333rx_cond_acasthmaNULLMED01237
333rx_cond_adallergiesNULLMED01337
333rx_cond_aeasthmaNULLMED01437

 

I'd like the results to be presented like this:

IDRXRX_12MORX_TIMERX_CONDEVENT
333drugX13asthma37
333drugY13asthma37

 

Any help would be very much appreciated!!

 

Thank you!

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

This may get you started. I wouldn't want to go through too many additional variables though. I would seriously discuss this data layout with the source as the only thing I've seen this problematic before is the old Lotus 123 PRN file layout.

 

proc sql;
   create table work.want as
   select  a.id,c.rx,a.Rx_12_mo,b.Rx_time,d.rx_cond
   ,a.event 
   from 
      (select id, result_num as Rx_12_mo,event,Loop_index from work.example where index(variable,"rx_12mo")=1) as a
      left join
      (select id, result_num as Rx_time,Loop_index from work.example where index(variable,"rx_time")=1) as b
      on a.id=b.id and a.Loop_index=b.Loop_index
      left join 
      (select id, Result_char as Rx,Loop_index from work.example where variable='rx') as c
      on a.id=c.id and a.Loop_index=c.loop_index
      left join
      (select id, Result_char as Rx_Cond,Loop_index from work.example where index(variable,"rx_cond")=1) as d
      on a.id=d.id and a.Loop_index=d.Loop_index
   ;
quit;

View solution in original post

8 REPLIES 8
ballardw
Super User

You have RX_12MO_AA to RX_12MO_AE and similar for RX_time. How do we know which is associated with drugx or drugy? How do we get 1 for the value of the RX_12MO since there are 1 and 2 in the "result_num" values, and similar for 3 for RX_time since it appears we have a choice between 1,2,3; Rx_cond has choices as shown between asthma and allergies so how do we know that the result should be ashtma?

glcoolj12
Obsidian | Level 7

Hi ballardW,

My apologies I left out a column.  The responses for RX:  are linked through the loop_index.

 

IDRXRX_12MORX_TIMERX_CONDEVENTLoop_index
333drugX13asthma370
333drugy13asthma371
ballardw
Super User

Can you provide an example starting data set? Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.

 

I get concerned about conversions as SAS normally  doesn't use NULL and I'm not sure what type any of your variables start as even though they look numeric and whether the results are numeric or character.

 

Also, since one solution I have in mind could involve arrays what is the maximum number of "loop index" values for all of the IDs involved.

glcoolj12
Obsidian | Level 7
data WORK.EXAMPLE;
  infile datalines dsd truncover;
  input ID:BEST12. Variable:$10. Result_char:$9. Result_num:$4. Loop_item_ID:$5. Loop_index:BEST12. Event:BEST12.;
datalines4;
333,rx_12mo_aa,NULL,1,MED01,0,37
333,rx_12mo_ab,NULL,1,MED01,1,37
333,rx_12mo_ac,NULL,1,MED01,2,37
333,rx_12mo_ad,NULL,1,MED01,3,37
333,rx_12mo_ae,NULL,2,MED01,4,37
333,rx_time_aa,NULL,3,MED01,0,37
333,rx_time_ab,NULL,3,MED01,1,37
333,rx_time_ac,NULL,3,MED01,2,37
333,rx_time_ad,NULL,3,MED01,3,37
333,rx_time_ae,NULL,1,MED01,4,37
333,rx,drugX,NULL,MED01,0,37
333,rx,drugy,NULL,MED01,1,37
333,rx,drugX,NULL,MED01,2,37
333,rx,drugz,NULL,MED01,3,37
333,rx,drugX,NULL,MED01,4,37
333,rx_cond_aa,asthma,NULL,MED01,0,37
333,rx_cond_ab,asthma,NULL,MED01,1,37
333,rx_cond_ac,asthma,NULL,MED01,2,37
333,rx_cond_ad,allergies,NULL,MED01,3,37
333,rx_cond_ae,asthma,NULL,MED01,4,37
;;;;

The maximum number of Loop_index is six (values 0 to 5).  The ID and Event are numeric variables - everything else is a character variable.  

ballardw
Super User

This may get you started. I wouldn't want to go through too many additional variables though. I would seriously discuss this data layout with the source as the only thing I've seen this problematic before is the old Lotus 123 PRN file layout.

 

proc sql;
   create table work.want as
   select  a.id,c.rx,a.Rx_12_mo,b.Rx_time,d.rx_cond
   ,a.event 
   from 
      (select id, result_num as Rx_12_mo,event,Loop_index from work.example where index(variable,"rx_12mo")=1) as a
      left join
      (select id, result_num as Rx_time,Loop_index from work.example where index(variable,"rx_time")=1) as b
      on a.id=b.id and a.Loop_index=b.Loop_index
      left join 
      (select id, Result_char as Rx,Loop_index from work.example where variable='rx') as c
      on a.id=c.id and a.Loop_index=c.loop_index
      left join
      (select id, Result_char as Rx_Cond,Loop_index from work.example where index(variable,"rx_cond")=1) as d
      on a.id=d.id and a.Loop_index=d.Loop_index
   ;
quit;
glcoolj12
Obsidian | Level 7

Thank you ballardw!  This worked - only issue I had was that my dataset is ~3000 observations and this program created some duplicate observations in the new table.  So I used this code to remove duplicate values:

PROC SQL;
CREATE TABLE work.new_table as
SELECT
DISTINCT *
FROM work.old_table;
QUIT;
ballardw
Super User

@glcoolj12 wrote:

Thank you ballardw!  This worked - only issue I had was that my dataset is ~3000 observations and this program created some duplicate observations in the new table.  So I used this code to remove duplicate values:


 

I would go back to the data to confirm that the layout is as you expected. It may be that for some ID values you have MULTIPLE sets of responses in terms of "EVENT". Since you didn't show any multiple event data I did not consider that in a solution. It is very likely that all of the JOIN ON conditions need to have the event values match, which would mean adding them in the select clauses as well as the condition.

glcoolj12
Obsidian | Level 7
Yes, that worked, thank you! I appreciate all of your help.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 2605 views
  • 2 likes
  • 2 in conversation