Good Afternoon,
I'm looking to transform a large data set from long to wide by ID. I've pasted an example of what the data looks like currently. I'd like to have a column for ID, RX (response from Result_char), RX_12MO (response from result_num), RX_TIME (response from result_num), RX_COND (response from result_char). PROC TRANSPOSE does not allow me to keep multiple response. Would I need several DO loops linking via Loop_index?
ID | Variable | Result_char | Result_num | Loop_item_ID | Loop_index | Event |
333 | rx_12mo_aa | NULL | 1 | MED01 | 0 | 37 |
333 | rx_12mo_ab | NULL | 1 | MED01 | 1 | 37 |
333 | rx_12mo_ac | NULL | 1 | MED01 | 2 | 37 |
333 | rx_12mo_ad | NULL | 1 | MED01 | 3 | 37 |
333 | rx_12mo_ae | NULL | 2 | MED01 | 4 | 37 |
333 | rx_time_aa | NULL | 3 | MED01 | 0 | 37 |
333 | rx_time_ab | NULL | 3 | MED01 | 1 | 37 |
333 | rx_time_ac | NULL | 3 | MED01 | 2 | 37 |
333 | rx_time_ad | NULL | 3 | MED01 | 3 | 37 |
333 | rx_time_ae | NULL | 1 | MED01 | 4 | 37 |
333 | rx | drugX | NULL | MED01 | 0 | 37 |
333 | rx | drugy | NULL | MED01 | 1 | 37 |
333 | rx | drugX | NULL | MED01 | 2 | 37 |
333 | rx | drugz | NULL | MED01 | 3 | 37 |
333 | rx | drugX | NULL | MED01 | 4 | 37 |
333 | rx_cond_aa | asthma | NULL | MED01 | 0 | 37 |
333 | rx_cond_ab | asthma | NULL | MED01 | 1 | 37 |
333 | rx_cond_ac | asthma | NULL | MED01 | 2 | 37 |
333 | rx_cond_ad | allergies | NULL | MED01 | 3 | 37 |
333 | rx_cond_ae | asthma | NULL | MED01 | 4 | 37 |
I'd like the results to be presented like this:
ID | RX | RX_12MO | RX_TIME | RX_COND | EVENT |
333 | drugX | 1 | 3 | asthma | 37 |
333 | drugY | 1 | 3 | asthma | 37 |
Any help would be very much appreciated!!
Thank you!
This may get you started. I wouldn't want to go through too many additional variables though. I would seriously discuss this data layout with the source as the only thing I've seen this problematic before is the old Lotus 123 PRN file layout.
proc sql; create table work.want as select a.id,c.rx,a.Rx_12_mo,b.Rx_time,d.rx_cond ,a.event from (select id, result_num as Rx_12_mo,event,Loop_index from work.example where index(variable,"rx_12mo")=1) as a left join (select id, result_num as Rx_time,Loop_index from work.example where index(variable,"rx_time")=1) as b on a.id=b.id and a.Loop_index=b.Loop_index left join (select id, Result_char as Rx,Loop_index from work.example where variable='rx') as c on a.id=c.id and a.Loop_index=c.loop_index left join (select id, Result_char as Rx_Cond,Loop_index from work.example where index(variable,"rx_cond")=1) as d on a.id=d.id and a.Loop_index=d.Loop_index ; quit;
You have RX_12MO_AA to RX_12MO_AE and similar for RX_time. How do we know which is associated with drugx or drugy? How do we get 1 for the value of the RX_12MO since there are 1 and 2 in the "result_num" values, and similar for 3 for RX_time since it appears we have a choice between 1,2,3; Rx_cond has choices as shown between asthma and allergies so how do we know that the result should be ashtma?
Hi ballardW,
My apologies I left out a column. The responses for RX: are linked through the loop_index.
ID | RX | RX_12MO | RX_TIME | RX_COND | EVENT | Loop_index |
333 | drugX | 1 | 3 | asthma | 37 | 0 |
333 | drugy | 1 | 3 | asthma | 37 | 1 |
Can you provide an example starting data set? Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.
I get concerned about conversions as SAS normally doesn't use NULL and I'm not sure what type any of your variables start as even though they look numeric and whether the results are numeric or character.
Also, since one solution I have in mind could involve arrays what is the maximum number of "loop index" values for all of the IDs involved.
data WORK.EXAMPLE; infile datalines dsd truncover; input ID:BEST12. Variable:$10. Result_char:$9. Result_num:$4. Loop_item_ID:$5. Loop_index:BEST12. Event:BEST12.; datalines4; 333,rx_12mo_aa,NULL,1,MED01,0,37 333,rx_12mo_ab,NULL,1,MED01,1,37 333,rx_12mo_ac,NULL,1,MED01,2,37 333,rx_12mo_ad,NULL,1,MED01,3,37 333,rx_12mo_ae,NULL,2,MED01,4,37 333,rx_time_aa,NULL,3,MED01,0,37 333,rx_time_ab,NULL,3,MED01,1,37 333,rx_time_ac,NULL,3,MED01,2,37 333,rx_time_ad,NULL,3,MED01,3,37 333,rx_time_ae,NULL,1,MED01,4,37 333,rx,drugX,NULL,MED01,0,37 333,rx,drugy,NULL,MED01,1,37 333,rx,drugX,NULL,MED01,2,37 333,rx,drugz,NULL,MED01,3,37 333,rx,drugX,NULL,MED01,4,37 333,rx_cond_aa,asthma,NULL,MED01,0,37 333,rx_cond_ab,asthma,NULL,MED01,1,37 333,rx_cond_ac,asthma,NULL,MED01,2,37 333,rx_cond_ad,allergies,NULL,MED01,3,37 333,rx_cond_ae,asthma,NULL,MED01,4,37 ;;;;
The maximum number of Loop_index is six (values 0 to 5). The ID and Event are numeric variables - everything else is a character variable.
This may get you started. I wouldn't want to go through too many additional variables though. I would seriously discuss this data layout with the source as the only thing I've seen this problematic before is the old Lotus 123 PRN file layout.
proc sql; create table work.want as select a.id,c.rx,a.Rx_12_mo,b.Rx_time,d.rx_cond ,a.event from (select id, result_num as Rx_12_mo,event,Loop_index from work.example where index(variable,"rx_12mo")=1) as a left join (select id, result_num as Rx_time,Loop_index from work.example where index(variable,"rx_time")=1) as b on a.id=b.id and a.Loop_index=b.Loop_index left join (select id, Result_char as Rx,Loop_index from work.example where variable='rx') as c on a.id=c.id and a.Loop_index=c.loop_index left join (select id, Result_char as Rx_Cond,Loop_index from work.example where index(variable,"rx_cond")=1) as d on a.id=d.id and a.Loop_index=d.Loop_index ; quit;
Thank you ballardw! This worked - only issue I had was that my dataset is ~3000 observations and this program created some duplicate observations in the new table. So I used this code to remove duplicate values:
PROC SQL; CREATE TABLE work.new_table as SELECT DISTINCT * FROM work.old_table; QUIT;
@glcoolj12 wrote:
Thank you ballardw! This worked - only issue I had was that my dataset is ~3000 observations and this program created some duplicate observations in the new table. So I used this code to remove duplicate values:
I would go back to the data to confirm that the layout is as you expected. It may be that for some ID values you have MULTIPLE sets of responses in terms of "EVENT". Since you didn't show any multiple event data I did not consider that in a solution. It is very likely that all of the JOIN ON conditions need to have the event values match, which would mean adding them in the select clauses as well as the condition.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.