BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
UcheOkoro
Lapis Lazuli | Level 10

Please, I need help converting data from wide to long. I have a data set which has data in wide form. It has drug_class1 - drug_class25 , drug_name1-drug_name25 and  the  drug indication starts from  drug_indication1_21-drug_indication25_40..

I would like to transpose the data to look like this 

encounter_id  drug_class drug_name  indication21  .................................indication40.

 

My main problem is how to transpose the drug_indication. I tried the following code

 

proc transpose data=Data_new out=long_indication_meds prefix=indication_meds;
   by encounterid;
   var indication_meds1___21 - indication_meds25___40;run;

but got the following error message  

ERROR: Either roots don't match or start suffix after end suffix.

 

Thank you in advance

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

If your source data was a text file and you copy something like the first 5 lines and paste them into a text box (critical!) we may be able to show you how to read that file directly to create that data structure.

If your source data was in a different format tell us what type file.

 

Consider this example, which only has 3 sets of values consisting of two "words" and 5 numeric values:

data example;
   infile datalines dlm=',' dsd;
   informat worda wordb $10. value1-value5  best.;
   input id @;
   do i=1 to 3;
      input worda wordb value1-value5 @;
      output;
   end;
   input;
   drop i;
datalines;
1,abc,pdq,1,0,0,1,1,zz,top,0,0,0,0,1,john,doe,1,1,1,1,0
1,aaa,ggq,1,0,0,1,1,z3,bgt,0,0,0,0,1,mary,pit,1,1,1,1,0
;

The infile statement would point to a text file but here it shows some options if the original data were comma separated (CSV). The DLM option tells SAS to expect a comma between values. The DSD option means that if there are two successive delimiters to treat that variable as missing.

The Informat describes how to read each variable, worda and wordb as character, and then 5 numeric variables as numbers (not actually needed as a default but to make basis for when more complicated things come up). The ID could be numeric or character.

The first input reads just the ID value. The @ symbol at the end of an input means "hold the reading pointer on the current record where it is", which in this case would be right after the ID.

Since we KNOW we have 3 sets of variables we can use a loop to specify that number of times to read, still using the @ to hold the pointer. The OUTPUT statement says to write the current data to the output data set.

After the Loop we need an explicit Input to advance to the next row of data.

The I value isn't needed, unless you want a line or group indicator for some reason. If so pick a name that means more than the "i" and don't use the Drop statement.

 

If not all 3 groups are present you will get some blank lines currently. That could be fixed by using "If not missing(worda) then output;" instead of the simple output.

 

Another option might be to EXPORT your existing data to CSV and write a program similar to above to read it.

You did mention other variables. Where they appear is going to make a difference but as long as the pattern is regular it is quite doable, just somewhat cumbersome code.

For example if you had indication1 otherindication1 indication2 otherindication2 etc as the column order then the Input statement would just list those.

The headaches come if the "other" are mixed in after the the drug type and name group of values.

 

 

View solution in original post

5 REPLIES 5
ballardw
Super User

When you use variable list like Var_1 - Var15 SAS expects there to be 15 variables, sequentially numbered and the "root" which would be "Var_" to be the same.

Your use drug_indication1_21-drug_indication25_40 does not have a "root" that is the same. From that I have no idea if the "next" variable name after drug_indication1_21 would be drug_indication2_21,drug_indication1_22, drug_indication2_22 or what. Neither does SAS.

So either type out all of the names or use one of the other forms of lists.

If all of the drug_indication variables are adjacent columns (run proc contents and see if the variable # in the output is sequential for those variables). If so you could use the two dash list:  drug_indication1_21 -- drug_indication25_40 , which would use the variables in column order.

If they are not sequential then possibly you could use the list form of : drug_indication:  the : after the part of the name says to use ALL of the variables whose names start with the common characters. There could however be a an issue that the order variable names are added to the list is not the expected order.

 

Your code shows :prefix=indication_meds

This means the output variables would be named indication_meds1, indication_meds2, ... indication_medsXX

not the stated Indication21

 

However it looks more like what you want to do is RENAME variables. Which could be done in Proc datasets or  a data step (inefficient in cpu time, about the same in coding time)

proc datasets library=work;
   modify data_new;
      rename 
         drug_indication1_21= indication21
         ...(you provide each pair)
         drug_indication25_40= indication40
      ;
   run;
quit;

If not Provide some example data, with only 3 drug class and names with maybe 5 or 6 "drug_indication" variables. Then show what you expect to see for output.

UcheOkoro
Lapis Lazuli | Level 10

Thank you so much for your response. The indication_21.......indication_40 was an error. What I had in mind was drug_indication21.........drug_indication_40.

I did a proc contents and noticed that the variable # in the output is not sequential for those variables because there is another set of variables other_indication1.......other_indications25 which has character observations( eg. "History of allergy to iodine ".)

Renaming the variable is going to be a gruely task because they are over 500. 

 

Sample data:

What I have is the following

 

id  drug_class1  drug_name1  drug_indication1_21 ........ drug_indication1_40   ........drug_class25     drug_name25  drug_indication25_21 ...... drug_indication25_40  

1        NSAID                Aspirin           1                       ..............      0                             Antihistamine     Cetirizine               0                                       1

2        Opioid           hydrocodone       0                                          1                              NSAID                 Aspirin                   1                                     0

 

What I want to see is 

 

What I expect to see is the following

 

 s/n      id  drug_class      drug_name     drug_indication21................................drug_indication40

   1       1    NSAID              Aspirin                1 ......................................... ...................0

   2       .           .                          .                   .............................................................

     .        .           .                           .                .......................................................................

      .      .           .                           .                 ..................................................................

     .       .            .                           .              ...................................................................

     .     .            .                           .                  ..................................................................

  25    1   Antihistamine     Cetirizine          0................................................................ ..1   

  26    2     Opioid              hydrocodone    1 ............................................ .....................0

          .           .                          .                   .............................................................

         .           .                           .                .......................................................................

         .           .                           .                 ..................................................................

         .            .                           .              ...................................................................

         .            .                           .                  ..................................................................

  50    2    NSAID                Aspirin            1......................................................................0  

 

I hope this is clear?

Thank you.

                                     

ballardw
Super User

If your source data was a text file and you copy something like the first 5 lines and paste them into a text box (critical!) we may be able to show you how to read that file directly to create that data structure.

If your source data was in a different format tell us what type file.

 

Consider this example, which only has 3 sets of values consisting of two "words" and 5 numeric values:

data example;
   infile datalines dlm=',' dsd;
   informat worda wordb $10. value1-value5  best.;
   input id @;
   do i=1 to 3;
      input worda wordb value1-value5 @;
      output;
   end;
   input;
   drop i;
datalines;
1,abc,pdq,1,0,0,1,1,zz,top,0,0,0,0,1,john,doe,1,1,1,1,0
1,aaa,ggq,1,0,0,1,1,z3,bgt,0,0,0,0,1,mary,pit,1,1,1,1,0
;

The infile statement would point to a text file but here it shows some options if the original data were comma separated (CSV). The DLM option tells SAS to expect a comma between values. The DSD option means that if there are two successive delimiters to treat that variable as missing.

The Informat describes how to read each variable, worda and wordb as character, and then 5 numeric variables as numbers (not actually needed as a default but to make basis for when more complicated things come up). The ID could be numeric or character.

The first input reads just the ID value. The @ symbol at the end of an input means "hold the reading pointer on the current record where it is", which in this case would be right after the ID.

Since we KNOW we have 3 sets of variables we can use a loop to specify that number of times to read, still using the @ to hold the pointer. The OUTPUT statement says to write the current data to the output data set.

After the Loop we need an explicit Input to advance to the next row of data.

The I value isn't needed, unless you want a line or group indicator for some reason. If so pick a name that means more than the "i" and don't use the Drop statement.

 

If not all 3 groups are present you will get some blank lines currently. That could be fixed by using "If not missing(worda) then output;" instead of the simple output.

 

Another option might be to EXPORT your existing data to CSV and write a program similar to above to read it.

You did mention other variables. Where they appear is going to make a difference but as long as the pattern is regular it is quite doable, just somewhat cumbersome code.

For example if you had indication1 otherindication1 indication2 otherindication2 etc as the column order then the Input statement would just list those.

The headaches come if the "other" are mixed in after the the drug type and name group of values.

 

 

UcheOkoro
Lapis Lazuli | Level 10

Thank you so much for your response. I have attached a few lines of the data. I hope this makes it clearer.

 

Thank you.

UcheOkoro
Lapis Lazuli | Level 10

Renaming the variables worked for me. Thank you so much

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 2130 views
  • 0 likes
  • 2 in conversation