BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Solvej
Obsidian | Level 7

Dear Experts

 

I have a question regarding making a single variable into multiple variables

 

I have a variable (diagnoses1) that is defined in the following manor

 

date#diagnosis#typeofdiagnosis#date#diagnosis#typeofdiagnosis#...... and these pairs can be replicated for a maximum of 100 times, but I dont know exactly how many times for the individual patient;

 

The dates has the following format dd/mm/yyyy; The diagnosis is a stringvariable and the type of diagnosis is also a string variable.

 

So in real life an example could be

 

01/01/2001#df100#H#01/02/2001#df200#H#01/03/2001#...

 

What I would like is the following variables

 

d1date1 d1diagnosis1 d1type1

d1date2 d1diagnosis2 d1type2

...... for as many as there are in the initial variable diagnoses1.

I hope this makes sence.

 

I will look forward to seeing your suggestions

 

Regards

 

Solvej

1 ACCEPTED SOLUTION

Accepted Solutions
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Are the data items always present?  if so:

data want;
  set have;
  do i=1 to countw(diag,"#") by 3;
    date=input(scan(diag,i,"#"),ddmmyy10.);
    diagnosis=scan(diag,i+1,"#");
    type=scan(diag,i+3,"#");
    output;
  end;
run;
    

 This will loop over the text and for each three items create the variables and output.  Note, if there isn't always 3 then you will get errors.  Post test data in the form of a datastep and required output, otherwise we have to guess.

View solution in original post

6 REPLIES 6
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Are the data items always present?  if so:

data want;
  set have;
  do i=1 to countw(diag,"#") by 3;
    date=input(scan(diag,i,"#"),ddmmyy10.);
    diagnosis=scan(diag,i+1,"#");
    type=scan(diag,i+3,"#");
    output;
  end;
run;
    

 This will loop over the text and for each three items create the variables and output.  Note, if there isn't always 3 then you will get errors.  Post test data in the form of a datastep and required output, otherwise we have to guess.

Solvej
Obsidian | Level 7

Dear RW9

 

thank you for your answer.

It seems to be working. It does however produce a multiple of lines for each set of three. Can this be fixed in the statement or will I have to reshape in order to only have one line per initial variable?

 

This is what I have now

 

Diagnoses1date1diagnosistype      

01/01/2001#df100#H#01/02/2001#df200#H#01/03/2001#...

01-01-2001df100H      

01/01/2001#df100#H#01/02/2001#df200#H#01/03/2001#...

01-02-2001df200H      

01/01/2001#df100#H#01/02/2001#df200#H#01/03/2001#...

01-03-2001....      
          
          
What I would likeD1date1D1diagnosis1D1type1D1date2D1diagnosis2D1type2D1date3D1diagnosis31type3

01/01/2001#df100#H#01/02/2001#df200#H#01/03/2001#...

01-01-2001df100H01-02-2001df200H01-03-2001....
          

 

Kind regards

 

Solvej

 

 

Kurt_Bremser
Super User

Don't do that. Wide datasets are inherently harder to handle than long ones. By-group processing is easier than handling arrays, especially when the number of items for a group is not known beforehand.

Solvej
Obsidian | Level 7

Dear KurtBremser

 

I agree however for this specific task it would be very helpful onwards to have one line per patient.

 

Regards

 

Solvej

Reeza
Super User

@Solvej wrote:

Dear KurtBremser

 

I agree however for this specific task it would be very helpful onwards to have one line per patient.

 

Regards

 

Solvej


No, it really doesn't. Having worked with health care data for years...it really really does not help. There are ways to account for things in a long format that is MUCH easier - once you learn it. Otherwise you'll be stuck using a lot of array loops.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

That is by design.  It is never a good idea to work with transposed data within programming.  Industry standards - CDISC - are normalised also.  The reason is that any transposed dataset requires far more programming and messy code to work with than normalised, and at the end of the day if you need a transposed dataset, then it is just a proc transpose step away.  So from my original code:

data want;
  set have; 
  ord=0;
  do i=1 to countw(diag,"#") by 3;
    ord=ord+1;
    date=input(scan(diag,i,"#"),ddmmyy10.);
    diagnosis=scan(diag,i+1,"#");
    type=scan(diag,i+3,"#");
    output;
  end;
run;
proc transpose data=want out=tran1 prefix=d1date;
  by ...;
  var date;
  id ord;
run;
proc transpose data=want out=tran2 prefix=d1diagnosis;
  by ...;
  var date;
  id ord;
run;
proc transpose data=want out=tran3 prefix=d1type;
  by ...;
  var date;
  id ord;
run;
data final;
  merge tran1 tran2 tran3;
  by ...;
run;

Note I have put elipses in the by clause, as you have not shown any other variables so I don't know these.

Again, I really advise you not to transpose any data unless it is really needed - i.e. in a final output report or something - no good will come of it.

 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1253 views
  • 3 likes
  • 4 in conversation