DATA Step, Macro, Functions and more

From one variable to a multiple

Accepted Solution Solved
Reply
Contributor
Posts: 29
Accepted Solution

From one variable to a multiple

[ Edited ]

Dear Experts

 

I have a question regarding making a single variable into multiple variables

 

I have a variable (diagnoses1) that is defined in the following manor

 

date#diagnosis#typeofdiagnosis#date#diagnosis#typeofdiagnosis#...... and these pairs can be replicated for a maximum of 100 times, but I dont know exactly how many times for the individual patient;

 

The dates has the following format dd/mm/yyyy; The diagnosis is a stringvariable and the type of diagnosis is also a string variable.

 

So in real life an example could be

 

01/01/2001#df100#H#01/02/2001#df200#H#01/03/2001#...

 

What I would like is the following variables

 

d1date1 d1diagnosis1 d1type1

d1date2 d1diagnosis2 d1type2

...... for as many as there are in the initial variable diagnoses1.

I hope this makes sence.

 

I will look forward to seeing your suggestions

 

Regards

 

Solvej


Accepted Solutions
Solution
‎04-17-2018 08:29 AM
Super User
Super User
Posts: 9,802

Re: From on variable to a multiple

Are the data items always present?  if so:

data want;
  set have;
  do i=1 to countw(diag,"#") by 3;
    date=input(scan(diag,i,"#"),ddmmyy10.);
    diagnosis=scan(diag,i+1,"#");
    type=scan(diag,i+3,"#");
    output;
  end;
run;
    

 This will loop over the text and for each three items create the variables and output.  Note, if there isn't always 3 then you will get errors.  Post test data in the form of a datastep and required output, otherwise we have to guess.

View solution in original post


All Replies
Solution
‎04-17-2018 08:29 AM
Super User
Super User
Posts: 9,802

Re: From on variable to a multiple

Are the data items always present?  if so:

data want;
  set have;
  do i=1 to countw(diag,"#") by 3;
    date=input(scan(diag,i,"#"),ddmmyy10.);
    diagnosis=scan(diag,i+1,"#");
    type=scan(diag,i+3,"#");
    output;
  end;
run;
    

 This will loop over the text and for each three items create the variables and output.  Note, if there isn't always 3 then you will get errors.  Post test data in the form of a datastep and required output, otherwise we have to guess.

Contributor
Posts: 29

Re: From on variable to a multiple

Dear RW9

 

thank you for your answer.

It seems to be working. It does however produce a multiple of lines for each set of three. Can this be fixed in the statement or will I have to reshape in order to only have one line per initial variable?

 

This is what I have now

 

Diagnoses1date1diagnosistype      

01/01/2001#df100#H#01/02/2001#df200#H#01/03/2001#...

01-01-2001df100H      

01/01/2001#df100#H#01/02/2001#df200#H#01/03/2001#...

01-02-2001df200H      

01/01/2001#df100#H#01/02/2001#df200#H#01/03/2001#...

01-03-2001....      
          
          
What I would likeD1date1D1diagnosis1D1type1D1date2D1diagnosis2D1type2D1date3D1diagnosis31type3

01/01/2001#df100#H#01/02/2001#df200#H#01/03/2001#...

01-01-2001df100H01-02-2001df200H01-03-2001....
          

 

Kind regards

 

Solvej

 

 

Super User
Posts: 10,530

Re: From on variable to a multiple

Don't do that. Wide datasets are inherently harder to handle than long ones. By-group processing is easier than handling arrays, especially when the number of items for a group is not known beforehand.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Contributor
Posts: 29

Re: From on variable to a multiple

Posted in reply to KurtBremser

Dear KurtBremser

 

I agree however for this specific task it would be very helpful onwards to have one line per patient.

 

Regards

 

Solvej

Super User
Posts: 23,958

Re: From on variable to a multiple


@Solvej wrote:

Dear KurtBremser

 

I agree however for this specific task it would be very helpful onwards to have one line per patient.

 

Regards

 

Solvej


No, it really doesn't. Having worked with health care data for years...it really really does not help. There are ways to account for things in a long format that is MUCH easier - once you learn it. Otherwise you'll be stuck using a lot of array loops.

Super User
Super User
Posts: 9,802

Re: From on variable to a multiple

That is by design.  It is never a good idea to work with transposed data within programming.  Industry standards - CDISC - are normalised also.  The reason is that any transposed dataset requires far more programming and messy code to work with than normalised, and at the end of the day if you need a transposed dataset, then it is just a proc transpose step away.  So from my original code:

data want;
  set have; 
  ord=0;
  do i=1 to countw(diag,"#") by 3;
    ord=ord+1;
    date=input(scan(diag,i,"#"),ddmmyy10.);
    diagnosis=scan(diag,i+1,"#");
    type=scan(diag,i+3,"#");
    output;
  end;
run;
proc transpose data=want out=tran1 prefix=d1date;
  by ...;
  var date;
  id ord;
run;
proc transpose data=want out=tran2 prefix=d1diagnosis;
  by ...;
  var date;
  id ord;
run;
proc transpose data=want out=tran3 prefix=d1type;
  by ...;
  var date;
  id ord;
run;
data final;
  merge tran1 tran2 tran3;
  by ...;
run;

Note I have put elipses in the by clause, as you have not shown any other variables so I don't know these.

Again, I really advise you not to transpose any data unless it is really needed - i.e. in a final output report or something - no good will come of it.

 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 114 views
  • 3 likes
  • 4 in conversation