Solved: From one variable to a multiple

Solvej · Posted 04-11-2018 05:37 AM

Dear Experts

I have a question regarding making a single variable into multiple variables

I have a variable (diagnoses1) that is defined in the following manor

date#diagnosis#typeofdiagnosis#date#diagnosis#typeofdiagnosis#...... and these pairs can be replicated for a maximum of 100 times, but I dont know exactly how many times for the individual patient;

The dates has the following format dd/mm/yyyy; The diagnosis is a stringvariable and the type of diagnosis is also a string variable.

So in real life an example could be

01/01/2001#df100#H#01/02/2001#df200#H#01/03/2001#...

What I would like is the following variables

d1date1 d1diagnosis1 d1type1

d1date2 d1diagnosis2 d1type2

...... for as many as there are in the initial variable diagnoses1.

I hope this makes sence.

I will look forward to seeing your suggestions

Regards

Solvej

RW9 · Posted 04-11-2018 06:07 AM

Are the data items always present? if so:

data want;
  set have;
  do i=1 to countw(diag,"#") by 3;
    date=input(scan(diag,i,"#"),ddmmyy10.);
    diagnosis=scan(diag,i+1,"#");
    type=scan(diag,i+3,"#");
    output;
  end;
run;

This will loop over the text and for each three items create the variables and output. Note, if there isn't always 3 then you will get errors. Post test data in the form of a datastep and required output, otherwise we have to guess.

View solution in original post

RW9 · Posted 04-11-2018 06:07 AM

Are the data items always present? if so:

data want;
  set have;
  do i=1 to countw(diag,"#") by 3;
    date=input(scan(diag,i,"#"),ddmmyy10.);
    diagnosis=scan(diag,i+1,"#");
    type=scan(diag,i+3,"#");
    output;
  end;
run;

This will loop over the text and for each three items create the variables and output. Note, if there isn't always 3 then you will get errors. Post test data in the form of a datastep and required output, otherwise we have to guess.

Solvej · Posted 04-11-2018 07:16 AM

Dear RW9

thank you for your answer.

It seems to be working. It does however produce a multiple of lines for each set of three. Can this be fixed in the statement or will I have to reshape in order to only have one line per initial variable?

This is what I have now

Diagnoses1	date1	diagnosis	type
01/01/2001#df100#H#01/02/2001#df200#H#01/03/2001#...	01-01-2001	df100	H
01/01/2001#df100#H#01/02/2001#df200#H#01/03/2001#...	01-02-2001	df200	H
01/01/2001#df100#H#01/02/2001#df200#H#01/03/2001#...	01-03-2001	..	..


What I would like	D1date1	D1diagnosis1	D1type1	D1date2	D1diagnosis2	D1type2	D1date3	D1diagnosis3	1type3
01/01/2001#df100#H#01/02/2001#df200#H#01/03/2001#...	01-01-2001	df100	H	01-02-2001	df200	H	01-03-2001	..	..

Kind regards

Solvej

Kurt_Bremser · Posted 04-11-2018 07:23 AM

Don't do that. Wide datasets are inherently harder to handle than long ones. By-group processing is easier than handling arrays, especially when the number of items for a group is not known beforehand.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Solvej · Posted 04-11-2018 07:25 AM

Dear KurtBremser

I agree however for this specific task it would be very helpful onwards to have one line per patient.

Regards

Solvej

Reeza · Posted 04-11-2018 01:21 PM

@Solvej wrote:

Dear KurtBremser

I agree however for this specific task it would be very helpful onwards to have one line per patient.

Regards

Solvej

No, it really doesn't. Having worked with health care data for years...it really really does not help. There are ways to account for things in a long format that is MUCH easier - once you learn it. Otherwise you'll be stuck using a lot of array loops.

RW9 · Posted 04-11-2018 08:27 AM

That is by design. It is never a good idea to work with transposed data within programming. Industry standards - CDISC - are normalised also. The reason is that any transposed dataset requires far more programming and messy code to work with than normalised, and at the end of the day if you need a transposed dataset, then it is just a proc transpose step away. So from my original code:

data want;
  set have; 
  ord=0;
  do i=1 to countw(diag,"#") by 3;
    ord=ord+1;
    date=input(scan(diag,i,"#"),ddmmyy10.);
    diagnosis=scan(diag,i+1,"#");
    type=scan(diag,i+3,"#");
    output;
  end;
run;
proc transpose data=want out=tran1 prefix=d1date;
  by ...;
  var date;
  id ord;
run;
proc transpose data=want out=tran2 prefix=d1diagnosis;
  by ...;
  var date;
  id ord;
run;
proc transpose data=want out=tran3 prefix=d1type;
  by ...;
  var date;
  id ord;
run;
data final;
  merge tran1 tran2 tran3;
  by ...;
run;

Note I have put elipses in the by clause, as you have not shown any other variables so I don't know these.

Again, I really advise you not to transpose any data unless it is really needed - i.e. in a final output report or something - no good will come of it.

From one variable to a multiple

Re: From on variable to a multiple

Re: From on variable to a multiple

Re: From on variable to a multiple

Re: From on variable to a multiple

Re: From on variable to a multiple

Re: From on variable to a multiple

Re: From on variable to a multiple

Register Today!

SAS Training: Just a Click Away