Solved
Contributor
Posts: 29

# From one variable to a multiple

[ Edited ]

Dear Experts

I have a question regarding making a single variable into multiple variables

I have a variable (diagnoses1) that is defined in the following manor

date#diagnosis#typeofdiagnosis#date#diagnosis#typeofdiagnosis#...... and these pairs can be replicated for a maximum of 100 times, but I dont know exactly how many times for the individual patient;

The dates has the following format dd/mm/yyyy; The diagnosis is a stringvariable and the type of diagnosis is also a string variable.

So in real life an example could be

01/01/2001#df100#H#01/02/2001#df200#H#01/03/2001#...

What I would like is the following variables

d1date1 d1diagnosis1 d1type1

d1date2 d1diagnosis2 d1type2

...... for as many as there are in the initial variable diagnoses1.

I hope this makes sence.

I will look forward to seeing your suggestions

Regards

Solvej

Accepted Solutions
Solution
‎04-17-2018 08:29 AM
Super User
Posts: 9,802

## Re: From on variable to a multiple

Are the data items always present?  if so:

```data want;
set have;
do i=1 to countw(diag,"#") by 3;
date=input(scan(diag,i,"#"),ddmmyy10.);
diagnosis=scan(diag,i+1,"#");
type=scan(diag,i+3,"#");
output;
end;
run;
```

This will loop over the text and for each three items create the variables and output.  Note, if there isn't always 3 then you will get errors.  Post test data in the form of a datastep and required output, otherwise we have to guess.

All Replies
Solution
‎04-17-2018 08:29 AM
Super User
Posts: 9,802

## Re: From on variable to a multiple

Are the data items always present?  if so:

```data want;
set have;
do i=1 to countw(diag,"#") by 3;
date=input(scan(diag,i,"#"),ddmmyy10.);
diagnosis=scan(diag,i+1,"#");
type=scan(diag,i+3,"#");
output;
end;
run;
```

This will loop over the text and for each three items create the variables and output.  Note, if there isn't always 3 then you will get errors.  Post test data in the form of a datastep and required output, otherwise we have to guess.

Contributor
Posts: 29

## Re: From on variable to a multiple

Dear RW9

It seems to be working. It does however produce a multiple of lines for each set of three. Can this be fixed in the statement or will I have to reshape in order to only have one line per initial variable?

This is what I have now

 Diagnoses1 date1 diagnosis type 01/01/2001#df100#H#01/02/2001#df200#H#01/03/2001#... 01-01-2001 df100 H 01/01/2001#df100#H#01/02/2001#df200#H#01/03/2001#... 01-02-2001 df200 H 01/01/2001#df100#H#01/02/2001#df200#H#01/03/2001#... 01-03-2001 .. .. What I would like D1date1 D1diagnosis1 D1type1 D1date2 D1diagnosis2 D1type2 D1date3 D1diagnosis3 1type3 01/01/2001#df100#H#01/02/2001#df200#H#01/03/2001#... 01-01-2001 df100 H 01-02-2001 df200 H 01-03-2001 .. ..

Kind regards

Solvej

Super User
Posts: 10,530

## Re: From on variable to a multiple

Don't do that. Wide datasets are inherently harder to handle than long ones. By-group processing is easier than handling arrays, especially when the number of items for a group is not known beforehand.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Contributor
Posts: 29

## Re: From on variable to a multiple

Dear KurtBremser

I agree however for this specific task it would be very helpful onwards to have one line per patient.

Regards

Solvej

Super User
Posts: 23,958

## Re: From on variable to a multiple

@Solvej wrote:

Dear KurtBremser

I agree however for this specific task it would be very helpful onwards to have one line per patient.

Regards

Solvej

No, it really doesn't. Having worked with health care data for years...it really really does not help. There are ways to account for things in a long format that is MUCH easier - once you learn it. Otherwise you'll be stuck using a lot of array loops.

Super User
Posts: 9,802

## Re: From on variable to a multiple

That is by design.  It is never a good idea to work with transposed data within programming.  Industry standards - CDISC - are normalised also.  The reason is that any transposed dataset requires far more programming and messy code to work with than normalised, and at the end of the day if you need a transposed dataset, then it is just a proc transpose step away.  So from my original code:

```data want;
set have;
ord=0;
do i=1 to countw(diag,"#") by 3;
ord=ord+1;
date=input(scan(diag,i,"#"),ddmmyy10.);
diagnosis=scan(diag,i+1,"#");
type=scan(diag,i+3,"#");
output;
end;
run;
proc transpose data=want out=tran1 prefix=d1date;
by ...;
var date;
id ord;
run;
proc transpose data=want out=tran2 prefix=d1diagnosis;
by ...;
var date;
id ord;
run;
proc transpose data=want out=tran3 prefix=d1type;
by ...;
var date;
id ord;
run;
data final;
merge tran1 tran2 tran3;
by ...;
run;```

Note I have put elipses in the by clause, as you have not shown any other variables so I don't know these.

Again, I really advise you not to transpose any data unless it is really needed - i.e. in a final output report or something - no good will come of it.

☑ This topic is solved.