Solved: Large dataset: Creating one variable (column) from multiple years of d...

EMc · Posted 04-17-2021 07:27 PM

I have 180 columns containing the same character data reported each year over many years (from a longitudinal data set). I am looking to combine these columns into one singular column so that I can compute two way frequencies etc. I have looked into "append" however I understand this is for datasets not variables. I have also tried a transpose, however I end up with 9000 columns (number of patients). Other similar posts suggest an array for this type of data, however they all require you to enter datelines. My dataset is much too large to record each individual dateline! Hope someone has insight as to how to do this, or possibly a better method!

Current Data:

Patient ID Year 1. Year 2. Year 3. Year 4. Year 5

1. ABC ABB ABA ABC BAC

2 ACA ABC ABB ABA AAC

3 ABB ABA ACA ABC AAC

What I want:

Patient ID Health Condition

1. ABC

1. ABB

1. ABA

1. ABC

1. BAC

2. ACA

2. ABC

2. ABB

2. ABA

2. AAC

3. ABB

3. ABA

3. ACA

3. ABC

3. AAC

Patrick · Posted 04-17-2021 10:02 PM

It works for me

data have;
  infile datalines truncover dsd dlm=' ';
  input (Patient_ID Year_1 Year_2 Year_3 Year_4 Year_5) ($);
  datalines;
1 ABC ABB ABA ABC BAC
2 ACA ABC ABB ABA AAC
3 ABB ABA ACA ABC AAC
;

PROC TRANSPOSE data=have out=want name=Year prefix=Health_Condition;
  var Year_1 Year_2 Year_3 Year_4 Year_5;
  by Patient_ID;
run;

Or here a data step approach which I would expect to perform better.

data want2;
  set have;
  array _y {*} year_:;
  do _i=1 to dim(_y);
    Year=input(vname(_y[_i]),$6.);
    Health_Condition=_y[_i];
    output;
  end;
  keep Patient_ID Year Health_Condition;
run;

View solution in original post

PaigeMiller · Posted 04-17-2021 08:33 PM

Use PROC TRANSPOSE with a BY PATIENTID; statement.

--
Paige Miller

EMc · Posted 04-17-2021 09:16 PM

Hi @PaigeMiller,
Thanks for your advice! This is the right idea, however I am looking for it to be the actual data (health condition e.g., ABC) in each cell that is listed rather than the column name. I am sure there is a way to do this, but can't seem to figure it out. My code and output using PROC TRANSPOSE is shown below, please refer to my original post for what I would like my data to look like!

PROC TRANSPOSE data=file out=file_transposed;
var Year_1 Year_2 Year_3 Year_4 Year_5;
by PatientID;
run;

PatientID. NAME
1. Year 1
1. Year 2
1. Year 3
1. Year 4
1. Year 5
2. Year 1
2. Year 2
2. Year 3
2. Year 4
2. Year 5
3. Year 1
3. Year 2
3. Year 3
3. Year 4
3. Year 5

Patrick · Posted 04-17-2021 10:02 PM

It works for me

data have;
  infile datalines truncover dsd dlm=' ';
  input (Patient_ID Year_1 Year_2 Year_3 Year_4 Year_5) ($);
  datalines;
1 ABC ABB ABA ABC BAC
2 ACA ABC ABB ABA AAC
3 ABB ABA ACA ABC AAC
;

PROC TRANSPOSE data=have out=want name=Year prefix=Health_Condition;
  var Year_1 Year_2 Year_3 Year_4 Year_5;
  by Patient_ID;
run;

Or here a data step approach which I would expect to perform better.

data want2;
  set have;
  array _y {*} year_:;
  do _i=1 to dim(_y);
    Year=input(vname(_y[_i]),$6.);
    Health_Condition=_y[_i];
    output;
  end;
  keep Patient_ID Year Health_Condition;
run;

EMc · Posted 04-18-2021 05:18 PM

Thanks a lot! Data step was exactly what I needed.

Large dataset: Creating one variable (column) from multiple years of data

Re: Large dataset: Creating one variable (column) from multiple years of data

Re: Large dataset: Creating one variable (column) from multiple years of data

Re: Large dataset: Creating one variable (column) from multiple years of data

Re: Large dataset: Creating one variable (column) from multiple years of data

Re: Large dataset: Creating one variable (column) from multiple years of data

Catch up on SAS Innovate 2026

Large dataset: Creating one variable (column) from multiple years of data

Re: Large dataset: Creating one variable (column) from multiple years of data

Re: Large dataset: Creating one variable (column) from multiple years of data

Re: Large dataset: Creating one variable (column) from multiple years of data

Re: Large dataset: Creating one variable (column) from multiple years of data

Re: Large dataset: Creating one variable (column) from multiple years of data

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away