I have 180 columns containing the same character data reported each year over many years (from a longitudinal data set). I am looking to combine these columns into one singular column so that I can compute two way frequencies etc. I have looked into "append" however I understand this is for datasets not variables. I have also tried a transpose, however I end up with 9000 columns (number of patients). Other similar posts suggest an array for this type of data, however they all require you to enter datelines. My dataset is much too large to record each individual dateline! Hope someone has insight as to how to do this, or possibly a better method!
Current Data:
Patient ID Year 1. Year 2. Year 3. Year 4. Year 5
1. ABC ABB ABA ABC BAC
2 ACA ABC ABB ABA AAC
3 ABB ABA ACA ABC AAC
What I want:
Patient ID Health Condition
1. ABC
1. ABB
1. ABA
1. ABC
1. BAC
2. ACA
2. ABC
2. ABB
2. ABA
2. AAC
3. ABB
3. ABA
3. ACA
3. ABC
3. AAC
It works for me
data have;
infile datalines truncover dsd dlm=' ';
input (Patient_ID Year_1 Year_2 Year_3 Year_4 Year_5) ($);
datalines;
1 ABC ABB ABA ABC BAC
2 ACA ABC ABB ABA AAC
3 ABB ABA ACA ABC AAC
;
PROC TRANSPOSE data=have out=want name=Year prefix=Health_Condition;
var Year_1 Year_2 Year_3 Year_4 Year_5;
by Patient_ID;
run;
Or here a data step approach which I would expect to perform better.
data want2;
set have;
array _y {*} year_:;
do _i=1 to dim(_y);
Year=input(vname(_y[_i]),$6.);
Health_Condition=_y[_i];
output;
end;
keep Patient_ID Year Health_Condition;
run;
Hi @PaigeMiller,
Thanks for your advice! This is the right idea, however I am looking for it to be the actual data (health condition e.g., ABC) in each cell that is listed rather than the column name. I am sure there is a way to do this, but can't seem to figure it out. My code and output using PROC TRANSPOSE is shown below, please refer to my original post for what I would like my data to look like!
PROC TRANSPOSE data=file out=file_transposed;
var Year_1 Year_2 Year_3 Year_4 Year_5;
by PatientID;
run;
PatientID. NAME
1. Year 1
1. Year 2
1. Year 3
1. Year 4
1. Year 5
2. Year 1
2. Year 2
2. Year 3
2. Year 4
2. Year 5
3. Year 1
3. Year 2
3. Year 3
3. Year 4
3. Year 5
It works for me
data have;
infile datalines truncover dsd dlm=' ';
input (Patient_ID Year_1 Year_2 Year_3 Year_4 Year_5) ($);
datalines;
1 ABC ABB ABA ABC BAC
2 ACA ABC ABB ABA AAC
3 ABB ABA ACA ABC AAC
;
PROC TRANSPOSE data=have out=want name=Year prefix=Health_Condition;
var Year_1 Year_2 Year_3 Year_4 Year_5;
by Patient_ID;
run;
Or here a data step approach which I would expect to perform better.
data want2;
set have;
array _y {*} year_:;
do _i=1 to dim(_y);
Year=input(vname(_y[_i]),$6.);
Health_Condition=_y[_i];
output;
end;
keep Patient_ID Year Health_Condition;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.