I have 180 columns containing the same character data reported each year over many years (from a longitudinal data set). I am looking to combine these columns into one singular column so that I can compute two way frequencies etc. I have looked into "append" however I understand this is for datasets not variables. I have also tried a transpose, however I end up with 9000 columns (number of patients). Other similar posts suggest an array for this type of data, however they all require you to enter datelines. My dataset is much too large to record each individual dateline! Hope someone has insight as to how to do this, or possibly a better method!
Current Data:
Patient ID Year 1. Year 2. Year 3. Year 4. Year 5
1. ABC ABB ABA ABC BAC
2 ACA ABC ABB ABA AAC
3 ABB ABA ACA ABC AAC
What I want:
Patient ID Health Condition
1. ABC
1. ABB
1. ABA
1. ABC
1. BAC
2. ACA
2. ABC
2. ABB
2. ABA
2. AAC
3. ABB
3. ABA
3. ACA
3. ABC
3. AAC
It works for me
data have;
infile datalines truncover dsd dlm=' ';
input (Patient_ID Year_1 Year_2 Year_3 Year_4 Year_5) ($);
datalines;
1 ABC ABB ABA ABC BAC
2 ACA ABC ABB ABA AAC
3 ABB ABA ACA ABC AAC
;
PROC TRANSPOSE data=have out=want name=Year prefix=Health_Condition;
var Year_1 Year_2 Year_3 Year_4 Year_5;
by Patient_ID;
run;
Or here a data step approach which I would expect to perform better.
data want2;
set have;
array _y {*} year_:;
do _i=1 to dim(_y);
Year=input(vname(_y[_i]),$6.);
Health_Condition=_y[_i];
output;
end;
keep Patient_ID Year Health_Condition;
run;
Hi @PaigeMiller,
Thanks for your advice! This is the right idea, however I am looking for it to be the actual data (health condition e.g., ABC) in each cell that is listed rather than the column name. I am sure there is a way to do this, but can't seem to figure it out. My code and output using PROC TRANSPOSE is shown below, please refer to my original post for what I would like my data to look like!
PROC TRANSPOSE data=file out=file_transposed;
var Year_1 Year_2 Year_3 Year_4 Year_5;
by PatientID;
run;
PatientID. NAME
1. Year 1
1. Year 2
1. Year 3
1. Year 4
1. Year 5
2. Year 1
2. Year 2
2. Year 3
2. Year 4
2. Year 5
3. Year 1
3. Year 2
3. Year 3
3. Year 4
3. Year 5
It works for me
data have;
infile datalines truncover dsd dlm=' ';
input (Patient_ID Year_1 Year_2 Year_3 Year_4 Year_5) ($);
datalines;
1 ABC ABB ABA ABC BAC
2 ACA ABC ABB ABA AAC
3 ABB ABA ACA ABC AAC
;
PROC TRANSPOSE data=have out=want name=Year prefix=Health_Condition;
var Year_1 Year_2 Year_3 Year_4 Year_5;
by Patient_ID;
run;
Or here a data step approach which I would expect to perform better.
data want2;
set have;
array _y {*} year_:;
do _i=1 to dim(_y);
Year=input(vname(_y[_i]),$6.);
Health_Condition=_y[_i];
output;
end;
keep Patient_ID Year Health_Condition;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.