Hi, I have a dataset that repeats the ID and fruit on multiple rows when there is more than one value in the final column, like below:
ID fruit quality
1 banana yellow
2 apple green
2 apple round
What I want is for each row to be unique, like below: if the quality column has multiple entries for the same ID, each additional entry should go into a new column (quality_2, quality_3, quality_4, and so on).
ID fruit quality_1 quality_2
1 banana yellow
2 apple green round
I have looked around the forum and in some textbooks for an answer to this and haven't found much; any help would be appreciated. Thanks!
That is called transposing data from long to wide. PROC TRANSPOSE will do this in SAS.
For almost every purpose, though, processing is easier in the long form.
For example, if both apple and banana have the quality "sweet", there is no way to ensure that "sweet" lands in the same quality variable for both, so you spend a lot of time searching through many variables for everything you do later on. And if you later combine this with another data set, the same quality value for apple is very likely to end up in a different variable, and the number of quality variables may change. That complicates every one of those multi-variable searches to determine whether "sweet" is one of the qualities.
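To make the "sweet" example concrete, here is a small sketch with made-up data (the extra "sweet" rows and the data set names long, wide and sweet_wide are hypothetical, not from the question). In the long form a single WHERE clause finds every sweet fruit; in the wide form "sweet" sits in quality_2 for banana but in quality_3 for apple, so every quality_ variable has to be scanned.

```sas
/* Hypothetical long-form data: one row per id/fruit/quality */
data long;
  input id fruit $ quality $;
  datalines;
1 banana yellow
1 banana sweet
2 apple green
2 apple round
2 apple sweet
;
run;

/* Long form: one WHERE clause finds every sweet fruit */
proc print data=long;
  where quality = 'sweet';
run;

/* The same data in wide form: "sweet" is in quality_2 for banana
   but in quality_3 for apple ("." reads as a blank value) */
data wide;
  input id fruit $ quality_1 $ quality_2 $ quality_3 $;
  datalines;
1 banana yellow sweet .
2 apple green round sweet
;
run;

/* Wide form: every quality_ variable has to be checked */
data sweet_wide;
  set wide;
  array q{*} quality_:;
  do i = 1 to dim(q);
    if q{i} = 'sweet' then do;
      output;  /* keep this fruit once */
      leave;   /* stop scanning the remaining quality_ variables */
    end;
  end;
  drop i;
run;
```

A function such as WHICHC over `of quality_:` can shorten the loop, but any later step asking "is sweet one of the qualities?" still has to search all of the quality_ variables rather than one column.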
Sort the data by id and fruit:
proc sort data=have;
  by id fruit;
run;
Then:
proc transpose data=have out=want(drop=_name_) prefix=quality_;
  by id fruit;
  var quality;
run;
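A minimal end-to-end sketch using the sample rows from the question, assuming the input data set is named have as in the code above. The DATALINES step only re-creates the three example rows; `(drop=_name_)` removes the `_NAME_` column that PROC TRANSPOSE adds by default (it holds the text "quality"), so want ends up with just id, fruit, quality_1 and quality_2.

```sas
/* Re-create the example data from the question (long form) */
data have;
  input id fruit $ quality $;
  datalines;
1 banana yellow
2 apple green
2 apple round
;
run;

/* BY-group processing in PROC TRANSPOSE needs sorted input */
proc sort data=have;
  by id fruit;
run;

/* One output row per id/fruit; the quality values become quality_1, quality_2, ... */
proc transpose data=have out=want(drop=_name_) prefix=quality_;
  by id fruit;
  var quality;
run;

proc print data=want noobs;
run;
```

With this data, want should have two rows: banana with quality_1 = yellow and a blank quality_2, and apple with quality_1 = green and quality_2 = round. The number of quality_ columns is set by the id/fruit group with the most rows, so a third quality for any one fruit would add a quality_3 column to every row.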
Have you looked at PROC TRANSPOSE? That works fine for me and generates the output you indicated.
Great Explanation! Thank you!