Create additional columns to avoid duplicate data

New Contributor

Hi, I have a dataset that repeats the ID and fruit on multiple rows whenever there is more than one value in the final column, like below:

ID   fruit    quality
1    banana   yellow
2    apple    green
2    apple    round

What I want is for each row to be unique, like below: if the quality column has multiple entries, each additional entry should go into a new column (quality_2, quality_3, quality_4 and so on).

ID   fruit    quality_1   quality_2
1    banana   yellow
2    apple    green       round

I have looked around the forum and in some textbooks for an answer to this and haven't found much, so any help would be appreciated. Thanks!
Accepted Solution
12-15-2017 12:33 PM
Super User

Posted in reply to andrewjason

That is called transposing data from long to wide. PROC TRANSPOSE will do this in SAS.

For almost every purpose, though, processing is easier in the long form.

For example, if both Apple and Banana have the quality "sweet", there is no way to ensure that "sweet" ends up in the same quality variable for both, so everything you do later means searching through many variables. And if you later combine this with another data set, the same quality value for Apple is very likely to appear in a different variable, and the number of quality variables may change. That complicates every search through multiple variables to determine whether "sweet" is one of the qualities.
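To make the searching problem concrete, here is a minimal SAS sketch. It assumes HAVE is the long data set from the question and WANT is the wide output of PROC TRANSPOSE with prefix=quality_; the output names SWEET_LONG and SWEET_WIDE and the value 'sweet' are illustrative only.

```sas
/* Sketch only: assumes HAVE is long (one row per id/fruit/quality) and   */
/* WANT is the wide result of PROC TRANSPOSE with prefix=quality_.        */

/* Long form: a single WHERE clause finds every fruit that is 'sweet'.   */
data sweet_long;
  set have;
  where quality = 'sweet';   /* exact, case-sensitive match */
run;

/* Wide form: 'sweet' could be in any quality_ column, so each one has to */
/* be scanned; quality_: picks up however many such columns exist.       */
data sweet_wide;
  set want;
  array q {*} quality_: ;
  is_sweet = 0;
  do i = 1 to dim(q);
    if q{i} = 'sweet' then is_sweet = 1;
  end;
  if is_sweet;               /* keep only rows where a match was found */
  drop i is_sweet;
run;
```

The long-form step stays the same no matter how many qualities a fruit has, while the wide-form step depends on an array whose size changes with the data.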

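Sort the data by ID and fruit. Then:

/* One row per id/fruit; the Nth quality value goes into quality_N.       */
/* DROP=_NAME_ removes the column that would only hold the text "quality". */
proc transpose data=have out=want(drop=_name_) prefix=quality_;
  by id fruit;
  var quality;
run;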
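For an end-to-end check, the sketch below builds the sample data from the question with a DATA step, sorts it, transposes it and prints the result. It assumes ID is numeric and fruit and quality are character; the $10. input widths are an assumption.

```sas
/* Sample input from the question: one row per id, fruit and quality. */
data have;
  input id fruit :$10. quality :$10.;
  datalines;
1 banana yellow
2 apple green
2 apple round
;
run;

/* PROC TRANSPOSE with a BY statement needs the data sorted by those variables. */
proc sort data=have;
  by id fruit;
run;

proc transpose data=have out=want(drop=_name_) prefix=quality_;
  by id fruit;
  var quality;
run;

proc print data=want noobs;
run;
```

With this sample data, WANT should have two rows: ID 1 with quality_1 = yellow and an empty quality_2, and ID 2 with quality_1 = green and quality_2 = round.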

All Replies
Super User

Posted in reply to andrewjason

Have you looked at PROC TRANSPOSE? That works fine for me and generates the output you indicated.

 

[Attached image: delete_transpose.JPG]



New Contributor

This is great! Thanks!
New Contributor

Great explanation! Thank you!
