Hello
I want to join multiple columns that have the same type of data, into 1 column so that I'm able to conduct analysis on just 1 column instead of multiple columns with the same type of data.
Below is an example of how the data is currently structured
item cat1 cat2 cat3 cat4 cat5
A 111 222 333 444 555
B 666 777 888 999 101
C 112 212 313 414 515
D 123 234 567 890 145
I want the data to look like this:
item cat1
A 111
A 222
A 333
B 666
B 777
B 888
C 999
C 555
C 101
D 789
D 456
Data Want; set Have ; cat 1 = cat 2; keep cat 1 ; run;
I would also like to know how to get the data back to the original structure once I've completed my analysis please ?
@AB1976 wrote:
@Kurt_Bremser Are you referring to the data step?
Your "want" data does not have the values from these variables.
Proc Transpose will often do what you are requesting:
Note the data step to provide example data. This is the preferred way to share data on the forum as we can see pretty quickly what the variable types and properties (formats, labels) may be. Copy text from your editor, open a text box on the forum with the </> icon that appears above the message window and paste.
data have; input item $ cat1 cat2 cat3 cat4 cat5 ; datalines; A 111 222 333 444 555 B 666 777 888 999 101 C 112 212 313 414 515 D 123 234 567 890 145 ; /* if not actually sorted*/ proc sort data=have; by item; run; proc transpose data=have out=want prefix=cat; by item; var cat1 - cat5; run; /* leave the _name_ variable in the data!!*/ /* IF you do anything to that set may need to resort. If you add variables you have to describe what you expect to do */ Proc transpose data=want out=original (drop=_name_); by item; id _name_; var cat1; run;
Proc transpose seldom gets what you want with a mix of numeric and character variables (placing both in one column means that the column is character).
IF you have multiple rows of identical Item (or other BY variables) then the problem may be a bit stickier as you may end up with more than one variable in the first transpose. Which means a different approach of simulating the first transpose. And may require adding more information to the data to "get back" to original.
Personally I would make sure not to lose the original data set.
@AB1976 wrote:
thank you @ballardw
the data I have is from a data set that I've already ran/created can i still use the same code ?
Make sure that you don't overwrite your original data and go ahead and test it. The data step I wrote was just to have something to demonstrate one basic approach if the data is as clean as you showed with the example.
Note that the Proc transpose code I show assumes your observations are sorted. If that is not the case you might be able to use the NOTSORTED option on the BY in the Proc Transpose.
I included all the variables. If you don't include all of the variables then you will have to make a very clear description of what "doing the reverse" would entail. The bit you showed only showed apparently transposing 3 of the 5 Cat variables in the example. Once those values are gone there really would be no way to recover them from the shown process. Something could be done but details would matter.
PS
The fact that you already need to transpose for analysis is an asteroid-sized hint that your wide data structure is crap for working with it. Stay with the long format, and create wide reports where needed with the reporting procedures.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.