BookmarkSubscribeRSS Feed
AB1976
Calcite | Level 5

Hello 

 

I want to join multiple columns that have the same type of data, into 1 column so that I'm able to conduct analysis on just 1 column instead of multiple columns with the same type of data.

 

Below is an example of how the data is currently structured

 

item cat1  cat2  cat3  cat4  cat5 
A    111   222   333   444   555   
  666   777   888   999   101             
  112   212   313   414   515 
D    123   234   567   890   145     

 

I want the data to look like this:

 

item cat1

A    111

A    222

A    333

B    666

B    777

B    888

C    999

C    555

C    101

D    789

D    456

 

Data Want;
set Have ;
cat 1 = cat 2;
keep cat 1 ;
run;

 

I would also like to know how to get the data back to the original structure once I've completed my analysis please ?

 

 

8 REPLIES 8
ballardw
Super User

Proc Transpose will often do what you are requesting:

Note the data step to provide example data. This is the preferred way to share data on the forum as we can see pretty quickly what the variable types and properties (formats, labels) may be. Copy text from your editor, open a text box on the forum with the </> icon that appears above the message window and paste.

data have;
  input item $ cat1  cat2  cat3  cat4  cat5 ;
datalines;
A    111   222   333   444   555   
B    666   777   888   999   101             
C    112   212   313   414   515 
D    123   234   567   890   145     
;
/* if not actually sorted*/
proc sort data=have;
   by item;
run;

proc transpose data=have out=want
     prefix=cat;
by item;
var cat1 - cat5;
run;
   
/* leave the _name_ variable in the data!!*/
/* IF you do anything to that set may need to 
resort. If you add variables you have to describe 
what you expect to do */

Proc transpose data=want out=original (drop=_name_);
  by item;
  id _name_;
  var cat1;
run;

Proc transpose seldom gets what you want with a mix of numeric and character variables (placing both in one column means that the column is character).

 

IF you have multiple rows of identical Item (or other BY variables) then the problem may be a bit stickier as you may end up with more than one variable in the first transpose. Which means a different approach of simulating the first transpose. And may require adding more information to the data to "get back" to original.

Personally I would make sure not to lose the original data set.

AB1976
Calcite | Level 5
thank you @ballardw

the data I have is from a data set that I've already ran/created can i still use the same code ?
ballardw
Super User

@AB1976 wrote:
thank you @ballardw

the data I have is from a data set that I've already ran/created can i still use the same code ?

Make sure that you don't overwrite your original data and go ahead and test it. The data step I wrote was just to have something to demonstrate one basic approach if the data is as clean as you showed with the example.

 

Note that the Proc transpose code I show assumes your observations are sorted. If that is not the case you might be able to use the NOTSORTED option on the BY in the Proc Transpose.

 

I included all the variables. If you don't include all of the variables then you will have to make a very clear description of what "doing the reverse" would entail. The bit you showed only showed apparently transposing 3 of the 5 Cat variables in the example. Once those values are gone there really would be no way to recover them from the shown process. Something could be done but details would matter.

Astounding
PROC Star
Looking at the data, your objective is clear as mud. So here is an approach that might work, but will at least gets the dialog moving in a productive direction.
First, create a data set with your "combined" variable:
data want;
set have;
comb = cat(cat1, cat2, cat3);
run;
So both the original variables as well as the combined variable are now in your data.
Or perhaps move in the other direction:
data want:
set have;
v = var1;
output;
v = var2;
output;
v = var3;
output;
run;

Next, conduct your analysis but use any variables you like.
At the end, drop the variables or observations that you no longer need to return the data to its original form.
Kurt_Bremser
Super User

PS

 

The fact that you already need to transpose for analysis is an asteroid-sized hint that your wide data structure is crap for working with it. Stay with the long format, and create wide reports where needed with the reporting procedures.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 677 views
  • 0 likes
  • 4 in conversation