Re: Joining multiple columns into one and then doing the reverse of th...

AB1976 · Posted 08-22-2024 03:49 PM

Hello

I want to join multiple columns that have the same type of data, into 1 column so that I'm able to conduct analysis on just 1 column instead of multiple columns with the same type of data.

Below is an example of how the data is currently structured

item cat1 cat2 cat3 cat4 cat5
A 111 222 333 444 555
B 666 777 888 999 101
C 112 212 313 414 515
D 123 234 567 890 145

I want the data to look like this:

item cat1

A 111

A 222

A 333

B 666

B 777

B 888

C 999

C 555

C 101

D 789

D 456

Data Want;
set Have ;
cat 1 = cat 2;
keep cat 1 ;
run;

I would also like to know how to get the data back to the original structure once I've completed my analysis please ?

Kurt_Bremser · Posted 08-22-2024 04:48 PM

Why do you exclude cat4 and cat5?

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

AB1976 · Posted 08-22-2024 04:55 PM

@Kurt_Bremser Are you referring to the data step?

Kurt_Bremser · Posted 08-23-2024 02:20 AM

@AB1976 wrote:
@Kurt_Bremser Are you referring to the data step?

Your "want" data does not have the values from these variables.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

ballardw · Posted 08-22-2024 05:00 PM

Proc Transpose will often do what you are requesting:

Note the data step to provide example data. This is the preferred way to share data on the forum as we can see pretty quickly what the variable types and properties (formats, labels) may be. Copy text from your editor, open a text box on the forum with the </> icon that appears above the message window and paste.

data have;
  input item $ cat1  cat2  cat3  cat4  cat5 ;
datalines;
A    111   222   333   444   555   
B    666   777   888   999   101             
C    112   212   313   414   515 
D    123   234   567   890   145     
;
/* if not actually sorted*/
proc sort data=have;
   by item;
run;

proc transpose data=have out=want
     prefix=cat;
by item;
var cat1 - cat5;
run;
   
/* leave the _name_ variable in the data!!*/
/* IF you do anything to that set may need to 
resort. If you add variables you have to describe 
what you expect to do */

Proc transpose data=want out=original (drop=_name_);
  by item;
  id _name_;
  var cat1;
run;

Proc transpose seldom gets what you want with a mix of numeric and character variables (placing both in one column means that the column is character).

IF you have multiple rows of identical Item (or other BY variables) then the problem may be a bit stickier as you may end up with more than one variable in the first transpose. Which means a different approach of simulating the first transpose. And may require adding more information to the data to "get back" to original.

Personally I would make sure not to lose the original data set.

AB1976 · Posted 08-22-2024 05:46 PM

thank you @ballardw

the data I have is from a data set that I've already ran/created can i still use the same code ?

ballardw · Posted 08-22-2024 06:19 PM

@AB1976 wrote:
thank you @ballardw

the data I have is from a data set that I've already ran/created can i still use the same code ?

Make sure that you don't overwrite your original data and go ahead and test it. The data step I wrote was just to have something to demonstrate one basic approach if the data is as clean as you showed with the example.

Note that the Proc transpose code I show assumes your observations are sorted. If that is not the case you might be able to use the NOTSORTED option on the BY in the Proc Transpose.

I included all the variables. If you don't include all of the variables then you will have to make a very clear description of what "doing the reverse" would entail. The bit you showed only showed apparently transposing 3 of the 5 Cat variables in the example. Once those values are gone there really would be no way to recover them from the shown process. Something could be done but details would matter.

Astounding · Posted 08-23-2024 02:54 AM

Looking at the data, your objective is clear as mud. So here is an approach that might work, but will at least gets the dialog moving in a productive direction.
First, create a data set with your "combined" variable:
data want;
set have;
comb = cat(cat1, cat2, cat3);
run;
So both the original variables as well as the combined variable are now in your data.
Or perhaps move in the other direction:
data want:
set have;
v = var1;
output;
v = var2;
output;
v = var3;
output;
run;

Next, conduct your analysis but use any variables you like.
At the end, drop the variables or observations that you no longer need to return the data to its original form.

Kurt_Bremser · Posted 08-23-2024 05:01 AM

PS

The fact that you already need to transpose for analysis is an asteroid-sized hint that your wide data structure is crap for working with it. Stay with the long format, and create wide reports where needed with the reporting procedures.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Joining multiple columns into one and then doing the reverse of that after conducting analysis

Re: Joining multiple columns into one and then doing the reverse of that after conducting analysis

Re: Joining multiple columns into one and then doing the reverse of that after conducting analysis

Re: Joining multiple columns into one and then doing the reverse of that after conducting analysis

Re: Joining multiple columns into one and then doing the reverse of that after conducting analysis

Re: Joining multiple columns into one and then doing the reverse of that after conducting analysis

Re: Joining multiple columns into one and then doing the reverse of that after conducting analysis

Re: Joining multiple columns into one and then doing the reverse of that after conducting analysis

Re: Joining multiple columns into one and then doing the reverse of that after conducting analysis

Registration is open

SAS Training: Just a Click Away