Re: Making multiple columns into multiple rows

jerry898969 · Posted 06-09-2010 10:16 AM

I have a dataset that has the following data structure

1-1 2-1 3-1 4-1 5-1
1-1 2-1 3-1 4-1 5-2
1-1 2-2 3-2 . .
1-1 2-2 3-2 4-2 5-3

I need a way to create a two column parent/child table without adding the same combination twice.

I need the final result to
1-1 2-1
2-1 3-1
4-1 5-1
4-1 5-2
1-1 2-2
2-2 3-2
3-2 4-2
4-2 5-3

I tried using an array but I didn't come close to getting the output I was looking for.

Thank you for your help.

deleted_user · Posted 06-09-2010 10:23 AM

try this

let say five variables are a1 b1 c1 d1 e1 and output variables are first and second.

data t2 (keep = first second);
set t1;
first = a1;
second = b1;
output;
first = b1;
second = c1;
output;
first = c1;
second = d1;
output;
first = d1;
second = e1;
output;
run;

GL.

deleted_user · Posted 06-09-2010 10:26 AM

add proc sort to remove duplicate combination.

Cynthia_sas · Posted 06-09-2010 10:31 AM

Hi:
I guess I don't understand your data -- what are your variable names??? What do the variable values actually look like...are they 1-1 and 2-1 for some variable value??? Or was that mode of showing the data meant to convey something??

For example, for these 4 rows,
[pre]
1-1 2-1 3-1 4-1 5-1
1-1 2-1 3-1 4-1 5-2
1-1 2-2 3-2 . .
1-1 2-2 3-2 4-2 5-3
[/pre]

...I see that 1-1 is repeated on all 4 rows. the first 2 rows are the same except instead of 5-1, the second row has 5-2 -- what does that mean?? What is the significance of the .. in the third row???? Do 2 columns/variables have missing values for variable 4 and variable 5??? What is the significance of the fact that the 3rd row (with the 2 missing values) has 2-2 and 3-2 instead of 2-1 and 3-1??

What is the logic by which you are doing the transform? How do your columns then translate to rows??? What are your variable names? Can you say a little more about how you consider this to be a "two column parent/child table"????

cynthia

jerry898969 · Posted 06-09-2010 10:48 AM

Cynthia,

this data was just an example of how my data is laid out. I have 5 columns within my data set. The first two columns are always populated the other 3 may or may not have data. I created a id column for each of my 5 label columns. So the first column may have

TEST
TEST
TESTING

So the id for this column would be

1 TEST
1 TEST
2 TESTING

The same goes for all of 5 columns. From there I have to take the id from label1 make that my parent to label2's id and then take label2's id and make it the parent for label3's id. When there are missing values I have to skip over those. I'm trying to make a hierarchical tree. The reason 1-1 shows up multiple times is because the column i'm basing my id off of has the same value for all 4 rows.

thanks to everyone for your their help

Cynthia_sas · Posted 06-09-2010 12:25 PM

I'm still confused. You have 5 columns in the first example and 2 columns in the second example...what are the column names??? VAR1, VAR2, VAR3, VAR4, VAR5??? You have now shown 2 different forms of the variables 1-1, 2-1, 3-1 ,etc; and 1 TEST, 1 TEST, 2 TESTING, etc.

cynthia

jerry898969 · Posted 06-09-2010 12:37 PM

Cynthia,
Sorry for the confusion.

Say I have 5 variable id's called var1-var5.
I need to take var1 and make it a parent of var2. So in my original example
var1 = 1-1 and var2 = 2-1. I need to create a data set that has 1-1 in the parent column and 2-1 in the child column.
After that I have to make var2 the parent of var3 so 2-1 will go in parent column and 3-1 will go in child column. This will
continue in the same way where var3 is parent to var4 and var4 to var5.
If there is a blank then nothing should go into the parent/child table.
My first example was fake string values.

The reason the 1-1 is duplicated is because the id is based a different field. I'm using the by statement on that field to create
an id column that increments when the field value changes. In terms of a tree all 4 rows start from the same node then they
split into two branches and so on.
1-1 2-1 3-1 4-1 5-1
1-1 2-1 3-1 4-1 5-2
1-1 2-2 3-2 . .
1-1 2-2 3-2 4-2 5-3

Thank you for your help

Flip · Posted 06-09-2010 01:32 PM

data new;
set ;
array orig {5} var1-var5;
do i = 1 to 4;
v1 = orig(i);
v2 = orig(i+1);
if v2 ne '' then output;
end;
run;
proc sort nodupkey;
by v1 v2;
run;

jerry898969 · Posted 06-09-2010 02:19 PM

Flip & sasteach,

Thank you both for your help. Using your suggestions I got it working like I want. Now I hope the data model doesn't change. LOL

Thank you

SAS Innovate 2025: Call for Content

Classroom Training Available!