Solved: Re: Need help with combining multiple columns into one. Thanks!!!

Echo77 · Posted 02-15-2014 12:08 PM

My data looks like this:

Period x1 x2 x3 x4

1 1 . . .

2 2 . . .

3 3 5 . .

4 4 6 . .

5 . 7 9 .

6 . 8 10 .

7 . . 11 .

8 . . 12 13

Any idea how to combine into to the following form? Thanks for your help!

period x

1 1

2 2

3 5

4 6

5 9

6 10

7 11

8 13

A better way to rephrase my questions is that:I have 5 columns(variables) in my dataset and 8 rows(observations). What I wanted to do is to select the first 2 observations from column x1, the 2-4th from column x2, the 5th-7th in column x3, and 8th in column x4 and combine those values into one column x.

Tom · Posted 02-16-2014 02:56 PM

Or if instead you want to base when to switch on the value of the PERIOD variable you might do it this way.

data want2 ;

array cutoff (4) _temporary_ (2,4,7,10000);

do j=1 to 4 ;

do while (period < cutoff(j) );

set have;

array x x:;

new=x(j);

output;

end;

keep period new;

run;

View solution in original post

overmar · Posted 02-15-2014 12:20 PM

It looks like what you are asking for is a new dataset with period, and the maximum of all of the values for each period. The below code shows two different ways to do it.

proc sql;

create table want as

select distinct period, (max(x1,x2,x3,x4)) as x

from have

group by period;

quit;

OR

data want;

set have;

array xs (*) x1-x4;

x = max(of xs(*));

keep period x;

run;

Hope this helps

Echo77 · Posted 02-15-2014 12:34 PM

Thanks a lot for your quick response!! This is not exactly what I wanted though. Those numbers are a little bit misleading..I need to revise those number a little bit.

The original data looks like this:

Period x1 x2 x3 x4

1 a1 . . .

2 a2 . . .

3 a3 a5 . .

4 a4 a6 . .

5 . a7 a9 .

6 . a8 a10 .

7 . . a11 .

8 . . a12 a13

The one I wanted:

period x

1 a1

2 a2

3 a5

4 a6

5 a9

6 a10

7 a11

8 a13

So those are just random numbers, no definite mathematical relationships. Suppose their are two columns, both with non-missing values for period i, I want to always take the value for period i from the column with a larger column number. In the example, I have non-missing value for period 3 in both column 2 and column 3, I'll take the value from column 3 which is a5.

Any idea? Thanks!

overmar · Posted 02-15-2014 12:52 PM

This should get you what you are looking for, the by -1 is essential as it then counts the columns backwards and the while will stop the do loop if there was a value found in the last iteration. I don't think it should give you an array out of range if you do not have any values in x1-x4, but if it does you may want to add this code within the do loop as this should solve the problem. /*if i = 1 and xs(1) = . then x = .;*/

Oh and if your variables are characters replace all of the . with "" in the code.

data want;

set have;

x = .;

array xs (4) x1-x4;

do i = 4 to 1 by -1 while (x = .);;

if xs(i) ~=. then x = xs(i);

end;

keep period x;

run;

Echo77 · Posted 02-15-2014 02:48 PM

This does solve the problem I listed. But I just realize that my data is more complicated than the simplified version...Thanks a lot though!!

overmar · Posted 02-15-2014 02:51 PM

How much more complicated? My guess is the answer is only the difference in a couple lines of code.

Echo77 · Posted 02-15-2014 11:43 PM

I have 36 columns and 3600 rows and some columns contains all missing values. After I posted the response to your answer, I realize that was not the best way to express my question. The question really was that I'd love to select some rows from those columns and then combine them into one column. And I know exactly which ones I need from each column.

In the example, I want to select the first 2 observations from column x1, the 2-4th from column x2, the 5th-7th in column x3, and 8th in column x4 and combine those values into one column x.

I can use multiple (if _n_>a and _n_<b, then x=xi)statements to do this, but I'm wondering if there is a more elegant way to do this?

Tom · Posted 02-16-2014 02:28 PM

So if you know the number of values you want from each column you could do it this way.

data have ;

input Period x1-x4 @@ ;

cards4;

1 1 . . . 2 2 . . . 3 3 5 . . 4 4 6 . . 5 . 7 9 .

6 . 8 10 . 7 . . 11 . 8 . . 12 13

;;;;

data want ;

do n=2,2,3,100000;

j+1;

do n=n to 1 by -1 ;

set have;

array x x:;

new=x(j);

output;

end;

keep period new;

run;

Tom · Posted 02-16-2014 02:56 PM

Or if instead you want to base when to switch on the value of the PERIOD variable you might do it this way.

data want2 ;

array cutoff (4) _temporary_ (2,4,7,10000);

do j=1 to 4 ;

do while (period < cutoff(j) );

set have;

array x x:;

new=x(j);

output;

end;

keep period new;

run;

Echo77 · Posted 02-17-2014 01:21 PM

This is exactly what I needed!! Thanks a lot!!

averageuser · Posted 02-16-2014 09:37 PM

Hi,

Here is a simple macro I have written for what you are looking for.

you need to input follwing parameters.

Input dataset name, output dataset name, first column name, last column name, total columns.

%macro temp(in /* input dataset name*/

,out /* output dataset name */

,firstcol /* name of first column */

,lastcol /* name of last column */

,total /* numbe of total columns */);

data &out;

set &in;

array sel{&total} &firstcol.--&lastcol.;

do i=1 to &total;

if sel{i} ne . then do; X=sel{i}; end;

end;

drop i;

run;

%mend temp;

%temp(data1,data2,x1,x4,4); /* this is example */

Pelase let me know if it works for you.

Message was edited by: Sagar rana

Echo77 · Posted 02-17-2014 01:26 PM

Thanks for your time! But my data is a little bit more complicated than this. Some columns may contain all missing values, but I still need to take some values out of it. So the problem is really to take the values during different periods of time from different columns, not necessarily non-missing values.

UrvishShah · Posted 02-17-2014 12:58 AM

Here is my try...

data have;

input period x1 - x4;

cards4;

1 1 . . .

2 2 . . .

3 3 5 . .

4 4 6 . .

5 . 7 9 .

6 . 8 10 .

7 . . 11 .

8 . . 12 13

;;;;

data hard_code(drop = x1-x4);

set have;

array _x(4) x1 x2 x3 x4;

if period = 1 or period = 2 then do;

x = _x(1);output;

end;

else if period = 3 or period = 4 then do;

x = _x(2);output;

end;

else if period in (5,6,7) then do;

x = _x(3);output;

end;

else if period = 8 then do;

x = _x(4);output;

end;

run;

data automate(where = (x NE .));

set have;

array _x(4) x1-x4;

do i = 1 to 4;

x = _x(i);

output;

end;

run;

data automate(drop = i x1-x4);

set automate;

by period notsorted;

if last.period then output;

run;

-Urvish

Echo77 · Posted 02-17-2014 01:36 PM

Thanks! What I needed is not to get the first non-missing values from each column but to take values for pre-specified periods of time from those columns. I actually used the hard_code version...

SAS Innovate 2025: Call for Content

Classroom Training Available!