BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Echo77
Calcite | Level 5

My data looks like this:

Period  x1  x2  x3  x4

1         1    .     .    .

2         2    .     .     .

3         3    5    .    .

4         4    6    .    .

5         .     7    9    .

6         .     8   10  .

7         .     .    11  .

8         .     .    12  13

Any idea how to combine into to the following form? Thanks for your help!

period  x

1         1

2         2

3         5

4         6

5         9

6         10

7         11

8         13

A better way to rephrase my questions is that:I have 5 columns(variables) in my dataset and 8 rows(observations). What I wanted to do is to select the first 2 observations from column x1, the 2-4th from column x2, the 5th-7th in column x3, and 8th in column x4 and combine those values into one column x.

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Or if instead you want to base when to switch on the value of the PERIOD variable you might do it this way.

data want2 ;

  array cutoff (4) _temporary_ (2,4,7,10000);

  do j=1 to 4 ;

    do while (period < cutoff(j) );

      set have;

      array x x:;

      new=x(j);

      output;

    end;

  end;

  keep period new;

run;

View solution in original post

13 REPLIES 13
overmar
Obsidian | Level 7

It looks like what you are asking for is a new dataset with period, and the maximum of all of the values for each period. The below code shows two different ways to do it.

proc sql;

create table want as

select distinct period, (max(x1,x2,x3,x4)) as x

from have

group by period;

quit;

OR

data want;

     set have;

     array xs (*) x1-x4;

     x = max(of xs(*));

     keep period x;

run;

Hope this helps

Echo77
Calcite | Level 5

Thanks a lot for your quick response!! This is not exactly what I wanted though. Those numbers are a little bit misleading..I need to revise those number a little bit.

The original data looks like this:

Period  x1     x2    x3     x4

1         a1       .       .         .

2         a2       .       .         .

3         a3      a5    .          .

4         a4      a6    .          .

5         .         a7   a9         .

6         .         a8   a10      .

7         .         .      a11       .

8         .         .      a12  a13

The one I wanted:

period  x

1        a1

2        a2

3        a5

4         a6

5         a9

6         a10

7         a11

8         a13

So those are just random numbers, no definite mathematical relationships.  Suppose their are two columns, both with non-missing values for period i, I want to always take the value for period i from the column with a larger column number.  In the example, I have non-missing value for period 3 in both column 2 and column 3, I'll take the value from column 3 which is a5.

Any idea? Thanks!

overmar
Obsidian | Level 7

This should get you what you are looking for, the by -1 is essential as it then counts the columns backwards and the while will stop the do loop if there was a value found in the last iteration. I don't think it should give you an array out of range if you do not have any values in x1-x4, but if it does you may want to add this code within the do loop as this should solve the problem. /*if i = 1 and xs(1) = . then x = .;*/

Oh and if your variables are characters replace all of the . with "" in the code.

data want;

    set have;

    x = .;

    array xs (4) x1-x4;

    do i = 4 to 1 by -1 while (x = .);;

    if xs(i) ~=. then x = xs(i);

    end;

    keep period x;

run;

Echo77
Calcite | Level 5

This does solve the problem I listed. But I just realize that my data is more complicated than the simplified version...Smiley SadThanks a lot though!!

overmar
Obsidian | Level 7

How much more complicated? My guess is the answer is only the difference in a couple lines of code.

Echo77
Calcite | Level 5

I have 36 columns and 3600 rows and some columns contains all missing values. After I posted the response to your answer, I realize that was not the best way to express my question. The question really was that  I'd love to select some rows from those columns and then combine them into one column. And I know exactly which ones I need from each column.

In the example, I want to select the first 2 observations from column x1, the 2-4th from column x2, the 5th-7th in column x3, and 8th in column x4 and combine those values into one column x.


I can use multiple (if _n_>a and _n_<b, then x=xi)statements to do this, but I'm wondering if there is a more elegant way to do this?

Tom
Super User Tom
Super User

So if you know the number of values you want from each column you could do it this way.

data have ;

input Period x1-x4 @@ ;

cards4;

1 1 . . . 2 2 . . . 3 3 5 . . 4 4 6 . . 5 . 7 9 .

6 . 8 10 . 7 . . 11 . 8 . . 12 13

;;;;

data want ;

  do n=2,2,3,100000;

    j+1;

    do n=n to 1 by -1 ;

      set have;

      array x x:;

      new=x(j);

      output;

    end;

  end;

  keep period new;

run;

Tom
Super User Tom
Super User

Or if instead you want to base when to switch on the value of the PERIOD variable you might do it this way.

data want2 ;

  array cutoff (4) _temporary_ (2,4,7,10000);

  do j=1 to 4 ;

    do while (period < cutoff(j) );

      set have;

      array x x:;

      new=x(j);

      output;

    end;

  end;

  keep period new;

run;

Echo77
Calcite | Level 5

This is exactly what I needed!! Thanks a lot!!

averageuser
Calcite | Level 5

Hi,

Here is a simple macro I have written for what you are looking for.

you need to input follwing parameters.

Input dataset name, output dataset name, first column name, last column name, total columns.

%macro temp(in /* input dataset name*/

                     ,out /* output dataset name */

                     ,firstcol /* name of first column */

                    ,lastcol /* name of last column */

                    ,total  /* numbe of total columns */);

data &out;

  set &in;

  array sel{&total} &firstcol.--&lastcol.;

  do i=1 to &total;

  if sel{i} ne . then do; X=sel{i}; end;

    end;

    drop i;

run;

%mend temp;

%temp(data1,data2,x1,x4,4);  /* this is example */

Pelase let me know if it works for you. Smiley Happy

Message was edited by: Sagar rana

Echo77
Calcite | Level 5

Thanks for your time! But my data is a little bit more complicated than this. Some columns may contain all missing values, but I still need to take some values out of it. So the problem is really to take the values during different periods of time from different columns, not necessarily non-missing values.

UrvishShah
Fluorite | Level 6

Here is my try...

data have;

   input period x1 - x4;

   cards4;

1         1    .     .    .

2         2    .     .     .

3         3    5    .    .

4         4    6    .    .

5         .     7    9    .

6         .     8   10  .

7         .     .    11  .

8         .     .    12  13

;;;;

data hard_code(drop = x1-x4);

  set have;

  array _x(4) x1 x2 x3 x4;

  if period = 1 or period = 2 then do;

     x = _x(1);output;

  end;

  else if period = 3 or period = 4 then do;

     x = _x(2);output;

  end;

  else if period in (5,6,7) then do;

     x = _x(3);output;

  end;

  else if period = 8 then do;

     x = _x(4);output;

  end;

run;

data automate(where = (x NE .));

  set have;

  array _x(4) x1-x4;

  do i = 1 to 4;

     x = _x(i);

      output;

  end;

run;

data automate(drop = i x1-x4);

  set automate;

  by period notsorted;

  if last.period then output;

run;

-Urvish

Echo77
Calcite | Level 5

Thanks! What I needed is not to get the first non-missing values from each column but to take values for pre-specified periods of time from those columns. I actually used the hard_code version...

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 7827 views
  • 7 likes
  • 5 in conversation