Help using Base SAS procedures

How can I combine 2 columns into one?

Reply
N/A
Posts: 1

How can I combine 2 columns into one?

I would like to combine columns into one. But i don't know how to do so. My data set is like this:


Period   ColA   ColB   ColC  

1             1

1                        2

1                                  3

2              1

2                        2

3              5

3                                  8


And i would like to combine the same column of period into one. like this:

Period    ColA    ColB    ColC

1               1          2          3

2               1          2

3               5                    8


i have tried many times but it still doesn't work.

PROC Star
Posts: 7,363

Re: How can I combine 2 columns into one?

There is probably a more efficient way of doing what you want, but the following will accomplish the task:

data have;

  input Period   ColA   ColB   ColC;

  cards;

1             1 . .

1             . 2 .

1             . . 3

2              1 . .

2              . 2 .

3              5 . .

3              . . 8

;

data firstonly;

  set have;

  by Period;

  if first.Period;

run;

data want;

  update firstonly have;

  by Period;

run;

Super User
Super User
Posts: 6,500

Re: How can I combine 2 columns into one?

You can use the UPDATE statement.  This will take the last non-missing value for each variable.  It wants a master table and an update table, but you can just use OBS=0 to have it start with an empty master table.

data have ;

input Period ColA ColB ColC @@;

cards;

1 1 . .  1 . 2 .  1 . . 3 2 1 . .  2 . 2 .  3 5 . .  3 . . 8

run;

data want ;

  update have(obs=0) have ;

  by period;

run;


data _null_;

  set want;

  put (_All_) (Smiley Happy ;

run;

1 1 2 3

2 1 2 .

3 5 . 8

PROC Star
Posts: 1,093

Re: How can I combine 2 columns into one?

And in SQL:

PROC SQL;

   CREATE TABLE WANT AS

   SELECT Period,

            (MAX(ColA)) AS ColA,

            (MAX(ColB)) AS ColB,

            (MAX(ColC)) AS ColC

      FROM HAVE

      GROUP BY Period;

QUIT;

Tom

Regular Contributor
Posts: 184

Re: How can I combine 2 columns into one?

proc summary data=have nway ;

class period ;

var col: ;

output out=want(drop = _type_ _freq_) mean= ;

run ;

PROC Star
Posts: 7,363

Re: How can I combine 2 columns into one?

: Definitely an interesting way to do it but, from my test, Tom's method runs almost twice as fast.  I ran:

/* Create some sample data */

data have (drop=_Smiley Happy;

  input _Period   ColA   ColB   ColC;

  do Period=_Period to _Period+999;

    output;

  end;

  cards;

1             1 . .

1             . 2 .

1             . . 3

1001              1 . .

1001              . 2 .

2001              5 . .

2001              . . 8

3001          1 . .

3001          . 2 .

3001           . . 3

4001              1 . .

4001              . 2 .

5001              5 . .

5001              . . 8

6001              1 . .

6001              . 2 .

7001              5 . .

7001              . . 8

8001          1 . .

8001          . 2 .

8001           . . 3

9001              1 . .

9001              . 2 .

10001              5 . .

10001              . . 8

;

proc sort data=have;

  by period;

run;

/*Howard's suggested code*/

proc summary data=have nway ;

  class period ;

  var col: ;

  output out=want(drop = _type_ _freq_) mean= ;

run ;

/*Tom's suggested code*/

data want ;

  update have(obs=0) have ;

  by period;

run;

Contributor
Posts: 29

Re: How can I combine 2 columns into one?

This should run as quickly as Tom's SQL, I hope (although it is more verbose!):

data lastonly (drop=t_Smiley Happy;

  set have (rename=(ColA=t_A ColB=t_B ColC=t_C));

  by Period;

  length ColA ColB ColC 8.;

  retain ColA ColB ColC;

  if first.Period then do;

    ColA = .;

    ColB = .;

    ColC = .;

  end;

  ColA = max(ColA,t_A);

  ColB = max(ColB,t_B);

  ColC = max(ColC,t_C);

  if last.Period then output;

run;

PROC Star
Posts: 1,093

Re: How can I combine 2 columns into one?

I love Tom's solution (never seen that pattern before), but I'm also a big fan of PROC SUMMARY/MEANS. I was wondering if procedure startup code explained the time differences, so I ran some tests at higher volumes. Note that the times for "Tom" include a sort, as it is required for his pattern, whereas the SUMMARY doesn't need to be sorted. The times were:

Summary Tables

HowardTom
volumeReal TimeCPU TimeReal TimeCPU Time
25,000 observations0:00.470:00.210:00.030:00.02
250,000 observations0:00.380:00.510:00.160:00.28
2,500,000 observations0:05.540:07.170:01.710:02.84
25,000,000 observations2:46.002:06.001:23.000:41.00


Generated by the SAS System ('Local', X64_7PRO) on
November 26, 2012 at 4:26:14 PM

And as a chart:

img0.jpg

Howver, the 25 million row with SUMMARY run required disk work space, so I reran it with increased memsize. Here's the result:

img1.jpg

Note the vertical scale is log.

So:

Tom's technique is as efficient as it gets;

PROC SUMMARY will somewhat catch up at higher volumes, but requires more memory;

I just blew a whole afternoon, and enjoyed every bit of it!

Tom

Super User
Super User
Posts: 6,500

Re: How can I combine 2 columns into one?

It depends on what answer you want.  Using the UPDATE statement will take the last non-missing value.  Calculating a statistic such MEAN, MIN or MAX could get a different answer when there are multiple non-missing values of a variable within the same by group.

Contributor
Posts: 45

Re: How can I combine 2 columns into one?

Here is another approach, although not quite sure how efficient that would be:

data _null_;

      if(1=2) then set have;

      declare hash cols(ordered:"a");

      cols.defineKey("period");

      cols.defineData("period","colA", "colB", "colC");

      cols.defineDone();

      do until(done);

            set have end=done;

            x1=colA; x2=colB; x3=colC;

            rc1=cols.find();

            y1=max(x1, colA); y2=max(x2, colB); y3=max(x3, colC);

            rc2=cols.replace(keySmiley Tongueeriod,dataSmiley Tongueeriod,data:y1,data:y2,data:y3);

      end;

      cols.output(dataset:"want");

      stop;

run;

RESULTS:

  

  Period ColA ColB ColC
1 1 1 2 3
2 2 1 2 .
3 3 5 . 8

PROC Star
Posts: 7,363

Re: How can I combine 2 columns into one?

: cpu time slowest of the three suggested approaches on 11,000 records.

Tom: .03 seconds

Howard: .07 seconds

Joe: .12 sesconds

Contributor
Posts: 45

Re: How can I combine 2 columns into one?

Wow! Many thanks, Tom.

Will keep that in mind.

Kind regards,

Joe

Contributor
Posts: 29

Re: How can I combine 2 columns into one?

Actually, I suggested another approach...but I don't have time to time it Smiley Wink

PROC Star
Posts: 7,363

Re: How can I combine 2 columns into one?

: Sorry for not including your approach in the times that I sent earlier:

Tom: .03 seconds

Tim: .06 seconds

Howard: .07 seconds

Joe: .12 sesconds

Ask a Question
Discussion stats
  • 13 replies
  • 408 views
  • 0 likes
  • 7 in conversation