BookmarkSubscribeRSS Feed
King_235
Calcite | Level 5

I would like to combine columns into one. But i don't know how to do so. My data set is like this:


Period   ColA   ColB   ColC  

1             1

1                        2

1                                  3

2              1

2                        2

3              5

3                                  8


And i would like to combine the same column of period into one. like this:

Period    ColA    ColB    ColC

1               1          2          3

2               1          2

3               5                    8


i have tried many times but it still doesn't work.

13 REPLIES 13
art297
Opal | Level 21

There is probably a more efficient way of doing what you want, but the following will accomplish the task:

data have;

  input Period   ColA   ColB   ColC;

  cards;

1             1 . .

1             . 2 .

1             . . 3

2              1 . .

2              . 2 .

3              5 . .

3              . . 8

;

data firstonly;

  set have;

  by Period;

  if first.Period;

run;

data want;

  update firstonly have;

  by Period;

run;

Tom
Super User Tom
Super User

You can use the UPDATE statement.  This will take the last non-missing value for each variable.  It wants a master table and an update table, but you can just use OBS=0 to have it start with an empty master table.

data have ;

input Period ColA ColB ColC @@;

cards;

1 1 . .  1 . 2 .  1 . . 3 2 1 . .  2 . 2 .  3 5 . .  3 . . 8

run;

data want ;

  update have(obs=0) have ;

  by period;

run;


data _null_;

  set want;

  put (_All_) (:) ;

run;

1 1 2 3

2 1 2 .

3 5 . 8

TomKari
Onyx | Level 15

And in SQL:

PROC SQL;

   CREATE TABLE WANT AS

   SELECT Period,

            (MAX(ColA)) AS ColA,

            (MAX(ColB)) AS ColB,

            (MAX(ColC)) AS ColC

      FROM HAVE

      GROUP BY Period;

QUIT;

Tom

Howles
Quartz | Level 8

proc summary data=have nway ;

class period ;

var col: ;

output out=want(drop = _type_ _freq_) mean= ;

run ;

art297
Opal | Level 21

: Definitely an interesting way to do it but, from my test, Tom's method runs almost twice as fast.  I ran:

/* Create some sample data */

data have (drop=_:);

  input _Period   ColA   ColB   ColC;

  do Period=_Period to _Period+999;

    output;

  end;

  cards;

1             1 . .

1             . 2 .

1             . . 3

1001              1 . .

1001              . 2 .

2001              5 . .

2001              . . 8

3001          1 . .

3001          . 2 .

3001           . . 3

4001              1 . .

4001              . 2 .

5001              5 . .

5001              . . 8

6001              1 . .

6001              . 2 .

7001              5 . .

7001              . . 8

8001          1 . .

8001          . 2 .

8001           . . 3

9001              1 . .

9001              . 2 .

10001              5 . .

10001              . . 8

;

proc sort data=have;

  by period;

run;

/*Howard's suggested code*/

proc summary data=have nway ;

  class period ;

  var col: ;

  output out=want(drop = _type_ _freq_) mean= ;

run ;

/*Tom's suggested code*/

data want ;

  update have(obs=0) have ;

  by period;

run;

TimArm
Obsidian | Level 7

This should run as quickly as Tom's SQL, I hope (although it is more verbose!):

data lastonly (drop=t_:);

  set have (rename=(ColA=t_A ColB=t_B ColC=t_C));

  by Period;

  length ColA ColB ColC 8.;

  retain ColA ColB ColC;

  if first.Period then do;

    ColA = .;

    ColB = .;

    ColC = .;

  end;

  ColA = max(ColA,t_A);

  ColB = max(ColB,t_B);

  ColC = max(ColC,t_C);

  if last.Period then output;

run;

TomKari
Onyx | Level 15

I love Tom's solution (never seen that pattern before), but I'm also a big fan of PROC SUMMARY/MEANS. I was wondering if procedure startup code explained the time differences, so I ran some tests at higher volumes. Note that the times for "Tom" include a sort, as it is required for his pattern, whereas the SUMMARY doesn't need to be sorted. The times were:

Summary Tables

HowardTom
volumeReal TimeCPU TimeReal TimeCPU Time
25,000 observations0:00.470:00.210:00.030:00.02
250,000 observations0:00.380:00.510:00.160:00.28
2,500,000 observations0:05.540:07.170:01.710:02.84
25,000,000 observations2:46.002:06.001:23.000:41.00


Generated by the SAS System ('Local', X64_7PRO) on
November 26, 2012 at 4:26:14 PM

And as a chart:

img0.jpg

Howver, the 25 million row with SUMMARY run required disk work space, so I reran it with increased memsize. Here's the result:

img1.jpg

Note the vertical scale is log.

So:

Tom's technique is as efficient as it gets;

PROC SUMMARY will somewhat catch up at higher volumes, but requires more memory;

I just blew a whole afternoon, and enjoyed every bit of it!

Tom

Tom
Super User Tom
Super User

It depends on what answer you want.  Using the UPDATE statement will take the last non-missing value.  Calculating a statistic such MEAN, MIN or MAX could get a different answer when there are multiple non-missing values of a variable within the same by group.

joehinson
Calcite | Level 5

Here is another approach, although not quite sure how efficient that would be:

data _null_;

      if(1=2) then set have;

      declare hash cols(ordered:"a");

      cols.defineKey("period");

      cols.defineData("period","colA", "colB", "colC");

      cols.defineDone();

      do until(done);

            set have end=done;

            x1=colA; x2=colB; x3=colC;

            rc1=cols.find();

            y1=max(x1, colA); y2=max(x2, colB); y3=max(x3, colC);

            rc2=cols.replace(key:period,data:period,data:y1,data:y2,data:y3);

      end;

      cols.output(dataset:"want");

      stop;

run;

RESULTS:

  

  Period ColA ColB ColC
1 1 1 2 3
2 2 1 2 .
3 3 5 . 8

art297
Opal | Level 21

: cpu time slowest of the three suggested approaches on 11,000 records.

Tom: .03 seconds

Howard: .07 seconds

Joe: .12 sesconds

joehinson
Calcite | Level 5

Wow! Many thanks, Tom.

Will keep that in mind.

Kind regards,

Joe

TimArm
Obsidian | Level 7

Actually, I suggested another approach...but I don't have time to time it Smiley Wink

art297
Opal | Level 21

: Sorry for not including your approach in the times that I sent earlier:

Tom: .03 seconds

Tim: .06 seconds

Howard: .07 seconds

Joe: .12 sesconds

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 1486 views
  • 0 likes
  • 7 in conversation