Solved: Re: Collapse multiple rows into one row based on a category

Alireza_Boloori · Posted 07-18-2019 06:56 PM

Hello everyone,

I want to collapse all rows (for each unique ID) into one row. Here is what I have:

data have;
input id x1 x2 x3 x4 x5;
datalines;
1 347 418 213 . .
1 213 . . . .
1 057 912 213 329 .
2 521 543 333 . .
2 333 521 555 . .
3 817 . . . .
;

And here is what I want:

data want;
input id y1 y2 y3 y4 y5 y6;
datalines;
1 347 418 213 057 912 329
2 521 543 333 555 817 .
;

The only other thing that I should mention is that I want to represent duplicated values for each ID only once (e.g., 213 for ID=1 and 333 and 521 for ID=2). Any help or idea is appreciated!

PGStats · Posted 07-18-2019 11:54 PM

OK, so you haven't tried anything. Maybe you should start with simpler problems.

Here is a data step solution to this problem that involves arrays, the do until pattern and the whichn function:

data want;
array y{99};
j = 0;
do until(last.id);
    set have; by id;
    array xx x:;
    do i = 1 to dim(xx);
        if missing(xx{i}) then leave;
        else do;
            if whichn(xx{i}, of y{*}) = 0 then do;
                j = j + 1;
                y{j} = xx{i};
                end;
            end;
        end;
    end;
output;
drop i j x:;
run;

with the hope that you will not use it before you understand how it works .

PG

View solution in original post

PGStats · Posted 07-18-2019 08:15 PM

Let's start with the code you tried so far...

PG

Alireza_Boloori · Posted 07-18-2019 08:21 PM

@PGStats If I only had one variable (x1), proc transpose would be a viable option, but since there are multiple variables, proc transpose would not return what I want.

Shmuel · Posted 07-18-2019 11:46 PM

You can do:

1) proc transpose from wide to short, so you have one vale in each row

2) sort by ID & value with optioon NODUPKEY

3) proc transpose from short to wide

PGStats · Posted 07-18-2019 11:54 PM

OK, so you haven't tried anything. Maybe you should start with simpler problems.

Here is a data step solution to this problem that involves arrays, the do until pattern and the whichn function:

data want;
array y{99};
j = 0;
do until(last.id);
    set have; by id;
    array xx x:;
    do i = 1 to dim(xx);
        if missing(xx{i}) then leave;
        else do;
            if whichn(xx{i}, of y{*}) = 0 then do;
                j = j + 1;
                y{j} = xx{i};
                end;
            end;
        end;
    end;
output;
drop i j x:;
run;

with the hope that you will not use it before you understand how it works .

PG

VDD · Posted 07-18-2019 11:56 PM

https://stats.idre.ucla.edu/sas/modules/how-to-reshape-data-long-to-wide-using-proc-transpose/

KachiM · Posted 07-19-2019 02:23 AM

I am complementing to PGStats solution.

The array (YY) that collects the distinct Xs is sized to 99 as the exact size is unknown and that you may not have more than this number. The larger size may crowd the Program Data Vector. I am showing the use of _temporary_ array which can be sized even to millions. I just used 1000 as the size. This is done by scanning the HAVE to count Distinct Values within ID and getting the maximum of counts from several IDs.

/** Maximum of the Distinct Values within ID **/
data _null_;
   do i = 1 by 1 until(last.id);
      set have end = eof;
      by id notsorted;
      array yy[1000] _temporary_;
      array x x:;
      if first.id then do; j = 1; call missing(of yy[*]); end;
      do i = 1 to dim(x);
         if x[i] then do;
            v = whichn(x[i], of yy[*]);
            if v = 0 then do; yy[j] = x[i]; j+1; end;
         end;
      end;
   end;
   retain max;
   max = max(max, n(of yy[*]));
   if eof then call symputx('max', max);
run;

Then we can use &max to size the array as in:

data want;
   do until(last.id);
      set have;
      by id notsorted;
      array x x:;
      array y y1 - y&max;
      if first.id then do; j = 1; call missing(of y[*]); end;
      retain y;
      do i = 1 to dim(x);
         if x[i] then do;
            v = whichN(x[i], of y[*]);
            if v = 0 then do; y[j] = x[i]; j+1; end;
         end;
      end;
   end;
keep id y:;
run;

Alireza_Boloori · Posted 07-19-2019 06:09 AM

@PGStats Thank you very much for the solution!

@KachiM Thanks! The solution that @PGStats provided works for the example that I provided. Your solution works perfectly as well (especially for large-size data).

@VDD and @Shmuel Thanks for your inputs.

Registration is open

SAS Training: Just a Click Away