BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Alireza_Boloori
Fluorite | Level 6

Hello everyone,

 

I want to collapse all rows (for each unique ID) into one row. Here is what I have:

data have;
input id x1 x2 x3 x4 x5;
datalines;
1 347 418 213 . .
1 213 . . . .
1 057 912 213 329 .
2 521 543 333 . .
2 333 521 555 . .
3 817 . . . .
;

And here is what I want:

data want;
input id y1 y2 y3 y4 y5 y6;
datalines;
1 347 418 213 057 912 329
2 521 543 333 555 817 .
;

The only other thing that I should mention is that I want to represent duplicated values for each ID only once (e.g., 213 for ID=1 and 333 and 521 for ID=2). Any help or idea is appreciated!

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

OK, so you haven't tried anything. Maybe you should start with simpler problems.

Here is a data step solution to this problem that involves arrays, the do until pattern and the whichn function:

 

data want;
array y{99};
j = 0;
do until(last.id);
    set have; by id;
    array xx x:;
    do i = 1 to dim(xx);
        if missing(xx{i}) then leave;
        else do;
            if whichn(xx{i}, of y{*}) = 0 then do;
                j = j + 1;
                y{j} = xx{i};
                end;
            end;
        end;
    end;
output;
drop i j x:;
run;

with the hope that you will not use it before you understand how it works Smiley Wink.

PG

View solution in original post

7 REPLIES 7
PGStats
Opal | Level 21

Let's start with the code you tried so far...

PG
Alireza_Boloori
Fluorite | Level 6

@PGStats  If I only had one variable (x1), proc transpose would be a viable option, but since there are multiple variables, proc transpose would not return what I want.

Shmuel
Garnet | Level 18

You can do:

1) proc transpose from wide to short, so you have one vale in each row

2) sort by ID & value with optioon NODUPKEY

3) proc transpose from short to wide

PGStats
Opal | Level 21

OK, so you haven't tried anything. Maybe you should start with simpler problems.

Here is a data step solution to this problem that involves arrays, the do until pattern and the whichn function:

 

data want;
array y{99};
j = 0;
do until(last.id);
    set have; by id;
    array xx x:;
    do i = 1 to dim(xx);
        if missing(xx{i}) then leave;
        else do;
            if whichn(xx{i}, of y{*}) = 0 then do;
                j = j + 1;
                y{j} = xx{i};
                end;
            end;
        end;
    end;
output;
drop i j x:;
run;

with the hope that you will not use it before you understand how it works Smiley Wink.

PG
KachiM
Rhodochrosite | Level 12

I am complementing to  PGStats solution.

 

The array (YY)  that collects the distinct Xs is sized to 99 as the exact size is unknown and that you may not have more than this number. The larger size may crowd the Program Data Vector. I am showing the use of _temporary_ array which can be sized even to millions. I just used 1000 as the size. This is done by scanning the HAVE to count Distinct Values within ID and getting the maximum of counts from several IDs.

 

/** Maximum of the Distinct Values within ID **/
data _null_;
   do i = 1 by 1 until(last.id);
      set have end = eof;
      by id notsorted;
      array yy[1000] _temporary_;
      array x x:;
      if first.id then do; j = 1; call missing(of yy[*]); end;
      do i = 1 to dim(x);
         if x[i] then do;
            v = whichn(x[i], of yy[*]);
            if v = 0 then do; yy[j] = x[i]; j+1; end;
         end;
      end;
   end;
   retain max;
   max = max(max, n(of yy[*]));
   if eof then call symputx('max', max);
run;

Then we can use &max to size the array as in:

 

data want;
   do until(last.id);
      set have;
      by id notsorted;
      array x x:;
      array y y1 - y&max;
      if first.id then do; j = 1; call missing(of y[*]); end;
      retain y;
      do i = 1 to dim(x);
         if x[i] then do;
            v = whichN(x[i], of y[*]);
            if v = 0 then do; y[j] = x[i]; j+1; end;
         end;
      end;
   end;
keep id y:;
run;
Alireza_Boloori
Fluorite | Level 6

@PGStats Thank you very much for the solution!

@KachiM Thanks! The solution that @PGStats provided works for the example that I provided. Your solution works perfectly as well (especially for large-size data).

@VDD and @Shmuel Thanks for your inputs.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 3545 views
  • 5 likes
  • 5 in conversation