Hello everyone,
I want to collapse all rows (for each unique ID) into one row. Here is what I have:
data have;
input id x1 x2 x3 x4 x5;
datalines;
1 347 418 213 . .
1 213 . . . .
1 057 912 213 329 .
2 521 543 333 . .
2 333 521 555 . .
3 817 . . . .
;
And here is what I want:
data want;
input id y1 y2 y3 y4 y5 y6;
datalines;
1 347 418 213 057 912 329
2 521 543 333 555 817 .
;
The only other thing that I should mention is that I want to represent duplicated values for each ID only once (e.g., 213 for ID=1 and 333 and 521 for ID=2). Any help or idea is appreciated!
OK, so you haven't tried anything. Maybe you should start with simpler problems.
Here is a data step solution to this problem that involves arrays, the do until pattern and the whichn function:
data want;
array y{99};
j = 0;
do until(last.id);
set have; by id;
array xx x:;
do i = 1 to dim(xx);
if missing(xx{i}) then leave;
else do;
if whichn(xx{i}, of y{*}) = 0 then do;
j = j + 1;
y{j} = xx{i};
end;
end;
end;
end;
output;
drop i j x:;
run;
with the hope that you will not use it before you understand how it works .
Let's start with the code you tried so far...
@PGStats If I only had one variable (x1), proc transpose would be a viable option, but since there are multiple variables, proc transpose would not return what I want.
You can do:
1) proc transpose from wide to short, so you have one vale in each row
2) sort by ID & value with optioon NODUPKEY
3) proc transpose from short to wide
OK, so you haven't tried anything. Maybe you should start with simpler problems.
Here is a data step solution to this problem that involves arrays, the do until pattern and the whichn function:
data want;
array y{99};
j = 0;
do until(last.id);
set have; by id;
array xx x:;
do i = 1 to dim(xx);
if missing(xx{i}) then leave;
else do;
if whichn(xx{i}, of y{*}) = 0 then do;
j = j + 1;
y{j} = xx{i};
end;
end;
end;
end;
output;
drop i j x:;
run;
with the hope that you will not use it before you understand how it works .
I am complementing to PGStats solution.
The array (YY) that collects the distinct Xs is sized to 99 as the exact size is unknown and that you may not have more than this number. The larger size may crowd the Program Data Vector. I am showing the use of _temporary_ array which can be sized even to millions. I just used 1000 as the size. This is done by scanning the HAVE to count Distinct Values within ID and getting the maximum of counts from several IDs.
/** Maximum of the Distinct Values within ID **/ data _null_; do i = 1 by 1 until(last.id); set have end = eof; by id notsorted; array yy[1000] _temporary_; array x x:; if first.id then do; j = 1; call missing(of yy[*]); end; do i = 1 to dim(x); if x[i] then do; v = whichn(x[i], of yy[*]); if v = 0 then do; yy[j] = x[i]; j+1; end; end; end; end; retain max; max = max(max, n(of yy[*])); if eof then call symputx('max', max); run;
Then we can use &max to size the array as in:
data want; do until(last.id); set have; by id notsorted; array x x:; array y y1 - y&max; if first.id then do; j = 1; call missing(of y[*]); end; retain y; do i = 1 to dim(x); if x[i] then do; v = whichN(x[i], of y[*]); if v = 0 then do; y[j] = x[i]; j+1; end; end; end; end; keep id y:; run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.