How to Combine Rows by ID to a single row, without data disappearing?

steiner28 · Posted 03-28-2022 12:16 PM

I am trying to combine observations by ID, using 12 different variables. I am checking for duplicate ID's and trying to combine them into one observation. Below is the code I am currently using, but every time I go to the dataset, one of the observations takes over, and the others seem to disappear.

proc sort data = test;
by ID;
run;
data have;
update dup (obs=0) dup;
by ID;
run;

proc sql;
create table want as
select PDIAG10, SDIAG10_1, SDIAG10_2,SDIAG10_3,SDIAG10_4,SDIAG10_5,SDIAG10_6,SDIAG10_7,SDIAG10_8,SDIAG10_9,SDIAG10_10, SDIAG10_11, VISIT_ID, AGE, DOS_MY, hTYPE, ER, gender, ID, RACE, Married
from have
group by ID;
quit;

Tom · Posted 03-28-2022 12:22 PM

Please show an example of what you are talking about.

Post (as TEXT not PHOTOGRAPHS) a few observations for the same ID with two or three other variables.

Show what you want as the output once the observations (which I assume is what you meant by rows) are "combined". Again show the result as TEXT.

steiner28 · Posted 03-28-2022 12:43 PM

Here is an example of what I mean

What I have:
ColA Col1 Col2 Col3 Col4 Col4 Col5 Col6 ... Col11 ID
Z123 1
H456 E125 T158 1
Z123 T158 E125 1

What I want
ColA Col1 Col2 Col3 Col4 Col4 Col5 Col6 ... Col11 ID
Z123 H456 E125 T158 1

steiner28 · Posted 03-28-2022 12:46 PM

What I have:
ColA Col1 Col2 Col3 Col4 Col4 Col5 Col6 ... Col11 ID
Z123                                              1
H456 E125 T158                                    1
Z123 T158 E125                                    1

What I want:
ColA      Col1   Col2 Col3 Col4 Col4 Col5 Col6 ... Col11 ID
Z123 H456 E125   T158                                    1

Sorry about the previous reply! I haven't used this before.

Tom · Posted 03-28-2022 01:03 PM

If kind of looks like that instead picking just one of the values of a variable, say COL1, you want to create a different type of variable that can hold multiple values?

mkeintz · Posted 03-28-2022 05:37 PM

You have not stated your rules for constructing the new values of COLA, COL1, .....

It appears you want to output 1 record per id, correct?
The new COLA appears to be a space-separated list of the unique values that appear in the old COLA series, in the order that those unique values occur.
The new COL1 and new COL2 (and COL3-COL11?) variables appear to have whatever was the first non-missing value for those variables in the old dataset. Is that correct?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

mkeintz · Posted 03-28-2022 05:53 PM

If the answers to my questions are all "yes", then this program would produce the results you describe:

data want (drop=_:);
  length cola $34 ;
  retain cola;
  set have (keep=rename=(cola=_old_cola));
  by id;
  if findw(cola,trim(_old_cola),' ')=0 then cola=catx(' ',cola,_old_cola);

  array col     col1-col11;
  array _tmp $4 _tmp1-_tmp11;
  do over _tmp;  _tmp=coalescec(_tmp,col); end;

  if last.id;
  do over col;   col=_tmp;                 end;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

steiner28 · Posted 03-28-2022 05:59 PM

The answer to your first question is yes!
2. I need all of the columns, COLA-COL11 to be space separated that only store unique values.
3. For COLA-COL11, I just need no duplicates across the row.

Long-term I am calculating the CCI for data, and I need each ID (i.e person) to only be counted once.

mkeintz · Posted 03-29-2022 01:23 AM

Let's clarify a little more. Your requirements are still not entirely clear to me. I now think you want

List all unique values, from the old COLA, in the order encountered, in the new COLA.
List all unique values, from the old COL1, and not found in the old COLA, in the new COL1.
List all unique values, from the old COL2, and not found in the old COLA or old COL1, in the new COL2.

In general, each new COL will have unique values found in that COL, but not found in the COLs to its left.

Is the above correct?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

How to Combine Rows by ID to a single row, without data disappearing?

Re: How to Combine Rows by ID to a single row, without data disappearing?

Re: How to Combine Rows by ID to a single row, without data disappearing?

Re: How to Combine Rows by ID to a single row, without data disappearing?

Re: How to Combine Rows by ID to a single row, without data disappearing?

Re: How to Combine Rows by ID to a single row, without data disappearing?

Re: How to Combine Rows by ID to a single row, without data disappearing?

Re: How to Combine Rows by ID to a single row, without data disappearing?

Re: How to Combine Rows by ID to a single row, without data disappearing?

SAS Innovate 2025: Call for Content