BookmarkSubscribeRSS Feed
steiner28
Calcite | Level 5

I am trying to combine observations by ID, using 12 different variables. I am checking for duplicate ID's and trying to combine them into one observation. Below is the code I am currently using, but every time I go to the dataset, one of the observations takes over, and the others seem to disappear. 

 

proc sort data = test;
by ID;
run;
data have;
update dup (obs=0) dup;
by ID;
run;


proc sql;
create table want as
select PDIAG10, SDIAG10_1, SDIAG10_2,SDIAG10_3,SDIAG10_4,SDIAG10_5,SDIAG10_6,SDIAG10_7,SDIAG10_8,SDIAG10_9,SDIAG10_10, SDIAG10_11, VISIT_ID, AGE, DOS_MY, hTYPE, ER, gender, ID, RACE, Married
from have
group by ID;
quit;

 

one.jpgtwo.jpg

8 REPLIES 8
Tom
Super User Tom
Super User

Please show an example of what you are talking about.

Post (as TEXT not PHOTOGRAPHS) a few observations for the same ID with two or three other variables.

Show what you want as the output once the observations (which I assume is what you meant by rows) are "combined".  Again show the result as TEXT.

steiner28
Calcite | Level 5
Here is an example of what I mean

What I have:
ColA Col1 Col2 Col3 Col4 Col4 Col5 Col6 ... Col11 ID
Z123 1
H456 E125 T158 1
Z123 T158 E125 1

What I want
ColA Col1 Col2 Col3 Col4 Col4 Col5 Col6 ... Col11 ID
Z123 H456 E125 T158 1
steiner28
Calcite | Level 5
What I have:
ColA Col1 Col2 Col3 Col4 Col4 Col5 Col6 ... Col11 ID Z123 1 H456 E125 T158 1 Z123 T158 E125 1 What I want: ColA Col1 Col2 Col3 Col4 Col4 Col5 Col6 ... Col11 ID Z123 H456 E125 T158 1

Sorry about the previous reply! I haven't used this before.

Tom
Super User Tom
Super User

If kind of looks like that instead picking just one of the values of a variable, say COL1, you want to create a different type of variable that can hold multiple values?

mkeintz
PROC Star

You have not stated your rules for constructing the new values of COLA, COL1, .....

 

  1. It appears you want to output 1 record per id, correct?
  2. The new COLA appears to be a space-separated list of the unique values that appear in the old COLA series, in the order that those unique values occur.
  3. The new COL1 and new COL2  (and COL3-COL11?) variables appear to have whatever was the first non-missing value for those variables in the old dataset.  Is that correct?

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
mkeintz
PROC Star

If the answers to my questions are all "yes", then this program would produce the results you describe:

 

data want (drop=_:);
  length cola $34 ;
  retain cola;
  set have (keep=rename=(cola=_old_cola));
  by id;
  if findw(cola,trim(_old_cola),' ')=0 then cola=catx(' ',cola,_old_cola);

  array col     col1-col11;
  array _tmp $4 _tmp1-_tmp11;
  do over _tmp;  _tmp=coalescec(_tmp,col); end;

  if last.id;
  do over col;   col=_tmp;                 end;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
steiner28
Calcite | Level 5
The answer to your first question is yes!
2. I need all of the columns, COLA-COL11 to be space separated that only store unique values.
3. For COLA-COL11, I just need no duplicates across the row.

Long-term I am calculating the CCI for data, and I need each ID (i.e person) to only be counted once.
mkeintz
PROC Star

Let's clarify a little more.  Your requirements are still not entirely clear to me.  I now think you want

  1. List all unique values, from the old COLA, in the order encountered, in the new COLA.
  2. List all unique values, from the old COL1, and not found in the old COLA, in the new COL1.
  3. List all unique values, from the old COL2, and not found in the old COLA or old COL1, in the new COL2.

 

In general, each new COL will have unique values found in that COL, but not found in the COLs to its left.

 

Is the above correct?

 

 

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 8 replies
  • 2481 views
  • 0 likes
  • 3 in conversation