Help using Base SAS procedures

SAS NOVICE NEED TO DEDUPLICATE!

Reply
N/A
Posts: 0

SAS NOVICE NEED TO DEDUPLICATE!

Hi there,

I am a 110% novice in SAS and need to merge two big datasets. My problem is that I have multiple observations in multiple tables that use the same variable. It is unfortunately the only variable that all the tables share. When I merge them initially I get of course, duplicate observations, many with much of the same information but 1 or 2 variables have different information.
Would I create a dataset using a new variable to deal with this? In the end, I'd like to just have 1 'ID' per person, not multiple records per person.

To be more precise, going off of another's post, my data looks somewhat like this:


StudyID
1356 Comment1
1356 Comment2
1356 Comment3
1444 Comment1
1444 Comment2
1667 Comment1
1551 Comment1
1551 Comment2
1551 Comment3
1551 Comment4

There can be an infinite number of comments per ID (however, there usually aren't more than 3 or 4) I want to get all 3 comments for ID 1356 (for example) into multiple variables like this:

ID COMM_Multiple Comm_Count Comm1 Comm 2
1356 Y 3 " " " "

I can't get it to even look right on here. But I want the Y to match with Comm_multiple and the 3 to be under Comm_count

Etc etc.... does this make any sense? I don't want them concatenated into one variable, but I want each comment to
remain its own variable. I just want ID to be a single, unique identifier so that there is only 1 per person.

I don't know what to do!
Help Smiley Sad

Thanks Smiley Happy Message was edited by: Nov_girl
N/A
Posts: 0

Re: SAS NOVICE NEED TO DEDUPLICATE!

Firstly, if you are going to be a SAS programmer you need to know that 110% indicates that there is probably some sort of error with your data.

Aside from that, you need to do this.....

Although your data looks sorted already, sort it.

proc sort data=study;
by id;
run;

Then flatten the data down to one line per id........

data flat;
*give the concatenated comment anough characters to take all comments;
*and give the multiple comments flag a length of one;
length comments $500 comm_multiple $1;
*retain the value of the comments variable from record to record;
retain comments;
set study;
by id;
*if its the first occurence of id then reset the count and the concatenated field;
if first.id then do;
comments='';
comm_count=0;
end;
*add one to the count for each record;
comm_count+1;
*and concatenate the comment onto the overall comments;
comments=left(trim(comments))!!' '!!comment;
*when we process the last record for the id;
if last.id then do;
*of we have more than one comment set the flag to Y;
if comm_count>1 then comm_multiple='Y';
else comm_multiple='N';
*output the record, one per id;
output;
end;
run;
N/A
Posts: 0

Re: SAS NOVICE NEED TO DEDUPLICATE!

Thanks for the response. Did I mention that I'm a novice Smiley Wink

So...I haven't actually created the variables for multiple comments (yes/no) or multiple comments/count.. Can that be done within this step?
Super Contributor
Super Contributor
Posts: 3,174

Re: SAS NOVICE NEED TO DEDUPLICATE!

Also consider the merit of using SAS PROC TRANSPOSE to "go horizontal" with your file letting SAS generate your COMMENTn variables, to be followed by a simple SAS DATA step used to consolidate and count, as shown below:

%LET MAXLEN = 1000;
DATA consolidated_file;
KEEP COMMENT: ALL_COMMENTS COUNT_COMMENT;
* Generate some SAS variables to mimic TRANSPOSE output. ;
COMMENT1 = 'This is comment #1';
COMMENT2 = 'This is comment #2';
COMMENT3 = ' ';
COMMENT4 = ' ';
* SET transposed_file;
ARRAY ACOMMENTS (*) $ COMMENT: ;
* Combine individual comment variables into one. ;
LENGTH ALL_COMMENTS $ &maxlen;
DO I=1 TO DIM(ACOMMENTS);
IF ACOMMENTS(I) NE ' ' THEN DO;
* GIVE UP IF WE WILL TRUNCATE THE COMBINED COMMENT VAR. ;
LEN = LENGTH(ALL_COMMENTS!!ACOMMENTS(I));
IF LENGTH( catt( ALL_COMMENTS,ACOMMENTS(I) ) ) GT &MAXLEN THEN ABORT ;
ALL_COMMENTS = CATX(' ',ALL_COMMENTS,ACOMMENTS(I) );
END;
ELSE LEAVE;
END;
COUNT_COMMENT = I-1;
RUN;


Scott Barry
SBBWorks, Inc.
Ask a Question
Discussion stats
  • 3 replies
  • 502 views
  • 0 likes
  • 2 in conversation