I am a 110% novice in SAS and need to merge two big datasets. My problem is that I have multiple observations in multiple tables that use the same variable. It is unfortunately the only variable that all the tables share. When I merge them initially I get of course, duplicate observations, many with much of the same information but 1 or 2 variables have different information.
Would I create a dataset using a new variable to deal with this? In the end, I'd like to just have 1 'ID' per person, not multiple records per person.
To be more precise, going off of another's post, my data looks somewhat like this:
There can be an infinite number of comments per ID (however, there usually aren't more than 3 or 4) I want to get all 3 comments for ID 1356 (for example) into multiple variables like this:
ID COMM_Multiple Comm_Count Comm1 Comm 2
1356 Y 3 " " " "
I can't get it to even look right on here. But I want the Y to match with Comm_multiple and the 3 to be under Comm_count
Etc etc.... does this make any sense? I don't want them concatenated into one variable, but I want each comment to
remain its own variable. I just want ID to be a single, unique identifier so that there is only 1 per person.
Firstly, if you are going to be a SAS programmer you need to know that 110% indicates that there is probably some sort of error with your data.
Aside from that, you need to do this.....
Although your data looks sorted already, sort it.
proc sort data=study;
Then flatten the data down to one line per id........
*give the concatenated comment anough characters to take all comments;
*and give the multiple comments flag a length of one;
length comments $500 comm_multiple $1;
*retain the value of the comments variable from record to record;
*if its the first occurence of id then reset the count and the concatenated field;
if first.id then do;
*add one to the count for each record;
*and concatenate the comment onto the overall comments;
*when we process the last record for the id;
if last.id then do;
*of we have more than one comment set the flag to Y;
if comm_count>1 then comm_multiple='Y';
*output the record, one per id;
Also consider the merit of using SAS PROC TRANSPOSE to "go horizontal" with your file letting SAS generate your COMMENTn variables, to be followed by a simple SAS DATA step used to consolidate and count, as shown below:
%LET MAXLEN = 1000;
KEEP COMMENT: ALL_COMMENTS COUNT_COMMENT;
* Generate some SAS variables to mimic TRANSPOSE output. ;
COMMENT1 = 'This is comment #1';
COMMENT2 = 'This is comment #2';
COMMENT3 = ' ';
COMMENT4 = ' ';
* SET transposed_file;
ARRAY ACOMMENTS (*) $ COMMENT: ;
* Combine individual comment variables into one. ;
LENGTH ALL_COMMENTS $ &maxlen;
DO I=1 TO DIM(ACOMMENTS);
IF ACOMMENTS(I) NE ' ' THEN DO;
* GIVE UP IF WE WILL TRUNCATE THE COMBINED COMMENT VAR. ;
LEN = LENGTH(ALL_COMMENTS!!ACOMMENTS(I));
IF LENGTH( catt( ALL_COMMENTS,ACOMMENTS(I) ) ) GT &MAXLEN THEN ABORT ;
ALL_COMMENTS = CATX(' ',ALL_COMMENTS,ACOMMENTS(I) );
COUNT_COMMENT = I-1;