Hello everyone,
I have the following table
StudentNumber | ID | Name | Last Name |
---|---|---|---|
12345 | CSE890122 | Carl | Smith |
78946 | SJE901205 | Susan | Johnson |
63796 | CSE890122 | Carl | Smith |
36978 | EME750615 | Eddy | Murphy |
42135 | SJE901205 | Susan | Johnson |
And I have the same student under different student numbers, for example, Carl Smith under #12345 and #63796 and Susan Johnson under #78946 and #42135. What I'd like to have is
StudentNumber1 | StudentNumber2 | ID1 | ID2 | Name1 | Name2 | LastName1 | Last Name2 |
---|---|---|---|---|---|---|---|
12345 | 63796 | CSE890122 | CSE890122 | Carl | Carl | Smith | Smith |
78946 | 42135 | SJE901205 | SJE901205 | Susan | Susan | Johnson | Johnson |
Is it possible to transform it that way so that if ID is duplicated, another column is created with the different student numbers and all the related data? if that makes any sense... Any idea will be greatly appreciated.
Thanks!!
data have;
input (studentnumber id name lastname) ($) ;
datalines;
12345 CSE890122 Carl Smith
78946 SJE901205 Susan Johnson
63796 CSE890122 Carl Smith
36978 EME750615 Eddy Murphy
42135 SJE901205 Susan Johnson
;
proc sort data=have out=have1;
by id;
run;
data want;
set have1;
by id;
if first.id then count=0;
count+1;
prev_stnumber=lag(studentnumber);
if count>1 then do;
studentnumber1=prev_stnumber;
studentnumber2=studentnumber;
id1=id;
id2=id;
name1=name;
name2=name;
lastname1=lastname;
lastname2=lastname;
output;
end;
keep studentnumber1 studentnumber2 id1 id2 name1 name2 lastname1 lastname2;
run;
Regards,
Naveen Srinivasan
L&T Infotech
data have;
input (studentnumber id name lastname) ($) ;
datalines;
12345 CSE890122 Carl Smith
78946 SJE901205 Susan Johnson
63796 CSE890122 Carl Smith
36978 EME750615 Eddy Murphy
42135 SJE901205 Susan Johnson
;
proc sort data=have out=have1;
by id;
run;
data want;
set have1;
by id;
if first.id then count=0;
count+1;
prev_stnumber=lag(studentnumber);
if count>1 then do;
studentnumber1=prev_stnumber;
studentnumber2=studentnumber;
id1=id;
id2=id;
name1=name;
name2=name;
lastname1=lastname;
lastname2=lastname;
output;
end;
keep studentnumber1 studentnumber2 id1 id2 name1 name2 lastname1 lastname2;
run;
Regards,
Naveen Srinivasan
L&T Infotech
Thank you, that was just what I needed!
Another question, using this example, I have Carl Smith now 3 times, with a different student number
StudentNumber | ID | Name | LastName |
12345 | CSE890122 | Carl | Smith |
78946 | SJE901205 | Susan | Johnson |
63796 | CSE890122 | Carl | Smith |
36978 | EME750615 | Eddy | Murphy |
42135 | SJE901205 | Susan | Johnson |
98523 | CSE890122 | Carl | Smith |
How can I tell the program to take only the first 2 observations? So that in the result I can get something like this:
StudentNumber1 | StudentNumber2 | ID1 | ID2 | Name1 | Name2 | LastName1 | LastName2 |
12345 | 63796 | CSE890122 | CSE890122 | Carl | Carl | Smith | Smith |
78946 | 42135 | SJE901205 | SJE901205 | Susan | Susan | Johnson | Johnson |
and NOT this:
StudentNumber1 | StudentNumber2 | ID1 | ID2 | Name1 | Name2 | LastName1 | Lastname2 |
12345 | 63796 | CSE890122 | CSE890122 | Carl | Carl | Smith | Smith |
63796 | 98523 | CSE890122 | CSE890122 | Carl | Carl | Smith | Smith |
78946 | 42135 | SJE901205 | SJE901205 | Susan | Susan | Johnson | Johnson |
Thank you!!
/*modified code*/
data have;
input (studentnumber id name lastname) (:$10.) ;
datalines;
12345 CSE890122 Carl Smith
78946 SJE901205 Susan Johnson
63796 CSE890122 Carl Smith
36978 EME750615 Eddy Murphy
42135 SJE901205 Susan Johnson
98523 CSE890122 Carl Smith
;
proc sort data=have out=have1;
by id;
run;
data want;
set have1;
by id;
if first.id then count=0;
count+1;
prev_stnumber=lag(studentnumber);
if count>1 and count<3 then do;
studentnumber1=prev_stnumber;
studentnumber2=studentnumber;
id1=id;
id2=id;
name1=name;
name2=name;
lastname1=lastname;
lastname2=lastname;
output;
end;
keep studentnumber1 studentnumber2 id1 id2 name1 name2 lastname1 lastname2;
run;
Thank you very much... and one last question... Now I have another student with same ID (Carl Jackson) but name or last name do not match...
StudentNumber | ID | Name | LastName |
12345 | CSE890122 | Carl | Smith |
78946 | SJE901205 | Susan | Johnson |
63796 | CSE890122 | Carl | Smith |
36978 | EME750615 | Eddy | Murphy |
42135 | SJE901205 | Susan | Johnson |
98523 | CSE890122 | Carl | Smith |
56795 | CSE890122 | Carl | Jackson |
Since I'm trying to identify duplicated values... How can I still get this desired output?...
StudentNumber1 | StudentNumber2 | ID1 | ID2 | Name1 | Name2 | LastName1 | LastName2 |
12345 | 63796 | CSE890122 | CSE890122 | Carl | Carl | Smith | Smith |
78946 | 42135 | SJE901205 | SJE901205 | Susan | Susan | Johnson | Johnson |
Where ID, Name and LastName from the first student has to be equal to ID, Name and LastName from the second student
The same code will produce your desired results for your Carl Jackson inclusion without having to worry about the change is name or last name because the count condition is done based on by group processing of ID. So chill:
data have;
input (studentnumber id name lastname) (:$10.) ;
datalines;
12345 CSE890122 Carl Smith
78946 SJE901205 Susan Johnson
63796 CSE890122 Carl Smith
36978 EME750615 Eddy Murphy
42135 SJE901205 Susan Johnson
98523 CSE890122 Carl Smith
56795 CSE890122 Carl Jackson
;
proc sort data=have out=have1;
by id;
run;
data want;
set have1;
by id;
if first.id then count=0;
count+1;
prev_stnumber=lag(studentnumber);
if count>1 and count<3 then do;
studentnumber1=prev_stnumber;
studentnumber2=studentnumber;
id1=id;
id2=id;
name1=name;
name2=name;
lastname1=lastname;
lastname2=lastname;
output;
end;
keep studentnumber1 studentnumber2 id1 id2 name1 name2 lastname1 lastname2;
run;
Thank you again!
Alternatively please try the ARRAY way
data have;
input studentNumber ID$ Name$ LastName$;
datalines;
12345 CSE890122 Carl Smith
78946 SJE901205 Susan Johnson
63796 CSE890122 Carl Smith
36978 EME750615 Eddy Murphy
42135 SJE901205 Susan Johnson
98523 CSE890122 Carl Smith
;
proc sort data=have;
by id;
run;
data want;
set have;
by id;
retain count studentNumber1-studentNumber2 id1-id2 name1-name2 LastName1-LastName2;
if first.id then count=1;
else count+1;
if 1<=count<=2 then do;
array vs(*) studentNumber1-studentNumber2;
array vs1(*) $ name1-name2;
array vs2(*) $ LastName1-LastName2;
array vs3(*) $ id1-id2;
if first.id then do;
do i = 1 to dim(vs);
vs(i)=.;
vs1(i)='';
vs2(i)='';
vs3(i)='';
end;
end;
vs(count)= studentNumber;
vs1(count)= name;
vs2(count)= LastName;
vs3(count)= id;
if last.id and count >1;
end;
drop count i studentNumber ID Name LastName;
run;
Thanks,
Jag
The simplest way is using proc means + idgroups.
data have; input studentnumber id : $20. name : $20. lastname : $20.; cards; 12345 CSE890122 Carl Smith 78946 SJE901205 Susan Johnson 63796 CSE890122 Carl Smith 36978 EME750615 Eddy Murphy 42135 SJE901205 Susan Johnson ; run; proc sort data=have;by name lastname;run; proc sql noprint; select max(n) into : n from (select count(*) as n from have group by name,lastname); quit; proc summary data=have ; by name lastname; output out=want(drop=_:) idgroup(out[&n] (studentnumber id name lastname)=) ; run;
Xia Keshan
Thank you xia keshan, it was very helpful!
When you say "all related data" are you referring to more that you displayed? If so are there going to be more values that aren't duplicated? Since you say "student" I might guess that you have information such as Class and or grade. Is that all supposed to end up on one line?
Does the output have to be a data set or are you looking for a report?
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.