BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
julss
Calcite | Level 5

Hello everyone,

I have the following table

StudentNumberIDNameLast Name
12345CSE890122CarlSmith
78946SJE901205SusanJohnson
63796CSE890122CarlSmith
36978EME750615EddyMurphy
42135SJE901205SusanJohnson

And I have the same student under different student numbers, for example, Carl Smith under #12345 and #63796 and Susan Johnson under #78946 and #42135. What I'd like to have is

StudentNumber1StudentNumber2ID1ID2Name1Name2LastName1 Last Name2
1234563796CSE890122CSE890122CarlCarlSmithSmith
7894642135SJE901205SJE901205SusanSusanJohnsonJohnson

Is it possible to transform it that way so that if ID is duplicated, another column is created with the different student numbers and all the related data? if that makes any sense... Any idea will be greatly appreciated.

Thanks!!

1 ACCEPTED SOLUTION

Accepted Solutions
naveen_srini
Quartz | Level 8

data have;

input (studentnumber id name lastname) ($) ;

datalines;

12345    CSE890122    Carl    Smith

78946    SJE901205    Susan    Johnson

63796    CSE890122    Carl    Smith

36978    EME750615    Eddy    Murphy

42135    SJE901205    Susan    Johnson

;

proc sort data=have out=have1;

by id;

run;

data want;

set have1;

by id;

if first.id then count=0;

count+1;

prev_stnumber=lag(studentnumber);

if count>1 then do;

studentnumber1=prev_stnumber;

studentnumber2=studentnumber;

id1=id;

id2=id;

name1=name;

name2=name;

lastname1=lastname;

lastname2=lastname;

output;

end;

keep studentnumber1 studentnumber2 id1 id2 name1 name2 lastname1 lastname2;

run;

Regards,

Naveen Srinivasan

L&T Infotech

View solution in original post

11 REPLIES 11
naveen_srini
Quartz | Level 8

data have;

input (studentnumber id name lastname) ($) ;

datalines;

12345    CSE890122    Carl    Smith

78946    SJE901205    Susan    Johnson

63796    CSE890122    Carl    Smith

36978    EME750615    Eddy    Murphy

42135    SJE901205    Susan    Johnson

;

proc sort data=have out=have1;

by id;

run;

data want;

set have1;

by id;

if first.id then count=0;

count+1;

prev_stnumber=lag(studentnumber);

if count>1 then do;

studentnumber1=prev_stnumber;

studentnumber2=studentnumber;

id1=id;

id2=id;

name1=name;

name2=name;

lastname1=lastname;

lastname2=lastname;

output;

end;

keep studentnumber1 studentnumber2 id1 id2 name1 name2 lastname1 lastname2;

run;

Regards,

Naveen Srinivasan

L&T Infotech

julss
Calcite | Level 5

Thank you, that was just what I needed!

julss
Calcite | Level 5


Another question, using this example, I have Carl Smith now 3 times, with a different student number

StudentNumberIDNameLastName
12345CSE890122CarlSmith
78946SJE901205SusanJohnson
63796CSE890122CarlSmith
36978EME750615EddyMurphy
42135SJE901205SusanJohnson
98523CSE890122CarlSmith

How can I tell the program to take only the first 2 observations? So that in the result I can get something like this:

StudentNumber1StudentNumber2ID1ID2Name1Name2LastName1LastName2
1234563796CSE890122CSE890122CarlCarlSmithSmith
7894642135SJE901205SJE901205SusanSusanJohnsonJohnson

and NOT this:

StudentNumber1StudentNumber2ID1ID2Name1Name2LastName1Lastname2
1234563796CSE890122CSE890122CarlCarlSmithSmith
6379698523CSE890122CSE890122CarlCarlSmithSmith
7894642135SJE901205SJE901205SusanSusanJohnsonJohnson


Thank you!!

naveen_srini
Quartz | Level 8

Hi,

/*modified code*/

data have;

input (studentnumber id name lastname) (:$10.) ;

datalines;

12345    CSE890122    Carl    Smith

78946    SJE901205    Susan    Johnson

63796    CSE890122    Carl    Smith

36978    EME750615    Eddy    Murphy

42135    SJE901205    Susan    Johnson

98523    CSE890122    Carl    Smith

;

proc sort data=have out=have1;

by id;

run;

data want;

set have1;

by id;

if first.id then count=0;

count+1;

prev_stnumber=lag(studentnumber);

if count>1 and count<3 then do;

studentnumber1=prev_stnumber;

studentnumber2=studentnumber;

id1=id;

id2=id;

name1=name;

name2=name;

lastname1=lastname;

lastname2=lastname;

output;

end;

keep studentnumber1 studentnumber2 id1 id2 name1 name2 lastname1 lastname2;

run;

julss
Calcite | Level 5

Thank you very much... and one last question... Now I have another student with same ID (Carl Jackson) but name or last name do not match...

StudentNumberIDNameLastName
12345CSE890122CarlSmith
78946SJE901205SusanJohnson
63796CSE890122CarlSmith
36978EME750615EddyMurphy
42135SJE901205SusanJohnson
98523CSE890122CarlSmith
56795CSE890122CarlJackson

Since I'm trying to identify duplicated values... How can I still get this desired output?...

StudentNumber1StudentNumber2ID1

ID2

Name1Name2LastName1LastName2
1234563796CSE890122CSE890122CarlCarlSmithSmith
7894642135SJE901205SJE901205SusanSusanJohnsonJohnson

Where ID, Name and LastName from the first student has to be equal to ID, Name and LastName from the second student

naveen_srini
Quartz | Level 8

The same code will produce your desired results for your Carl Jackson inclusion without having to worry about the change is name or last name because the count condition is done based on by group processing of ID. So chill:

data have;

input (studentnumber id name lastname) (:$10.) ;

datalines;

12345    CSE890122    Carl    Smith

78946    SJE901205    Susan    Johnson

63796    CSE890122    Carl    Smith

36978    EME750615    Eddy    Murphy

42135    SJE901205    Susan    Johnson

98523    CSE890122    Carl    Smith

56795    CSE890122    Carl    Jackson

;

proc sort data=have out=have1;

by id;

run;

data want;

set have1;

by id;

if first.id then count=0;

count+1;

prev_stnumber=lag(studentnumber);

if count>1 and count<3 then do;

studentnumber1=prev_stnumber;

studentnumber2=studentnumber;

id1=id;

id2=id;

name1=name;

name2=name;

lastname1=lastname;

lastname2=lastname;

output;

end;

keep studentnumber1 studentnumber2 id1 id2 name1 name2 lastname1 lastname2;

run;

julss
Calcite | Level 5

Thank you again!

Jagadishkatam
Amethyst | Level 16

Alternatively please try the ARRAY way

data have;

input studentNumber    ID$    Name$    LastName$;

datalines;

12345    CSE890122    Carl    Smith

78946    SJE901205    Susan    Johnson

63796    CSE890122    Carl    Smith

36978    EME750615    Eddy    Murphy

42135    SJE901205    Susan    Johnson

98523    CSE890122    Carl    Smith

;

proc sort data=have;

by id;

run;

data want;

set have;

by id;

retain count studentNumber1-studentNumber2 id1-id2  name1-name2  LastName1-LastName2;

if first.id then count=1;

else count+1;

if 1<=count<=2 then do;

array vs(*)  studentNumber1-studentNumber2;

array vs1(*) $ name1-name2;

array vs2(*) $ LastName1-LastName2;

array vs3(*) $ id1-id2;

if first.id then do;

do i = 1 to dim(vs);

vs(i)=.;

vs1(i)='';

vs2(i)='';

vs3(i)='';

end;

end;

vs(count)= studentNumber;

vs1(count)= name;

vs2(count)= LastName;

vs3(count)= id;

if last.id and count >1;

end;

drop count i studentNumber    ID    Name    LastName;

run;

Thanks,

Jag

Thanks,
Jag
Ksharp
Super User

The simplest way is using proc means + idgroups.

data have;
input studentnumber id : $20. name : $20. lastname : $20.;
cards;
12345     CSE890122     Carl     Smith
78946     SJE901205     Susan     Johnson
63796     CSE890122     Carl     Smith
36978     EME750615     Eddy     Murphy
42135     SJE901205     Susan     Johnson
;
run;
proc sort data=have;by name lastname;run;
proc sql noprint;
 select max(n) into : n 
  from (select count(*) as n from have group by name,lastname);
quit;
proc summary data=have ;
 by  name lastname;
 output out=want(drop=_:) idgroup(out[&n] (studentnumber id  name  lastname)=) ;
run;

Xia Keshan

julss
Calcite | Level 5

Thank you Master

ballardw
Super User

When you say "all related data" are you referring to more that you displayed? If so are there going to be more values that aren't duplicated? Since you say "student" I might guess that you have information such as Class and or grade. Is that all supposed to end up on one line?

Does the output have to be a data set or are you looking for a report?

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 1365 views
  • 6 likes
  • 5 in conversation