Comparing columns

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 6
Accepted Solution

Comparing columns

Hello everyone,

I have the following table

StudentNumberIDNameLast Name
12345CSE890122CarlSmith
78946SJE901205SusanJohnson
63796CSE890122CarlSmith
36978EME750615EddyMurphy
42135SJE901205SusanJohnson

And I have the same student under different student numbers, for example, Carl Smith under #12345 and #63796 and Susan Johnson under #78946 and #42135. What I'd like to have is

StudentNumber1StudentNumber2ID1ID2Name1Name2LastName1 Last Name2
1234563796CSE890122CSE890122CarlCarlSmithSmith
7894642135SJE901205SJE901205SusanSusanJohnsonJohnson

Is it possible to transform it that way so that if ID is duplicated, another column is created with the different student numbers and all the related data? if that makes any sense... Any idea will be greatly appreciated.

Thanks!!


Accepted Solutions
Solution
‎04-10-2015 12:03 PM
Frequent Contributor
Posts: 115

Re: Comparing columns

data have;

input (studentnumber id name lastname) ($) ;

datalines;

12345    CSE890122    Carl    Smith

78946    SJE901205    Susan    Johnson

63796    CSE890122    Carl    Smith

36978    EME750615    Eddy    Murphy

42135    SJE901205    Susan    Johnson

;

proc sort data=have out=have1;

by id;

run;

data want;

set have1;

by id;

if first.id then count=0;

count+1;

prev_stnumber=lag(studentnumber);

if count>1 then do;

studentnumber1=prev_stnumber;

studentnumber2=studentnumber;

id1=id;

id2=id;

name1=name;

name2=name;

lastname1=lastname;

lastname2=lastname;

output;

end;

keep studentnumber1 studentnumber2 id1 id2 name1 name2 lastname1 lastname2;

run;

Regards,

Naveen Srinivasan

L&T Infotech

View solution in original post


All Replies
Solution
‎04-10-2015 12:03 PM
Frequent Contributor
Posts: 115

Re: Comparing columns

data have;

input (studentnumber id name lastname) ($) ;

datalines;

12345    CSE890122    Carl    Smith

78946    SJE901205    Susan    Johnson

63796    CSE890122    Carl    Smith

36978    EME750615    Eddy    Murphy

42135    SJE901205    Susan    Johnson

;

proc sort data=have out=have1;

by id;

run;

data want;

set have1;

by id;

if first.id then count=0;

count+1;

prev_stnumber=lag(studentnumber);

if count>1 then do;

studentnumber1=prev_stnumber;

studentnumber2=studentnumber;

id1=id;

id2=id;

name1=name;

name2=name;

lastname1=lastname;

lastname2=lastname;

output;

end;

keep studentnumber1 studentnumber2 id1 id2 name1 name2 lastname1 lastname2;

run;

Regards,

Naveen Srinivasan

L&T Infotech

Occasional Contributor
Posts: 6

Re: Comparing columns

Thank you, that was just what I needed!

Occasional Contributor
Posts: 6

Re: Comparing columns


Another question, using this example, I have Carl Smith now 3 times, with a different student number

StudentNumberIDNameLastName
12345CSE890122CarlSmith
78946SJE901205SusanJohnson
63796CSE890122CarlSmith
36978EME750615EddyMurphy
42135SJE901205SusanJohnson
98523CSE890122CarlSmith

How can I tell the program to take only the first 2 observations? So that in the result I can get something like this:

StudentNumber1StudentNumber2ID1ID2Name1Name2LastName1LastName2
1234563796CSE890122CSE890122CarlCarlSmithSmith
7894642135SJE901205SJE901205SusanSusanJohnsonJohnson

and NOT this:

StudentNumber1StudentNumber2ID1ID2Name1Name2LastName1Lastname2
1234563796CSE890122CSE890122CarlCarlSmithSmith
6379698523CSE890122CSE890122CarlCarlSmithSmith
7894642135SJE901205SJE901205SusanSusanJohnsonJohnson


Thank you!!

Frequent Contributor
Posts: 115

Re: Comparing columns

Hi,

/*modified code*/

data have;

input (studentnumber id name lastname) (:$10.) ;

datalines;

12345    CSE890122    Carl    Smith

78946    SJE901205    Susan    Johnson

63796    CSE890122    Carl    Smith

36978    EME750615    Eddy    Murphy

42135    SJE901205    Susan    Johnson

98523    CSE890122    Carl    Smith

;

proc sort data=have out=have1;

by id;

run;

data want;

set have1;

by id;

if first.id then count=0;

count+1;

prev_stnumber=lag(studentnumber);

if count>1 and count<3 then do;

studentnumber1=prev_stnumber;

studentnumber2=studentnumber;

id1=id;

id2=id;

name1=name;

name2=name;

lastname1=lastname;

lastname2=lastname;

output;

end;

keep studentnumber1 studentnumber2 id1 id2 name1 name2 lastname1 lastname2;

run;

Occasional Contributor
Posts: 6

Re: Comparing columns

Thank you very much... and one last question... Now I have another student with same ID (Carl Jackson) but name or last name do not match...

StudentNumberIDNameLastName
12345CSE890122CarlSmith
78946SJE901205SusanJohnson
63796CSE890122CarlSmith
36978EME750615EddyMurphy
42135SJE901205SusanJohnson
98523CSE890122CarlSmith
56795CSE890122CarlJackson

Since I'm trying to identify duplicated values... How can I still get this desired output?...

StudentNumber1StudentNumber2ID1

ID2

Name1Name2LastName1LastName2
1234563796CSE890122CSE890122CarlCarlSmithSmith
7894642135SJE901205SJE901205SusanSusanJohnsonJohnson

Where ID, Name and LastName from the first student has to be equal to ID, Name and LastName from the second student

Frequent Contributor
Posts: 115

Re: Comparing columns

The same code will produce your desired results for your Carl Jackson inclusion without having to worry about the change is name or last name because the count condition is done based on by group processing of ID. So chill:

data have;

input (studentnumber id name lastname) (:$10.) ;

datalines;

12345    CSE890122    Carl    Smith

78946    SJE901205    Susan    Johnson

63796    CSE890122    Carl    Smith

36978    EME750615    Eddy    Murphy

42135    SJE901205    Susan    Johnson

98523    CSE890122    Carl    Smith

56795    CSE890122    Carl    Jackson

;

proc sort data=have out=have1;

by id;

run;

data want;

set have1;

by id;

if first.id then count=0;

count+1;

prev_stnumber=lag(studentnumber);

if count>1 and count<3 then do;

studentnumber1=prev_stnumber;

studentnumber2=studentnumber;

id1=id;

id2=id;

name1=name;

name2=name;

lastname1=lastname;

lastname2=lastname;

output;

end;

keep studentnumber1 studentnumber2 id1 id2 name1 name2 lastname1 lastname2;

run;

Occasional Contributor
Posts: 6

Re: Comparing columns

Thank you again!

Super User
Posts: 1,117

Re: Comparing columns

Alternatively please try the ARRAY way

data have;

input studentNumber    ID$    Name$    LastName$;

datalines;

12345    CSE890122    Carl    Smith

78946    SJE901205    Susan    Johnson

63796    CSE890122    Carl    Smith

36978    EME750615    Eddy    Murphy

42135    SJE901205    Susan    Johnson

98523    CSE890122    Carl    Smith

;

proc sort data=have;

by id;

run;

data want;

set have;

by id;

retain count studentNumber1-studentNumber2 id1-id2  name1-name2  LastName1-LastName2;

if first.id then count=1;

else count+1;

if 1<=count<=2 then do;

array vs(*)  studentNumber1-studentNumber2;

array vs1(*) $ name1-name2;

array vs2(*) $ LastName1-LastName2;

array vs3(*) $ id1-id2;

if first.id then do;

do i = 1 to dim(vs);

vs(i)=.;

vs1(i)='';

vs2(i)='';

vs3(i)='';

end;

end;

vs(count)= studentNumber;

vs1(count)= name;

vs2(count)= LastName;

vs3(count)= id;

if last.id and count >1;

end;

drop count i studentNumber    ID    Name    LastName;

run;

Thanks,

Jag

Thanks,
Jag
Grand Advisor
Posts: 9,584

Re: Comparing columns

The simplest way is using proc means + idgroups.

data have;
input studentnumber id : $20. name : $20. lastname : $20.;
cards;
12345     CSE890122     Carl     Smith
78946     SJE901205     Susan     Johnson
63796     CSE890122     Carl     Smith
36978     EME750615     Eddy     Murphy
42135     SJE901205     Susan     Johnson
;
run;
proc sort data=have;by name lastname;run;
proc sql noprint;
 select max(n) into : n 
  from (select count(*) as n from have group by name,lastname);
quit;
proc summary data=have ;
 by  name lastname;
 output out=want(drop=_:) idgroup(out[&n] (studentnumber id  name  lastname)=) ;
run;

Xia Keshan

Occasional Contributor
Posts: 6

Re: Comparing columns

Thank you Master

Grand Advisor
Posts: 10,223

Re: Comparing columns

When you say "all related data" are you referring to more that you displayed? If so are there going to be more values that aren't duplicated? Since you say "student" I might guess that you have information such as Class and or grade. Is that all supposed to end up on one line?

Does the output have to be a data set or are you looking for a report?

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 11 replies
  • 323 views
  • 6 likes
  • 5 in conversation