New user here with a FIRST.var & LAST.var question

New Contributor
Posts: 2

I've got SAS code that extracts data from a PeopleSoft environment (Oracle). It was written years ago, and I have to decipher it without any documentation and move it into a PeopleCode Application Engine using SQL steps.

One piece of code clearly shows the developer finding duplicates using the following key.

STDNT_KEY = EMPLID || STRM || CLASS_NBR;

DATA DUPS_CDUP_ENRL;
   SET CLS_CDUP_ENRL;
   BY STDNT_KEY;
   IF (FIRST.STDNT_KEY = 1 & LAST.STDNT_KEY = 0) OR
      (FIRST.STDNT_KEY = 0 & LAST.STDNT_KEY = 0) OR
      (FIRST.STDNT_KEY = 0 & LAST.STDNT_KEY = 1);
   KEEP OWNER EMPLID STRM SESSION_CODE SUBJECT CATALOG_NBR CLASS_SECTION DUP_CODE;
RUN;

PROC EXPORT DATA=DUPS_CDUP_ENRL
            OUTFILE= "C:\Processes\C_DUP\for_1058\DUPLICATE_CDUP_ENTRIES.xls"
            DBMS=EXCEL2000 REPLACE;
RUN;

In my understanding if FIRST.STDNT_KEY = 1 & LAST.STDNT_KEY = 1  then he would not have any duplicates, so he appears to only look for any row that is a duplicate.  True?

Again...I have no business rules or documentation so I have to try and understand the intent of the developer via the code.

Later I see the following code...

IF (FIRST.STDNT_KEY = 0 & LAST.STDNT_KEY = 0) OR
   (FIRST.STDNT_KEY = 0 & LAST.STDNT_KEY = 1) THEN DELETE;
OUTPUT CDUP_TRANS;
RUN;

PROC EXPORT DATA=CDUP_TRANS
            OUTFILE= "C:\Processes\C_DUP\for_1058\CDUP_TRANS.xls"
            DBMS=EXCEL2000 REPLACE;
RUN;

I don't know what CDUP_TRANS output is supposed to represent and don't know  if the developer was only looking for dups STDNT_KEY = 1 for both first and last.  If so why did they exclude (FIRST.STDNT_KEY = 1 & LAST.STDNT_KEY = 0) from the CDUP_TRANS code? 

Is

(FIRST.STDNT_KEY = 1 & LAST.STDNT_KEY = 0) OR

(FIRST.STDNT_KEY = 0 & LAST.STDNT_KEY = 1)

redundant?


Accepted Solutions
Solution
‎03-20-2015 05:40 PM
Super Contributor
Posts: 490

Re: New user here with a FIRST.var & LAST.var question

Posted in reply to SFDonovan

In my understanding if FIRST.STDNT_KEY = 1 & LAST.STDNT_KEY = 1  then he would not have any duplicates, so he appears to only look for any row that is a duplicate.  True?

That is true. He is keeping only the rows for students (STDNT_KEY values) that have more than one entry.


I don't know what CDUP_TRANS output is supposed to represent and don't know  if the developer was only looking for dups STDNT_KEY = 1 for both first and last.  If so why did they exclude (FIRST.STDNT_KEY = 1 & LAST.STDNT_KEY = 0) from the CDUP_TRANS code?

In this code he reduces each student who has more than one entry to a single row, the first entry in the group, presumably for summarizing or reporting.

But the first part of this code is missing from your post, so it could be that he is reporting every STDNT in the database table without duplicates. In that case the students with more than one entry are represented by their first record, and the students with only one entry (FIRST.STDNT_KEY = 1 & LAST.STDNT_KEY = 1) are kept as well, because the DELETE condition never matches them.
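The DELETE condition quoted above removes every row where FIRST.STDNT_KEY = 0, whatever LAST.STDNT_KEY is, so what survives is every row where FIRST.STDNT_KEY = 1. A minimal sketch of that reading follows. It assumes CLS_CDUP_ENRL (or whatever the unposted SET statement reads) is already sorted by STDNT_KEY and that nothing else in the missing part of the step filters rows; the output name is made up for illustration.

```sas
/* Sketch only: assumes the input is already sorted by STDNT_KEY. */
data cdup_trans_sketch;        /* hypothetical output name */
    set cls_cdup_enrl;         /* assumed input; the original SET statement was not posted */
    by stdnt_key;
    /* Subsetting IF: keep a row only when it is the first row of its key group. */
    /* This keeps single-entry keys too, since they have FIRST. = 1 as well.     */
    if first.stdnt_key;
run;
```

If that reading is right, the result has exactly one row per distinct STDNT_KEY.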

Is

(FIRST.STDNT_KEY = 1 & LAST.STDNT_KEY = 0) OR

(FIRST.STDNT_KEY = 0 & LAST.STDNT_KEY = 1)

redundant?

No. In a group with more than one entry:

(FIRST.STDNT_KEY = 1 & LAST.STDNT_KEY = 0) ... means the first entry in the group

(FIRST.STDNT_KEY = 0 & LAST.STDNT_KEY = 1) ... means the last entry in the group

*****

For example:

STDNT_KEY
1
2
3
1
1
3

After the data is sorted by STDNT_KEY:

STDNT_KEY
1
1
1
2
3
3

SAS sets the temporary variables FIRST.STDNT_KEY and LAST.STDNT_KEY for each row of the sorted data, and each distinct STDNT_KEY value is a group in itself:

STDNT_KEY
1       (FIRST.STDNT_KEY = 1 & LAST.STDNT_KEY = 0) ... the first entry in the 1's group
1       (FIRST.STDNT_KEY = 0 & LAST.STDNT_KEY = 0) ... not the first or the last
1       (FIRST.STDNT_KEY = 0 & LAST.STDNT_KEY = 1) ... the last entry in the 1's group
2       (FIRST.STDNT_KEY = 1 & LAST.STDNT_KEY = 1) ... the first entry in the 2's group and the last
3       (FIRST.STDNT_KEY = 1 & LAST.STDNT_KEY = 0) ... the first entry in the 3's group
3       (FIRST.STDNT_KEY = 0 & LAST.STDNT_KEY = 1) ... the last entry in the 3's group
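To see these flags for yourself, something like the following sketch can be run. FIRST. and LAST. variables are temporary and are not written to the output data set, so the step copies them into ordinary variables. The data set names (keys, flags) and the variable names first_flag and last_flag are invented for this illustration.

```sas
/* Hypothetical sample data: the six unsorted keys from the example */
data keys;
    input stdnt_key;
    datalines;
1
2
3
1
1
3
;
run;

/* BY-group processing needs the input sorted by the BY variable */
proc sort data=keys;
    by stdnt_key;
run;

/* Copy the temporary FIRST./LAST. flags into variables that are kept */
data flags;
    set keys;
    by stdnt_key;
    first_flag = first.stdnt_key;
    last_flag  = last.stdnt_key;
run;

proc print data=flags noobs;
run;
```

The printed first_flag and last_flag columns should line up with the annotations in the listing above.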

*********

So

IF (FIRST.STDNT_KEY = 1 & LAST.STDNT_KEY = 0) OR
   (FIRST.STDNT_KEY = 0 & LAST.STDNT_KEY = 0) OR
   (FIRST.STDNT_KEY = 0 & LAST.STDNT_KEY = 1) THEN OUTPUT;

gives you every row except the single entry for key 2, where both flags are 1:

STDNT_KEY
1       (FIRST.STDNT_KEY = 1 & LAST.STDNT_KEY = 0) ... the first entry in the 1's group
1       (FIRST.STDNT_KEY = 0 & LAST.STDNT_KEY = 0) ... not the first or the last
1       (FIRST.STDNT_KEY = 0 & LAST.STDNT_KEY = 1) ... the last entry in the 1's group
3       (FIRST.STDNT_KEY = 1 & LAST.STDNT_KEY = 0) ... the first entry in the 3's group
3       (FIRST.STDNT_KEY = 0 & LAST.STDNT_KEY = 1) ... the last entry in the 3's group



*******

And

IF (FIRST.STDNT_KEY = 0 & LAST.STDNT_KEY = 0) OR
   (FIRST.STDNT_KEY = 0 & LAST.STDNT_KEY = 1) THEN DELETE;

gives you only the rows where FIRST.STDNT_KEY = 1, one per key:

STDNT_KEY
1       (FIRST.STDNT_KEY = 1 & LAST.STDNT_KEY = 0) ... the first entry in the 1's group
2       (FIRST.STDNT_KEY = 1 & LAST.STDNT_KEY = 1) ... the first entry in the 2's group and the last
3       (FIRST.STDNT_KEY = 1 & LAST.STDNT_KEY = 0) ... the first entry in the 3's group
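Both filters can be tried on the sample keys in a single pass. This is a minimal sketch, not the original developer's code: the data set names sample, dups and firsts are made up for illustration, and the datalines are entered already sorted so no PROC SORT is needed.

```sas
/* Hypothetical sample data, already in STDNT_KEY order */
data sample;
    input stdnt_key;
    datalines;
1
1
1
2
3
3
;
run;

/* One DATA step, two outputs */
data dups firsts;
    set sample;
    by stdnt_key;
    /* Same rows as the three-part OR condition: anything that is not a single-entry group */
    if not (first.stdnt_key and last.stdnt_key) then output dups;
    /* Same rows as the DELETE version: whatever has FIRST. = 1 */
    if first.stdnt_key then output firsts;
run;
```

With these six keys, dups should hold the five rows for keys 1 and 3, and firsts should hold one row each for keys 1, 2 and 3.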



All Replies

Super Contributor
Posts: 358

Re: New user here with a FIRST.var & LAST.var question

Posted in reply to SFDonovan

Hi.

How about you try this:

IF FIRST.STDNT_KEY = 1 and LAST.STDNT_KEY = 1 then delete;
else output;

This way, records that are unique (first and last both = 1) are deleted from the data table and only the duplicate records are kept.

Super User
Posts: 11,343

Re: New user here with a FIRST.var & LAST.var question

Being a bit pedantic, I would say records where the key is duplicated, not duplicate records. Rows that share a STDNT_KEY may still differ in their other columns.
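Since the goal is to move this logic into SQL steps, the "key is duplicated" idea can also be written without FIRST./LAST. at all. The PROC SQL sketch below is one possible way to do it under some assumptions: CLS_CDUP_ENRL already carries the STDNT_KEY column, and the table names dup_keys and dup_rows are invented for the example. Unlike the DATA step, it returns all columns and does not guarantee row order.

```sas
proc sql;
    /* Keys that occur more than once, with how many rows each has */
    create table dup_keys as
    select stdnt_key, count(*) as n_rows
    from cls_cdup_enrl
    group by stdnt_key
    having count(*) > 1;

    /* Every row whose key is duplicated */
    create table dup_rows as
    select e.*
    from cls_cdup_enrl as e
         inner join dup_keys as k
         on e.stdnt_key = k.stdnt_key;
quit;
```

The GROUP BY / HAVING pattern is standard SQL, so it should carry over to Oracle with little change, but that has not been checked against the actual PeopleSoft tables.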

