Help using Base SAS procedures

FIRST.VARIABLE & LAST.VARIABLE PROBLEM

Reply
Occasional Contributor
Posts: 8

FIRST.VARIABLE & LAST.VARIABLE PROBLEM

Hi Everyone,

I am new to SAS. so, Please let me know that the query " IF NOT FIRST.CPNP AND LAST.CPNP AND PLANT='USM' THEN DELETE;"

will delete duplicate rows from data sets or what??

Please reply me fast...

Thanks in Advance...

Henary

Super Contributor
Posts: 356

FIRST.VARIABLE & LAST.VARIABLE PROBLEM

This will delete the last CPNP record if it is not the first record of the CPNP group and where plant=USM only.

It will not delete all duplicates if your CPNP group has more than two records.

Barry

Occasional Contributor
Posts: 8

FIRST.VARIABLE & LAST.VARIABLE PROBLEM

Posted in reply to twocanbazza

Hi Barry,

This is the full data set code..

"

DATA BQDATA.VBELTQUOTES(DROP=ERROR) BQDATA.VBELTERRORS;

   LENGTH ERROR $ 25;

SET BQDATA.VBELTQUOTES; BY PRODNO CPNP PLANT PARTNO COST_YEAR;

  DROP T COST_YEAR CREATE_DATE UPDATE_DATE CLASS_EFFECTIVE_DATE QUOTE_EFFECTIVE_DATE;

  IF NOT FIRST.CPNP AND LAST.CPNP AND PLANT='USM' THEN DELETE;

......

......

....

RUN;

"

I thought this might help you to understand...

All I understood is that "BY PRODNO CPNP PLANT PARTNO COST_YEAR;" these 5 columns in By clause so for group of these 5 columns will have two set of values with same CPNP then it will delete second records..

In most cases I have CPNP value NULL..

If possible then can you reply me with example with random values that which records it will delete and which records will not..

Please...

Thanks Again in advance...

PROC Star
Posts: 7,467

FIRST.VARIABLE & LAST.VARIABLE PROBLEM

It still isn't clear which records you want to delete.  You can also, rather than use delete, only keep the records that you want.  Since you include all 5 variables in your by statement, are they all relevant?

If they are, and you don't want to keep any records that have plant='USM', and you only want the first record from sets of records that match on PRODNO CPNP PLANT PARTNO COST_YEAR, then you could use something like:

if PLANT ne 'USM' and first.cost_year;



Occasional Contributor
Posts: 8

FIRST.VARIABLE & LAST.VARIABLE PROBLEM

Hi,

Thanks for quick reply...

Actually I am converting this SAS code to SQL...

I have no idea about SAS and all I want to know is that what the hack is this line doing in the code...

I don't understand what and how it is deleting...

Thanks in advance..

Super User
Posts: 5,497

FIRST.VARIABLE & LAST.VARIABLE PROBLEM

In a way, the question becomes how much responsibility do you want to undertake.

Do you want to convert the code to SQL, even if the code is wrong?

Or do you want to understand what records are supposed to be deleted, and make the SQL version produce the correct result?

Either way, you can get answers here.  But you might have to make your choice first.

It sure looks like the DELETE statement that you are working with is incorrect.  It would operate as if there were parentheses in place:

IF (NOT FIRST.CPNP) and (LAST.CPNP) and (PLANT='USM') THEN DELETE;

We can tell you what the code does, but we can only guess at what the code is supposed to do.

Occasional Contributor
Posts: 8

FIRST.VARIABLE & LAST.VARIABLE PROBLEM

Posted in reply to Astounding

Yes, I Need to understand what this code will do?

and I want to make same in SQL...

If I can know that which are the records will deleted using this code then i can make in SQL

Super User
Posts: 5,497

FIRST.VARIABLE & LAST.VARIABLE PROBLEM

OK, here's what it says to do.  The key statements:

BY PRODNO CPNP PLANT PART_NO COST_YEAR;

IF NOT FIRST.CPNP and LAST.CPNP and PLANT='USM' THEN DELETE;

Assuming that these are the statements in the original code, it means:

If a block of records (must be more than one record in the block):

share the same PRODNO and CPNP, and

the last record in the block is USM,

then delete that last record of the block, keeping all the earlier records in the block.

You can see why many of us doubt that this is the right thing to do, but that's what the code is saying.

Good luck.

Super User
Posts: 5,497

FIRST.VARIABLE & LAST.VARIABLE PROBLEM

Posted in reply to Astounding

Should also note the definition of the "last record in the block".

Given the PRODNO and CPNP, the last record in the block is the one with the

largest value of PLANT,

and within the largest value of PLANT, the largest value of PART_NO,

and within the largest value of PART_NO, the largest value of COST_YEAR.

I still think the original code is wrong, but at least you know what it means.

Good luck.

PROC Star
Posts: 7,467

Re: FIRST.VARIABLE & LAST.VARIABLE PROBLEM

While you may be able to write something in sql that mimics the same behavior, possibly using the undocumented monotonic function, there is no guarantee that it will be accomplishing the same thing as the datastep code.

All flavors of SQL come with the caveat that the concept of record, let alone record order, is irrelevant.  The use of first. and last., on the other hand, are totally dependent upon record order.

N/A
Posts: 1

FIRST.VARIABLE & LAST.VARIABLE PROBLEM

Yes, Barry is absolutly right.

It will not delete all duplicates.

This will delete only the last record of each CPNP group if it is not first and also where plant=USM.

If you wants to delete all duplicates and out of all duplicates you want to keep only the first record where plant=USM then you can go for the code given below:-

WHERE PLANT='USM';

IF NOT FIRST.CPNP THEN DELETE;

Sadhna Smiley Happy

Frequent Contributor
Posts: 139

FIRST.VARIABLE & LAST.VARIABLE PROBLEM

What is the output you need. with the current record it will delete only the record in each cpnp group if the last in group is having plant='USM' and there are more than one record for that group.

Please let us know your input and output required to help you more'

Ask a Question
Discussion stats
  • 11 replies
  • 4377 views
  • 0 likes
  • 6 in conversation