BookmarkSubscribeRSS Feed
Henary
Calcite | Level 5

Hi Everyone,

I am new to SAS. so, Please let me know that the query " IF NOT FIRST.CPNP AND LAST.CPNP AND PLANT='USM' THEN DELETE;"

will delete duplicate rows from data sets or what??

Please reply me fast...

Thanks in Advance...

Henary

11 REPLIES 11
twocanbazza
Quartz | Level 8

This will delete the last CPNP record if it is not the first record of the CPNP group and where plant=USM only.

It will not delete all duplicates if your CPNP group has more than two records.

Barry

Henary
Calcite | Level 5

Hi Barry,

This is the full data set code..

"

DATA BQDATA.VBELTQUOTES(DROP=ERROR) BQDATA.VBELTERRORS;

   LENGTH ERROR $ 25;

SET BQDATA.VBELTQUOTES; BY PRODNO CPNP PLANT PARTNO COST_YEAR;

  DROP T COST_YEAR CREATE_DATE UPDATE_DATE CLASS_EFFECTIVE_DATE QUOTE_EFFECTIVE_DATE;

  IF NOT FIRST.CPNP AND LAST.CPNP AND PLANT='USM' THEN DELETE;

......

......

....

RUN;

"

I thought this might help you to understand...

All I understood is that "BY PRODNO CPNP PLANT PARTNO COST_YEAR;" these 5 columns in By clause so for group of these 5 columns will have two set of values with same CPNP then it will delete second records..

In most cases I have CPNP value NULL..

If possible then can you reply me with example with random values that which records it will delete and which records will not..

Please...

Thanks Again in advance...

art297
Opal | Level 21

It still isn't clear which records you want to delete.  You can also, rather than use delete, only keep the records that you want.  Since you include all 5 variables in your by statement, are they all relevant?

If they are, and you don't want to keep any records that have plant='USM', and you only want the first record from sets of records that match on PRODNO CPNP PLANT PARTNO COST_YEAR, then you could use something like:

if PLANT ne 'USM' and first.cost_year;



Henary
Calcite | Level 5

Hi,

Thanks for quick reply...

Actually I am converting this SAS code to SQL...

I have no idea about SAS and all I want to know is that what the hack is this line doing in the code...

I don't understand what and how it is deleting...

Thanks in advance..

Astounding
PROC Star

In a way, the question becomes how much responsibility do you want to undertake.

Do you want to convert the code to SQL, even if the code is wrong?

Or do you want to understand what records are supposed to be deleted, and make the SQL version produce the correct result?

Either way, you can get answers here.  But you might have to make your choice first.

It sure looks like the DELETE statement that you are working with is incorrect.  It would operate as if there were parentheses in place:

IF (NOT FIRST.CPNP) and (LAST.CPNP) and (PLANT='USM') THEN DELETE;

We can tell you what the code does, but we can only guess at what the code is supposed to do.

Henary
Calcite | Level 5

Yes, I Need to understand what this code will do?

and I want to make same in SQL...

If I can know that which are the records will deleted using this code then i can make in SQL

Astounding
PROC Star

OK, here's what it says to do.  The key statements:

BY PRODNO CPNP PLANT PART_NO COST_YEAR;

IF NOT FIRST.CPNP and LAST.CPNP and PLANT='USM' THEN DELETE;

Assuming that these are the statements in the original code, it means:

If a block of records (must be more than one record in the block):

share the same PRODNO and CPNP, and

the last record in the block is USM,

then delete that last record of the block, keeping all the earlier records in the block.

You can see why many of us doubt that this is the right thing to do, but that's what the code is saying.

Good luck.

Astounding
PROC Star

Should also note the definition of the "last record in the block".

Given the PRODNO and CPNP, the last record in the block is the one with the

largest value of PLANT,

and within the largest value of PLANT, the largest value of PART_NO,

and within the largest value of PART_NO, the largest value of COST_YEAR.

I still think the original code is wrong, but at least you know what it means.

Good luck.

art297
Opal | Level 21

While you may be able to write something in sql that mimics the same behavior, possibly using the undocumented monotonic function, there is no guarantee that it will be accomplishing the same thing as the datastep code.

All flavors of SQL come with the caveat that the concept of record, let alone record order, is irrelevant.  The use of first. and last., on the other hand, are totally dependent upon record order.

anshu
Calcite | Level 5

Yes, Barry is absolutly right.

It will not delete all duplicates.

This will delete only the last record of each CPNP group if it is not first and also where plant=USM.

If you wants to delete all duplicates and out of all duplicates you want to keep only the first record where plant=USM then you can go for the code given below:-

WHERE PLANT='USM';

IF NOT FIRST.CPNP THEN DELETE;

Sadhna Smiley Happy

manojinpec
Obsidian | Level 7

What is the output you need. with the current record it will delete only the record in each cpnp group if the last in group is having plant='USM' and there are more than one record for that group.

Please let us know your input and output required to help you more'

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 10043 views
  • 0 likes
  • 6 in conversation