03-06-2012 09:18 PM
I am new to SAS. so, Please let me know that the query " IF NOT FIRST.CPNP AND LAST.CPNP AND PLANT='USM' THEN DELETE;"
will delete duplicate rows from data sets or what??
Please reply me fast...
Thanks in Advance...
03-06-2012 09:53 PM
This will delete the last CPNP record if it is not the first record of the CPNP group and where plant=USM only.
It will not delete all duplicates if your CPNP group has more than two records.
03-07-2012 10:58 AM
This is the full data set code..
DATA BQDATA.VBELTQUOTES(DROP=ERROR) BQDATA.VBELTERRORS;
LENGTH ERROR $ 25;
SET BQDATA.VBELTQUOTES; BY PRODNO CPNP PLANT PARTNO COST_YEAR;
DROP T COST_YEAR CREATE_DATE UPDATE_DATE CLASS_EFFECTIVE_DATE QUOTE_EFFECTIVE_DATE;
IF NOT FIRST.CPNP AND LAST.CPNP AND PLANT='USM' THEN DELETE;
I thought this might help you to understand...
All I understood is that "BY PRODNO CPNP PLANT PARTNO COST_YEAR;" these 5 columns in By clause so for group of these 5 columns will have two set of values with same CPNP then it will delete second records..
In most cases I have CPNP value NULL..
If possible then can you reply me with example with random values that which records it will delete and which records will not..
Thanks Again in advance...
03-07-2012 11:18 AM
It still isn't clear which records you want to delete. You can also, rather than use delete, only keep the records that you want. Since you include all 5 variables in your by statement, are they all relevant?
If they are, and you don't want to keep any records that have plant='USM', and you only want the first record from sets of records that match on PRODNO CPNP PLANT PARTNO COST_YEAR, then you could use something like:
if PLANT ne 'USM' and first.cost_year;
03-07-2012 11:54 AM
Thanks for quick reply...
Actually I am converting this SAS code to SQL...
I have no idea about SAS and all I want to know is that what the hack is this line doing in the code...
I don't understand what and how it is deleting...
Thanks in advance..
03-07-2012 12:30 PM
In a way, the question becomes how much responsibility do you want to undertake.
Do you want to convert the code to SQL, even if the code is wrong?
Or do you want to understand what records are supposed to be deleted, and make the SQL version produce the correct result?
Either way, you can get answers here. But you might have to make your choice first.
It sure looks like the DELETE statement that you are working with is incorrect. It would operate as if there were parentheses in place:
IF (NOT FIRST.CPNP) and (LAST.CPNP) and (PLANT='USM') THEN DELETE;
We can tell you what the code does, but we can only guess at what the code is supposed to do.
03-07-2012 01:49 PM
Yes, I Need to understand what this code will do?
and I want to make same in SQL...
If I can know that which are the records will deleted using this code then i can make in SQL
03-07-2012 02:19 PM
OK, here's what it says to do. The key statements:
BY PRODNO CPNP PLANT PART_NO COST_YEAR;
IF NOT FIRST.CPNP and LAST.CPNP and PLANT='USM' THEN DELETE;
Assuming that these are the statements in the original code, it means:
If a block of records (must be more than one record in the block):
share the same PRODNO and CPNP, and
the last record in the block is USM,
then delete that last record of the block, keeping all the earlier records in the block.
You can see why many of us doubt that this is the right thing to do, but that's what the code is saying.
03-07-2012 02:31 PM
Should also note the definition of the "last record in the block".
Given the PRODNO and CPNP, the last record in the block is the one with the
largest value of PLANT,
and within the largest value of PLANT, the largest value of PART_NO,
and within the largest value of PART_NO, the largest value of COST_YEAR.
I still think the original code is wrong, but at least you know what it means.
03-07-2012 03:21 PM
While you may be able to write something in sql that mimics the same behavior, possibly using the undocumented monotonic function, there is no guarantee that it will be accomplishing the same thing as the datastep code.
All flavors of SQL come with the caveat that the concept of record, let alone record order, is irrelevant. The use of first. and last., on the other hand, are totally dependent upon record order.
03-07-2012 03:50 AM
Yes, Barry is absolutly right.
It will not delete all duplicates.
This will delete only the last record of each CPNP group if it is not first and also where plant=USM.
If you wants to delete all duplicates and out of all duplicates you want to keep only the first record where plant=USM then you can go for the code given below:-
IF NOT FIRST.CPNP THEN DELETE;
03-07-2012 03:54 AM
What is the output you need. with the current record it will delete only the record in each cpnp group if the last in group is having plant='USM' and there are more than one record for that group.
Please let us know your input and output required to help you more'