I've come across a strange situation where I've to delete a record if there is any other record having the matching key variables. Any help would be highly appreciated. My requirement is to drop both these records from further processing. My sample data looks like below where the fields Vendor, Job and Grade are the key variables.
Vendor | Job | Grade | Payment | |
1 | ABC | Boxes | 01 | $3,500.00 |
2 | XYZ | Tape | 01 | $2,000.00 |
3 | LMN | Straps | 02 | $1,500.00 |
4 | ABC | Pins | 02 | $1,200.00 |
5 | LMN | Straps | 02 | $0.00 |
6 | ABC | Boxes | 01 | $3,500.00 |
7 | LMN | Straps | 02 | $0.00 |
8 | ABC | Boxes | 01 | $0.00 |
A record has to be dropped from further processing if there is another record anywhere in the table having the amount field(Payment) a value $0.
Considering this the record #1 for vendor ABC having Payment $3500 should be deleted because there is another record #8 where Payment is $0, but #6 having Payment $3500 should be processed because #8 is no more available as it is already matched(/deleted along) with #1. Similarly #5 for vendor LMN should also be dropped as there is a matching record at #7 having payment $0.
data have; infile datalines expandtabs; input vendor $ job $ grade payment; datalines; ABC Boxes 01 3500.00 XYZ Tape 01 2000.00 LMN Straps 02 1500.00 ABC Pins 02 1200.00 LMN Straps 02 0.00 ABC Boxes 01 3500.00 LMN Straps 02 0.00 ABC Boxes 01 0.00 ; run; proc sort data=have ;by vendor job grade;run; data want; ina=0;inb=0; merge have(where=(payment ne 0) in=ina) have(rename=(payment=_payment) where=(_payment eq 0) in=inb); by vendor job grade; if not inb; run;
I would caution the approach, as you may not want to delete all the records associated, but going from the data you posted - and I would also point out that its a good idea to post test data in the form of a datastep as illustrated:
data have; input vendor $ job $ grade payment; datalines; ABC Boxes 01 3500.00 XYZ Tape 01 2000.00 LMN Straps 02 1500.00 ABC Pins 02 1200.00 LMN Straps 02 0.00 ABC Boxes 01 3500.00 LMN Straps 02 0.00 ABC Boxes 01 0.00 ; run; proc sql undopolicy=none; delete from HAVE A where exists(select distinct VENDOR from HAVE where VENDOR=A.VENDOR and JOB=A.JOB and PAYMENT=0); quit;
Thanks RW9, Sorry for not mentioning my approach trials there but the solution provided by Ksharp is really sharp and helped in what I needed.
data have; infile datalines expandtabs; input vendor $ job $ grade payment; datalines; ABC Boxes 01 3500.00 XYZ Tape 01 2000.00 LMN Straps 02 1500.00 ABC Pins 02 1200.00 LMN Straps 02 0.00 ABC Boxes 01 3500.00 LMN Straps 02 0.00 ABC Boxes 01 0.00 ; run; proc sort data=have ;by vendor job grade;run; data want; ina=0;inb=0; merge have(where=(payment ne 0) in=ina) have(rename=(payment=_payment) where=(_payment eq 0) in=inb); by vendor job grade; if not inb; run;
The In= dataset option creates a temporary variable whose value is not saved to the output dataset unless you explicitly assign it to another variable. There are several options on a SET or INFILE statement that do this also such as END NOBS POINT EOV FILENAME FILEVAR LENGTH LINE
Yepp that is true but my question was why did Ksharp initialise those variables with 0's. Is the default assignment different from this? I didnt understand how did the behavior of this step changed completely with just inclusion of that "Initialisisng step".
Oh! It makes sense now. Thanks a lot! And now I can throw my 100 lines of code away into trash and just use this master piece!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.