BookmarkSubscribeRSS Feed
SASdevAnneMarie
Barite | Level 11

Hello Experts,

 

I'm using the group by statement in proc sql by NO Code_2 and DATE.

By this key I would like to delete the CD that have only "X" difference (XTIN et TIN), beacause it is the same events.

 

Do you know some option to count the value as the same if only one letter is different ? 

 

The file is joined. 

 

SASdevAnneMarie_0-1710108892155.png

 

Thank you !

10 REPLIES 10
PaigeMiller
Diamond | Level 26

Are the records in your data set always in pairs, with value without an X (for example TIN) coming before the value with the X (for example, XTIN)? Could there ever be a group of 3 records?

--
Paige Miller
SASdevAnneMarie
Barite | Level 11
Thank you for your message. Just 2 : X means cancelled event.
mkeintz
PROC Star

@SASdevAnneMarie wrote:
Thank you for your message. Just 2 : X means cancelled event.

So does this means (a) there are no instances of more than 2 records for a given NO/CODE_2/DATE combination, and (b) ALL instances of exactly two records for a given combination represent instances of an event and its cancellation?

 

If so, then all you really want to do is keep only the singletons, yes?  That's what this code does.  And you would not need to compare character values (assuming data are sorted by NO/CODE_2/DATE):

 

data want;
  set have;
  by no code_2 date;
  if first.date=1 and last.date=1;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Tom
Super User Tom
Super User

Why not just delete the observations where CD starts with X?

 

Easy with an IF statement.

data want;
  set have;
  if cd =: 'X' then delete;
run;

Harder with a WHERE statement (or in SQL code) since you cannot use the : modifier to do trimmed comparisons.

where char(cd,1) ne 'X'
SASdevAnneMarie
Barite | Level 11
Thank you, Tom, but I would like to delete the double line (with XMIN et MIN) beacause it is cancelled event (if on the same date).
Patrick
Opal | Level 21

It was at least for me not clear from your description that you want to delete ALL the highlighted rows. Below should do this.

data have;
  infile datalines dlm=',' dsd truncover;
  input NO $ Code_2 $ DATE:ddmmyy10. CD$;
  format date date9.;
  some_other_var=_n_;
  datalines;
B17,214,18/07/2019,TIN
B17,214,6/08/2019,TIN
B17,214,6/08/2019,XTIN
B17,214,10/12/2019,TINK
B17,214,10/12/2019,XTINK
MK2,300,18/10/2019,TIN
MK2,300,19/10/2019,MIN
MK2,300,19/10/2019,XMIN
MK2,300,21/10/2019,PAP
MK2,300,22/10/2019,PAK
;

proc sql;
/*  create table want_1 as*/
  select l.*
  from have l 
  left join (select no,code_2,date,substr(cd,2) as _cd from have where upcase(substr(cd,1,1))='X') r
  on l.no=r.no and l.code_2=r.code_2 and l.date=r.date
  where missing(r._cd)
  ;
quit;
Patrick
Opal | Level 21

To just delete any cancelled event a where clause would do. If you want to keep cancelled events without a matching event on the same date then code as below should work.

The last row in below data contains the case where there is a cancelled event (XTINK) without a matching event.

data have;
  infile datalines dlm=',' dsd truncover;
  input NO $ Code_2 $ DATE:ddmmyy10. CD$;
  format date date9.;
  some_other_var=_n_;
  datalines;
B17,214,18/07/2019,TIN
B17,214,6/08/2019,TIN
B17,214,6/08/2019,XTIN
B17,214,10/12/2019,TINK
B17,214,10/12/2019,XTINK
MK2,300,18/10/2019,TIN
MK2,300,19/10/2019,MIN
MK2,300,19/10/2019,XMIN
MK2,300,21/10/2019,PAP
MK2,300,22/10/2019,PAK
MK2,300,01/12/2019,XTINK
;

/* option 1 */
proc sql;
  create table want_1 as
  select *, count(*) as cnt
  from have
  group by no,code_2,date
  having cnt=1 or cnt>1 and upcase(substr(cd,1,1)) ne 'X'
  ;
quit;

/* option 2 */
proc sort data=have out=inter;
  by no Code_2 date;
run;
data want_2;
  set inter;
  by no Code_2 date;
  if not first.date and upcase(substr(cd,1,1)) = 'X' then delete;
run;

 

Mazi
Pyrite | Level 9


Can you try this?

proc sort data = have;
    by no code_2 date cd;
run;

data want;
    do until(last.date);
      set have;
      by no code_2 date;
     If cd =: 'X' then flag=1;
   end;
   do until(last.date);
     set have;
     by no code_2 date;
     If flag then continue;
     output;
  end;
  flag=0;
run;
ballardw
Super User

I suspect that code may have an issue because of the curly quotes around the X in the line starting "If cd =: " . Look closely, they are not the simple double quote used in SAS for character values: "

 


@Mazi wrote:


Can you try this?

proc sort data = have;
    by no code_2 date cd;
run;

data want;
    do until(last.date);
      set have;
      by no code_2 date;
     If cd =: “X” then flag=1;
   end;
   do until(last.date);
     set have;
     by no code_2 date;
     If flag then continue;
     output;
  end;
  flag=0;
run;

 

Mazi
Pyrite | Level 9

@ballardw ,

 

Yes, you are correct. Thank you for pointing that out. As is, this will surely cause issues.

 

Let me see if I can still amend it.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 1672 views
  • 2 likes
  • 7 in conversation