Hello Experts,
I'm using the group by statement in proc sql by NO Code_2 and DATE.
By this key I would like to delete the CD that have only "X" difference (XTIN et TIN), beacause it is the same events.
Do you know some option to count the value as the same if only one letter is different ?
The file is joined.
Thank you !
Are the records in your data set always in pairs, with value without an X (for example TIN) coming before the value with the X (for example, XTIN)? Could there ever be a group of 3 records?
@SASdevAnneMarie wrote:
Thank you for your message. Just 2 : X means cancelled event.
So does this means (a) there are no instances of more than 2 records for a given NO/CODE_2/DATE combination, and (b) ALL instances of exactly two records for a given combination represent instances of an event and its cancellation?
If so, then all you really want to do is keep only the singletons, yes? That's what this code does. And you would not need to compare character values (assuming data are sorted by NO/CODE_2/DATE):
data want;
set have;
by no code_2 date;
if first.date=1 and last.date=1;
run;
Why not just delete the observations where CD starts with X?
Easy with an IF statement.
data want;
set have;
if cd =: 'X' then delete;
run;
Harder with a WHERE statement (or in SQL code) since you cannot use the : modifier to do trimmed comparisons.
where char(cd,1) ne 'X'
It was at least for me not clear from your description that you want to delete ALL the highlighted rows. Below should do this.
data have;
infile datalines dlm=',' dsd truncover;
input NO $ Code_2 $ DATE:ddmmyy10. CD$;
format date date9.;
some_other_var=_n_;
datalines;
B17,214,18/07/2019,TIN
B17,214,6/08/2019,TIN
B17,214,6/08/2019,XTIN
B17,214,10/12/2019,TINK
B17,214,10/12/2019,XTINK
MK2,300,18/10/2019,TIN
MK2,300,19/10/2019,MIN
MK2,300,19/10/2019,XMIN
MK2,300,21/10/2019,PAP
MK2,300,22/10/2019,PAK
;
proc sql;
/* create table want_1 as*/
select l.*
from have l
left join (select no,code_2,date,substr(cd,2) as _cd from have where upcase(substr(cd,1,1))='X') r
on l.no=r.no and l.code_2=r.code_2 and l.date=r.date
where missing(r._cd)
;
quit;
To just delete any cancelled event a where clause would do. If you want to keep cancelled events without a matching event on the same date then code as below should work.
The last row in below data contains the case where there is a cancelled event (XTINK) without a matching event.
data have;
infile datalines dlm=',' dsd truncover;
input NO $ Code_2 $ DATE:ddmmyy10. CD$;
format date date9.;
some_other_var=_n_;
datalines;
B17,214,18/07/2019,TIN
B17,214,6/08/2019,TIN
B17,214,6/08/2019,XTIN
B17,214,10/12/2019,TINK
B17,214,10/12/2019,XTINK
MK2,300,18/10/2019,TIN
MK2,300,19/10/2019,MIN
MK2,300,19/10/2019,XMIN
MK2,300,21/10/2019,PAP
MK2,300,22/10/2019,PAK
MK2,300,01/12/2019,XTINK
;
/* option 1 */
proc sql;
create table want_1 as
select *, count(*) as cnt
from have
group by no,code_2,date
having cnt=1 or cnt>1 and upcase(substr(cd,1,1)) ne 'X'
;
quit;
/* option 2 */
proc sort data=have out=inter;
by no Code_2 date;
run;
data want_2;
set inter;
by no Code_2 date;
if not first.date and upcase(substr(cd,1,1)) = 'X' then delete;
run;
Can you try this?
proc sort data = have;
by no code_2 date cd;
run;
data want;
do until(last.date);
set have;
by no code_2 date;
If cd =: 'X' then flag=1;
end;
do until(last.date);
set have;
by no code_2 date;
If flag then continue;
output;
end;
flag=0;
run;
I suspect that code may have an issue because of the curly quotes around the X in the line starting "If cd =: " . Look closely, they are not the simple double quote used in SAS for character values: "
@Mazi wrote:
Can you try this?proc sort data = have; by no code_2 date cd; run; data want; do until(last.date); set have; by no code_2 date; If cd =: “X” then flag=1; end; do until(last.date); set have; by no code_2 date; If flag then continue; output; end; flag=0; run;
Yes, you are correct. Thank you for pointing that out. As is, this will surely cause issues.
Let me see if I can still amend it.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.