Hi !
I've read a lot of subjects about loops and incrementation but I can't figure out how to do what seems like a simple task.
Here's what my data looks like :
data WORK.HAVE;
infile datalines;
input Name:$10. ID:2. YN:$8. value:$5.;
datalines;
Antoine 01 true (1)
Antoine 01 false (2)
Thomas 02 true (1)
Thomas 02 true (2)
Thomas 02 false (3)
Adeline 03 false (1)
Michelle 04 true (1)
Michelle 04 false (2)
Michelle 04 false (3)
Michelle 04 false (4)
Clemence 05 false (1)
;
For each individual, if YN was "true", a new row would be created. A "false" should have stoped the incrementation.
But following this logic, in this data Michelle has 2 rows that shouldn't exist.
My goal is to detect and delete those rows.
Expected result :
data WORK.WANT;
infile datalines;
input Name:$10. ID:2. YN:$8. value:$5.;
datalines;
Antoine 01 true (1)
Antoine 01 false (2)
Thomas 02 true (1)
Thomas 02 true (2)
Thomas 02 false (3)
Adeline 03 false (1)
Michelle 04 true (1)
Michelle 04 false (2)
Clemence 05 false (1)
;
It seems easy but I'm struggling and feel like i'm burning learning steps.
My logic was something like that :
IF actualrow.ID = previousrow.ID AND actualrow.YN = previousrow.YN = "false" THEN drop actualrow
Maybe I can use proc sql, but I don't know how to include sql within a data step yet..
I'm also having issues finding documents talking about this from basics.
All help appreciated.
Thanks for your time !
This works for your example data:
data work.want; set work.have; by notsorted name notsorted yn; retain flag; if first.name then flag=(yn='false'); else do; if flag and (yn='false') then delete; else flag = (yn='false'); end; run;
If name is ever different for the same ID you could use ID in place of Name in the code.
SAS Proc SQL does not have a "previous row" feature that works in the way you are thinking.
BTW if you code this with true/false or yes/no as numeric 1/0 values you don't need to use yn='false' for comparisons. You could use "if yn" to test for true or "if not(yn)" for false
data WORK.HAVE;
infile datalines;
input Name:$10. ID:2. YN:$8. value:$5.;
datalines;
Antoine 01 true (1)
Antoine 01 false (2)
Thomas 02 true (1)
Thomas 02 true (2)
Thomas 02 false (3)
Adeline 03 false (1)
Michelle 04 true (1)
Michelle 04 false (2)
Michelle 04 false (3)
Michelle 04 false (4)
Clemence 05 false (1)
;
data want;
set have;
by id yn notsorted;
retain f;
if first.id then f=.;
f=ifn(not first.id and lag(yn)='false',1,.);
if not f;
drop f ;
run;
/*or*/
data want;
set have;
by id yn notsorted;
retain f;
if first.id then f=.;
f=not first.id and lag(yn)='false';
if not f;
drop f ;
run;
This works for your example data:
data work.want; set work.have; by notsorted name notsorted yn; retain flag; if first.name then flag=(yn='false'); else do; if flag and (yn='false') then delete; else flag = (yn='false'); end; run;
If name is ever different for the same ID you could use ID in place of Name in the code.
SAS Proc SQL does not have a "previous row" feature that works in the way you are thinking.
BTW if you code this with true/false or yes/no as numeric 1/0 values you don't need to use yn='false' for comparisons. You could use "if yn" to test for true or "if not(yn)" for false
Hi Novinisrin and Ballardw !
Thanks for your time and help.
I've tested your scripts and they work fine in sandbox but not on live for a reason I can't find.
The closest result was with ballardw script, in witch the result was working fine but only on individuals that started with a "true" YN variable. It would drop the individuals starting with a "false" statement, except the very first individual SUBJID="3".
The script I executed and the output :
1998 data work.want;
1999 set work.have;
2000 by notsorted SUBJID notsorted CMYN;
2001 retain flag;
2002 if first.name then flag=(CMYN='false');
2003 else do;
2004 if flag and (CMYN='false') then delete;
2005 else flag = (CMYN='false');
2006 end;
2007 run;
NOTE: Variable first.name non initialisée.
NOTE: There were 28 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT has 5 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
It's weird because the script is working fine on sandbox data, whatever combinaison I try.
Thanks again for your time !
EDIT : I just realised first. function has "name" as parameter.. that was the problem.
Your scripts are working fine !
I'm still learning haha.
Thanks for your help !
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.