BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Althaea
Fluorite | Level 6

Hi !

 

I've read a lot of subjects about loops and incrementation but I can't figure out how to do what seems like a simple task.

Here's what my data looks like :

 

data WORK.HAVE;
  infile datalines;
  input Name:$10. ID:2. YN:$8. value:$5.;
datalines;
 Antoine 01 true (1)
 Antoine 01 false (2)
 Thomas 02 true (1)
 Thomas 02 true (2)
 Thomas 02 false (3)
 Adeline 03 false (1)
 Michelle 04 true (1)
 Michelle 04 false (2)
 Michelle 04 false (3)
 Michelle 04 false (4)
 Clemence 05 false (1)
;

For each individual, if YN was "true", a new row would be created. A "false" should have stoped the incrementation.

But following this logic, in this data Michelle has 2 rows that shouldn't exist.

My goal is to detect and delete those rows.

 

Expected result :

data WORK.WANT;
  infile datalines;
  input Name:$10. ID:2. YN:$8. value:$5.;
datalines;
 Antoine 01 true (1)
 Antoine 01 false (2)
 Thomas 02 true (1)
 Thomas 02 true (2)
 Thomas 02 false (3)
 Adeline 03 false (1)
 Michelle 04 true (1)
 Michelle 04 false (2)
 Clemence 05 false (1)
;

It seems easy but I'm struggling and feel like i'm burning learning steps.

 

My logic was something like that :

IF actualrow.ID = previousrow.ID AND actualrow.YN = previousrow.YN = "false" THEN drop actualrow

Maybe I can use proc sql, but I don't know how to include sql within a data step yet..

I'm also having issues finding documents talking about this from basics.

 

All help appreciated.

Thanks for your time !

 

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

This works for your example data:

data work.want;
   set work.have;
   by notsorted name notsorted yn;
   retain flag;
   if first.name then flag=(yn='false');
   else do;
      if flag and (yn='false') then delete;
      else flag = (yn='false');
   end;
run;

If name is ever different for the same ID you could use ID in place of Name in the code.

 

SAS Proc SQL does not have a "previous row" feature that works in the way you are thinking.

 

BTW if you code this with true/false or yes/no as numeric 1/0 values you don't need to use yn='false' for comparisons. You could use "if yn" to test for true or "if not(yn)" for false

View solution in original post

3 REPLIES 3
novinosrin
Tourmaline | Level 20
data WORK.HAVE;
  infile datalines;
  input Name:$10. ID:2. YN:$8. value:$5.;
datalines;
 Antoine 01 true (1)
 Antoine 01 false (2)
 Thomas 02 true (1)
 Thomas 02 true (2)
 Thomas 02 false (3)
 Adeline 03 false (1)
 Michelle 04 true (1)
 Michelle 04 false (2)
 Michelle 04 false (3)
 Michelle 04 false (4)
 Clemence 05 false (1)
;




data want;
set have;
by id yn notsorted;
retain f;
if first.id then f=.;
f=ifn(not first.id and lag(yn)='false',1,.);
if 	not f;
drop f ;
run;

/*or*/

 

data want;
set have;
by id yn notsorted;
retain f;
if first.id then f=.;
f=not first.id and lag(yn)='false';
if 	not f;
drop f ;
run;

 

ballardw
Super User

This works for your example data:

data work.want;
   set work.have;
   by notsorted name notsorted yn;
   retain flag;
   if first.name then flag=(yn='false');
   else do;
      if flag and (yn='false') then delete;
      else flag = (yn='false');
   end;
run;

If name is ever different for the same ID you could use ID in place of Name in the code.

 

SAS Proc SQL does not have a "previous row" feature that works in the way you are thinking.

 

BTW if you code this with true/false or yes/no as numeric 1/0 values you don't need to use yn='false' for comparisons. You could use "if yn" to test for true or "if not(yn)" for false

Althaea
Fluorite | Level 6

Hi Novinisrin and Ballardw !

Thanks for your time and help.

 

I've tested your scripts and they work fine in sandbox but not on live for a reason I can't find.

 

The closest result was with ballardw script, in witch the result was working fine but only on individuals that started with a "true" YN variable. It would drop the individuals starting with a "false" statement, except the very first individual SUBJID="3".

 

The script I executed and the output :

 

1998  data work.want;
1999     set work.have;
2000     by notsorted SUBJID notsorted CMYN;
2001     retain flag;
2002     if first.name then flag=(CMYN='false');
2003     else do;
2004        if flag and (CMYN='false') then delete;
2005        else flag = (CMYN='false');
2006     end;
2007  run;

NOTE: Variable first.name non initialisée.
NOTE: There were 28 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT has 5 observations and 4 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

 

 

It's weird because the script is working fine on sandbox data, whatever combinaison I try.

 

Thanks again for your time !

 

EDIT : I just realised first. function has "name" as parameter.. that was the problem.

Your scripts are working fine !

 

I'm still learning haha.

 

Thanks for your help !

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 774 views
  • 1 like
  • 3 in conversation