BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
RTN
Calcite | Level 5 RTN
Calcite | Level 5

Hi

 

I am new at working with SAS 9.4. I am having problems deleting observations based on other observations.

I have people with an ID, year of study start, a score and the year the score was taken (score_year).

People with multiple scores have multiple observations.

I would like to keep all the observations where the score_year is after study start.

I would also like to keep one observation (if there is one) where score_year is before study start. When there are multiple observations with score_year before study start I only want to keep the one closest to study start (and still keep all the observations with score_year after study start). Can anyone help me with that?

 

 

data have;
input ID start score_year score;

1 2015 2003 2
1 2015 2007 0

2 2012 2004 2
2 2012 2005 0
2 2012 2013 1

3 2000 1997 0
3 2000 1999 6
3 2000 2002 4
3 2000 2005 6

4 2000 1998 0
4 2000 2008 1;

run;

data want;
input ID start score_year score;

1 2015 2007 0

2 2012 2005 0
2 2012 2013 1

3 2000 1999 6
3 2000 2002 4
3 2000 2005 6

4 2000 1998 0
4 2000 2008 1;

run;

 Thank you in advance.

Regards,

RTN

1 ACCEPTED SOLUTION

Accepted Solutions
rudfaden
Pyrite | Level 9

This should work.

 

data have;
input ID 2. start 4. score_year 5. score 2.;
cards;
1 2015 2003 2
1 2015 2007 0
2 2012 2004 2
2 2012 2005 0
2 2012 2013 1
3 2000 1997 0
3 2000 1999 6
3 2000 2002 4
3 2000 2005 6
4 2000 1998 0
4 2000 2008 1
;

run;

proc sql;
create table temp as
select *
from have 
where start >= score_year
order by id, start, score_year
;
create table keep1 as
select *
from have 
where start < score_year
;
quit;



data keep2;
set temp;
by id;
if not last.id then delete;
run;

data want;
set keep1 keep2;
run;

proc sort data=want;
by id start score_year;
run;

View solution in original post

7 REPLIES 7
rudfaden
Pyrite | Level 9

This should work.

 

data have;
input ID 2. start 4. score_year 5. score 2.;
cards;
1 2015 2003 2
1 2015 2007 0
2 2012 2004 2
2 2012 2005 0
2 2012 2013 1
3 2000 1997 0
3 2000 1999 6
3 2000 2002 4
3 2000 2005 6
4 2000 1998 0
4 2000 2008 1
;

run;

proc sql;
create table temp as
select *
from have 
where start >= score_year
order by id, start, score_year
;
create table keep1 as
select *
from have 
where start < score_year
;
quit;



data keep2;
set temp;
by id;
if not last.id then delete;
run;

data want;
set keep1 keep2;
run;

proc sort data=want;
by id start score_year;
run;
PeterClemmensen
Tourmaline | Level 20

Do like this

 

data have;
input ID start score_year score;
datalines;
1 2015 2003 2
1 2015 2007 0
2 2012 2004 2
2 2012 2005 0
2 2012 2013 1
3 2000 1997 0
3 2000 1999 6
3 2000 2002 4
3 2000 2005 6
4 2000 1998 0
4 2000 2008 1
run;

data want(drop=latest);
    do until (last.id);
        set have;
        by id;
        if score_year<start & score_year>latest then latest=score_year;
    end;
    do until (last.id);
        set have; 
        by id;
        if start<score_year | score_year=latest then output;
    end;
run;

 

Result:

 

ID  start  score_year  score
1   2015   2007        0
2   2012   2005        0
2   2012   2013        1
3   2000   1999        6
3   2000   2002        4
3   2000   2005        6
4   2000   1998        0
4   2000   2008        1

 

unison
Lapis Lazuli | Level 10

not run but will test later. The idea is to separate your concerns first (into before and after) then with the before set keep only the date closest to the score year. Stack this set with the after set and I think you will have what you want. 

 

data before after;
set have; if score_year ge start then output after; else output before; run; proc sort data=before; by ID start descending score_year; run; proc sort data=before nodupkey out=before2; by id start; run; data want; set after before2; run

-unison

-unison
novinosrin
Tourmaline | Level 20

SQL fun:

 


data have;
input ID start score_year score;
cards;
1 2015 2003 2
1 2015 2007 0
2 2012 2004 2
2 2012 2005 0
2 2012 2013 1
3 2000 1997 0
3 2000 1999 6
3 2000 2002 4
3 2000 2005 6
4 2000 1998 0
4 2000 2008 1
;
run;

proc sql;
create table want(drop=t:) as
select *,score_year>start as t, (start-score_year) as t1
from have
group by id,t
having t or min(t1)=t1 and not t 
order by id, start, score_year;
quit;
hashman
Ammonite | Level 13

@RTN:

This is a variation on the double DoW-loop scheme offered by @PeterClemmensen. The only essential difference is that the first DoW-loop reads only the records needed to determine the minimal difference in case WHERE score_year < start. 

data have ;                                                                                                                                                                                                                                                     
  input id start score_year score ;                                                                                                                                                                                                                             
  cards ;                                                                                                                                                                                                                                                       
1 2015 2003 2                                                                                                                                                                                                                                                   
1 2015 2007 0                                                                                                                                                                                                                                                   
2 2012 2004 2                                                                                                                                                                                                                                                   
2 2012 2005 0                                                                                                                                                                                                                                                   
2 2012 2013 1                                                                                                                                                                                                                                                   
3 2000 1997 0                                                                                                                                                                                                                                                   
3 2000 1999 6                                                                                                                                                                                                                                                   
3 2000 2002 4                                                                                                                                                                                                                                                   
3 2000 2005 6                                                                                                                                                                                                                                                   
4 2000 1998 0                                                                                                                                                                                                                                                   
4 2000 2008 1                                                                                                                                                                                                                                                   
;                                                                                                                                                                                                                                                               
run ;                                                                                                                                                                                                                                                           
                                                                                                                                                                                                                                                                
data want (drop = _:) ;                                                                                                                                                                                                                                         
  do until (last.id) ;                                                                                                                                                                                                                                          
    set have (where = (score_year < start)) ;                                                                                                                                                                                                                   
    by id ;                                                                                                                                                                                                                                                     
    _m = min (_m, start - score_year) ;                                                                                                                                                                                                                         
  end ;                                                                                                                                                                                                                                                         
  do until (last.id) ;                                                                                                                                                                                                                                          
    set have ;                                                                                                                                                                                                                                                  
    by id ;                                                                                                                                                                                                                                                     
    if start <= score_year or start - score_year = _m then output ;                                                                                                                                                                                             
  end ;                                                                                                                                                                                                                                                         
run ;  

An interesting aspect for those who like think in DoW terms is that though the two loops read different number of records from their respective independent input streams, they remain in sync thanks to the second BY.

 

Kind regards

Paul D.
    

Tom
Super User Tom
Super User

An interesting aspect for those who like think in DoW terms is that though the two loops read different number of records from their respective independent input streams, they remain in sync thanks to the second BY.

 

Unless there is a BY group with NO observations that meet the WHERE condition in the first DO loop.  If that happens you will start comparing apples to oranges.

hashman
Ammonite | Level 13

@Tom: A keen observation ... that's what I call think

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 784 views
  • 2 likes
  • 7 in conversation