DATA Step, Macro, Functions and more

SAS selecting rows from historical dataset

Accepted Solution Solved
Reply
Contributor
Posts: 30
Accepted Solution

SAS selecting rows from historical dataset

Hi,

 

I have following dataset.

data test;

input id date yymmdd8. X Y;

format date date9.;

datalines;

1 20160131 1 1

1 20160229 1 2

1 20160331 2 3

1 20160430 3 4

1 20160531 3 5

1 20160630 3 6

2 20160131 1 1

2 20160229 1 2

2 20160331 2 3

2 20160430 2 4

2 20160531 3 5

2 20160630 3 6

3 20160131 1 1

3 20160229 4 2

3 20160331 9 3

3 20160430 9 4

;

run;

 

iddateXY
131Jan201611
129Feb201612
131Mar201623
130Apr201634
131May201635
130Jun201636
231Jan201611
229Feb201612
231Mar201623
230Apr201624
231May201635
230Jun201636
331Jan201611
329Feb201642
331Mar201693
330Apr201694

 

From this I need to select the record for each ID, at the instance when (X > 2) for the first time.

 

So from the above dataset my output should be:

 

iddateXY
130Apr201634
231May201635
329Feb201642

 

 

I tried doing:

data test;

input id date yymmdd8. X Y;

format date date9.;

datalines;

1 20160131 1 1

1 20160229 1 2

1 20160331 2 3

1 20160430 3 4

1 20160531 3 5

1 20160630 3 6

2 20160131 1 1

2 20160229 1 2

2 20160331 2 3

2 20160430 2 4

2 20160531 3 5

2 20160630 3 6

3 20160131 1 1

3 20160229 4 2

3 20160331 9 3

3 20160430 9 4

;

run;

proc sort data=test out=test1;

by id descending x date;

run;

data test2;

set test1;

by id;

if first.id then output;

run;

 

But this would give me:

iddateXY
130Apr201634
231May201635
331Mar201693

Wrong entry for Id=3.

 

Any easy way to do this on a dataset with 60million records.

 

Thanks

 


Accepted Solutions
Solution
‎02-07-2017 05:53 PM
Super User
Posts: 5,497

Re: SAS selecting rows from historical dataset

[ Edited ]

I'm sure you will get to choose from a few possibilities.  Here's one that assumes your data set is sorted by ID DATE:

 

data want;

need_a_record='Y';

do until (last.id);

   set have;

   by id;

   if need_a_record='Y' and X > 2 then do;

      output;

      need_a_record='N';

   end;

end;

if need_a_record='Y' then output;

drop need_a_record;

run;

 

*** Statement added to handle the change in requirements.

View solution in original post


All Replies
Solution
‎02-07-2017 05:53 PM
Super User
Posts: 5,497

Re: SAS selecting rows from historical dataset

[ Edited ]

I'm sure you will get to choose from a few possibilities.  Here's one that assumes your data set is sorted by ID DATE:

 

data want;

need_a_record='Y';

do until (last.id);

   set have;

   by id;

   if need_a_record='Y' and X > 2 then do;

      output;

      need_a_record='N';

   end;

end;

if need_a_record='Y' then output;

drop need_a_record;

run;

 

*** Statement added to handle the change in requirements.

Contributor
Posts: 30

Re: SAS selecting rows from historical dataset

Posted in reply to Astounding
Super User
Posts: 11,343

Re: SAS selecting rows from historical dataset

If not sorted:

proc sort data=test ;
   by id date;
run;

data want;
set subset (where=(x >2)); by id date; if first.id; run;

If sorted by id and date then the datastep should suffice assuming you do not want any id if there is never a value greater than 2.

 

If you do want something when there are not any values for x > 2 then we need a rule.

Contributor
Posts: 30

Re: SAS selecting rows from historical dataset

@ballardw @Astounding Sorry as i think throught it, i shouldnt drops the ones that never went x>2. So for the ones which has X < 2, i would need to retain the most latest record. In the data below ID-4 has X=1 throughout, so I need to retain the last record for 4, which is 4 20160430 1 4.

data test;

input id date yymmdd8. X Y;

format date date9.;

datalines;

1 20160131 1 1

1 20160229 1 2

1 20160331 2 3

1 20160430 3 4

1 20160531 3 5

1 20160630 3 6

2 20160131 1 1

2 20160229 1 2

2 20160331 2 3

2 20160430 2 4

2 20160531 3 5

2 20160630 3 6

3 20160131 1 1

3 20160229 4 2

3 20160331 9 3

3 20160430 9 4

4 20160131 1 1

4 20160229 1 2

4 20160331 1 3

4 20160430 1 4

5 20160131 1 1

5 20160229 1 2

5 20160331 2 3

5 20160430 2 4

 

;

run;

 

thanks

Super User
Posts: 11,343

Re: SAS selecting rows from historical dataset

Another approach:

proc sort data=test;
   by id date;
run;
data want;
   set test;
   by id date;
   retain found;
   if first.id then  found=0;
   if found=0 and x>2 then do;
      output;
      found=1;
   end;
   if last.id and found=0 then output;
   drop found;
run;

Please post code in code boxes opened with the {i} icon. Sometimes we get artifacts from pasting in the main window as your code gets transformed to html in the main box of the message. For instance copy some of your code from the forum and paste back into your editor. See any difference?

 

Contributor
Posts: 30

Re: SAS selecting rows from historical dataset

data have;
input id date yymmdd8. X Y;
format date date9.;
datalines;
1 20160131 1 1
1 20160229 1 2
1 20160331 2 3
1 20160430 3 4
1 20160531 3 5
1 20160630 3 6
2 20160131 1 1
2 20160229 1 2
2 20160331 2 3
2 20160430 2 4
2 20160531 3 5
2 20160630 3 6
3 20160131 1 1
3 20160229 4 2
3 20160331 9 3
3 20160430 9 4
;
run;



data want;
set have;
prev_x = lag(X);
if prev_x > 2 and x > 2 then bad_now = 'N';
else do; if x>2 then bad_now='Y'; else bad_now='N'; end;
run;

data final;
set want;
where bad_now='Y';
run;

@ballardwthanks. I have thought of another approach Smiley Happy

 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 210 views
  • 4 likes
  • 3 in conversation