BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
sivakoya
Obsidian | Level 7

Hi,

 

I have following dataset.

data test;

input id date yymmdd8. X Y;

format date date9.;

datalines;

1 20160131 1 1

1 20160229 1 2

1 20160331 2 3

1 20160430 3 4

1 20160531 3 5

1 20160630 3 6

2 20160131 1 1

2 20160229 1 2

2 20160331 2 3

2 20160430 2 4

2 20160531 3 5

2 20160630 3 6

3 20160131 1 1

3 20160229 4 2

3 20160331 9 3

3 20160430 9 4

;

run;

 

iddateXY
131Jan201611
129Feb201612
131Mar201623
130Apr201634
131May201635
130Jun201636
231Jan201611
229Feb201612
231Mar201623
230Apr201624
231May201635
230Jun201636
331Jan201611
329Feb201642
331Mar201693
330Apr201694

 

From this I need to select the record for each ID, at the instance when (X > 2) for the first time.

 

So from the above dataset my output should be:

 

iddateXY
130Apr201634
231May201635
329Feb201642

 

 

I tried doing:

data test;

input id date yymmdd8. X Y;

format date date9.;

datalines;

1 20160131 1 1

1 20160229 1 2

1 20160331 2 3

1 20160430 3 4

1 20160531 3 5

1 20160630 3 6

2 20160131 1 1

2 20160229 1 2

2 20160331 2 3

2 20160430 2 4

2 20160531 3 5

2 20160630 3 6

3 20160131 1 1

3 20160229 4 2

3 20160331 9 3

3 20160430 9 4

;

run;

proc sort data=test out=test1;

by id descending x date;

run;

data test2;

set test1;

by id;

if first.id then output;

run;

 

But this would give me:

iddateXY
130Apr201634
231May201635
331Mar201693

Wrong entry for Id=3.

 

Any easy way to do this on a dataset with 60million records.

 

Thanks

 

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

I'm sure you will get to choose from a few possibilities.  Here's one that assumes your data set is sorted by ID DATE:

 

data want;

need_a_record='Y';

do until (last.id);

   set have;

   by id;

   if need_a_record='Y' and X > 2 then do;

      output;

      need_a_record='N';

   end;

end;

if need_a_record='Y' then output;

drop need_a_record;

run;

 

*** Statement added to handle the change in requirements.

View solution in original post

6 REPLIES 6
Astounding
PROC Star

I'm sure you will get to choose from a few possibilities.  Here's one that assumes your data set is sorted by ID DATE:

 

data want;

need_a_record='Y';

do until (last.id);

   set have;

   by id;

   if need_a_record='Y' and X > 2 then do;

      output;

      need_a_record='N';

   end;

end;

if need_a_record='Y' then output;

drop need_a_record;

run;

 

*** Statement added to handle the change in requirements.

ballardw
Super User

If not sorted:

proc sort data=test ;
   by id date;
run;

data want;
set subset (where=(x >2)); by id date; if first.id; run;

If sorted by id and date then the datastep should suffice assuming you do not want any id if there is never a value greater than 2.

 

If you do want something when there are not any values for x > 2 then we need a rule.

sivakoya
Obsidian | Level 7

@ballardw @Astounding Sorry as i think throught it, i shouldnt drops the ones that never went x>2. So for the ones which has X < 2, i would need to retain the most latest record. In the data below ID-4 has X=1 throughout, so I need to retain the last record for 4, which is 4 20160430 1 4.

data test;

input id date yymmdd8. X Y;

format date date9.;

datalines;

1 20160131 1 1

1 20160229 1 2

1 20160331 2 3

1 20160430 3 4

1 20160531 3 5

1 20160630 3 6

2 20160131 1 1

2 20160229 1 2

2 20160331 2 3

2 20160430 2 4

2 20160531 3 5

2 20160630 3 6

3 20160131 1 1

3 20160229 4 2

3 20160331 9 3

3 20160430 9 4

4 20160131 1 1

4 20160229 1 2

4 20160331 1 3

4 20160430 1 4

5 20160131 1 1

5 20160229 1 2

5 20160331 2 3

5 20160430 2 4

 

;

run;

 

thanks

ballardw
Super User

Another approach:

proc sort data=test;
   by id date;
run;
data want;
   set test;
   by id date;
   retain found;
   if first.id then  found=0;
   if found=0 and x>2 then do;
      output;
      found=1;
   end;
   if last.id and found=0 then output;
   drop found;
run;

Please post code in code boxes opened with the {i} icon. Sometimes we get artifacts from pasting in the main window as your code gets transformed to html in the main box of the message. For instance copy some of your code from the forum and paste back into your editor. See any difference?

 

sivakoya
Obsidian | Level 7
data have;
input id date yymmdd8. X Y;
format date date9.;
datalines;
1 20160131 1 1
1 20160229 1 2
1 20160331 2 3
1 20160430 3 4
1 20160531 3 5
1 20160630 3 6
2 20160131 1 1
2 20160229 1 2
2 20160331 2 3
2 20160430 2 4
2 20160531 3 5
2 20160630 3 6
3 20160131 1 1
3 20160229 4 2
3 20160331 9 3
3 20160430 9 4
;
run;



data want;
set have;
prev_x = lag(X);
if prev_x > 2 and x > 2 then bad_now = 'N';
else do; if x>2 then bad_now='Y'; else bad_now='N'; end;
run;

data final;
set want;
where bad_now='Y';
run;

@ballardwthanks. I have thought of another approach 🙂

 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1252 views
  • 4 likes
  • 3 in conversation