BookmarkSubscribeRSS Feed
yeaforme
Calcite | Level 5

Hoping to receive help with the following problem; data set immediately below followed by problem.

 

 

The actual data set is much more complex, but here's simplified data:

data test;

input id year date period span;

datalines;

1 2000 9 1 40

1 2000 9 2 375

1 2000 10 1 10

1 2000 10 2 355 

 

1 2001 1 1 -15

1 2001 1 2 290

1 2001 9 1 39

1 2001 9 2 320

 

2 2000 11 1 43

2 2000 11 2 350

2 2000 12 1 25

2 2000 12 2 310

2 2000 1 1 -40

2 2000 1 2 280

;

run;

 

For each id-year group (the above data set is spaced based upon the id-year groupings), I need to keep a single id-year-date group (i.e., a set of period 1-2 pairs) where the observation from period 1 (of the period 1-2 pair) has a value of span which meets two criterion: 1) span is greater than 0, and 2) the value of span is closest to 0 of all other period 1 observations for the id-year group.  Note, I do not want the period 2 observation that meets those 2 criterion, rather I want to keep period 2 based upon its period 1 counterpart that best meets those criterion.

 

Therefore, the final data set should look like this:

 

1 2000 9 1 40 (eliminated because 40 is farther away from 0 than 10)

1 2000 9 2 375 (eliminated because its period 1 pair was eliminated due to its paired period 1's span being at a greater distance)

1 2000 10 1 10

1 2000 10 2 355 

 

1 2001 1 1 -15 (eliminated because the value of span is negative: -15)

1 2001 1 2 290 (eliminated because its period 1 pair was eliminated due to a negative span)

1 2001 9 1 39

1 2001 9 2 320

 

2 2000 11 1 43 (eliminated because 43 is farther away from 0 than 25)

2 2000 11 2 350 (eliminated because its period 1 pair was eliminated due to its paired period 1's span being at a greater distance)

2 2000 12 1 25

2 2000 12 2 310

2 2000 1 1 -40 (eliminated because the value of span was negative: -40)

2 2000 1 2 280 (eliminated because its period 1 pair was eliminated due to a negative span)

 

The actual dataset has more than just periods 1 and 2 and can contain any number of period groupings per id-year.

 

An acceptable solution could possibly copy the value of span from period 1 of each id-year-date group to the other periods within the group and then do a two-step process of first, eliminating all observations with span less than zero, and then second, sorting on id-year-period-span in descending order and then retain only the first observation of each id-year-period group...?

11 REPLIES 11
novinosrin
Tourmaline | Level 20

is this

2 2000 12 1 25

2 2000 12 2 310

correct in your expected output

or should it be 

B 2000 1 2 280 ?

 

data test;

input id $ year period span;

datalines;
A 2000 1 40
A 2000 2 375
A 2000 1 10
A 2000 2 355 
A 2001 1 -15
A 2001 2 290
A 2001 1 39
A 2001 2 320
B 2000 1 43
B 2000 2 350
B 2000 1 25
B 2000 2 310
B 2000 1 -40
B 2000 2 280
;

run;

proc sql;
create table want as
select *
from test
group by id,year,period
having min(case  when span<0 then . else span    end)=span
order by id, year, period;
quit;
yeaforme
Calcite | Level 5
The 2 in ID should be B, but the correct grouping for ID B should be the 1-2 pair with values 25-310. I added an additional date variable to the original data to hopefully make the span sets more obvious.
yeaforme
Calcite | Level 5
Please see updated inquiry.
ballardw
Super User

Please test your example data datastep before posting it. ID is missing in your example as you do not use: input id $ so id is treated as numeric;

If ID or date has any role in the result, which I think it does, then you should show it.

 

I think your "want" example is incorrect: You show

1 2001 9 2 320 

but there is a row of data

A 2001 1 2 290

which is the same year and period but 290 is closest to 0.

 

proc sort data=test;
   by id year period  span;
run;
data want;
   set test;
   where span>0;
   by id year period ;
   if first.period;
run;
yeaforme
Calcite | Level 5

I apologize for my test data, I posted it and then continually made revisions trying to make it more clear, but just caused everyone more problems.

That being said, my want data is not incorrect. Each period 1-2 pair comes as a pair based on the date value. It is imperative that the period 1 value in span of each 1-2 pair not be negative, therefore, A 2001 1 2 290 is disqualified because its period 1 pair based on date is A 2001 1 1 -15.

novinosrin
Tourmaline | Level 20
data test;

input id $ year period span;

datalines;
A 2000 1 40
A 2000 2 375
A 2000 1 10
A 2000 2 355 
A 2001 1 -15
A 2001 2 290
A 2001 1 39
A 2001 2 320
B 2000 1 43
B 2000 2 350
B 2000 1 25
B 2000 2 310
B 2000 1 -40
B 2000 2 280
;

run;
data grp;
set test;
by id  year period;
if period =1 then 
grp+1;
run;
data temp;
f=1;
do until(last.grp);
set grp;
by  grp notsorted;
if period =1 and span<0 then f=.;
if not missing(f) then output;
end;
drop grp;
run;

proc sql;
create table want as
select *
from temp
group by id, year,period
having min(span)=span
order by id, year, period;
quit;
ballardw
Super User

@yeaforme wrote:

I apologize for my test data, I posted it and then continually made revisions trying to make it more clear, but just caused everyone more problems.

That being said, my want data is not incorrect. Each period 1-2 pair comes as a pair based on the date value. It is imperative that the period 1 value in span of each 1-2 pair not be negative, therefore, A 2001 1 2 290 is disqualified because its period 1 pair based on date is A 2001 1 1 -15.


Since you have changed the data at least twice and the requirements as well since I started to answer this request I have no further response.

novinosrin
Tourmaline | Level 20

@ballardw Oh my gosh, i didn't notice that

yeaforme
Calcite | Level 5

The requirements have not changed, though how detailed your understanding of it may have.

novinosrin
Tourmaline | Level 20
data test;

input id year date period span;

datalines;
1 2000 9 1 40
1 2000 9 2 375
1 2000 10 1 10
1 2000 10 2 355 
1 2001 1 1 -15
1 2001 1 2 290
1 2001 9 1 39
1 2001 9 2 320
2 2000 11 1 43
2 2000 11 2 350
2 2000 12 1 25
2 2000 12 2 310
2 2000 1 1 -40
2 2000 1 2 280
;

run;

data grp;
set test;
by id  year period notsorted;
if period =1 then 
grp+1;
run;
data temp;
f=1;
do until(last.grp);
set grp;
by  grp notsorted;
if period =1 and span<0 then f=.;
if not missing(f) then output;
end;
drop grp f;
run;

proc sql;
create table want as
select *
from temp
group by id, year,period
having min(span)=span
order by id, year, period;
quit;
yeaforme
Calcite | Level 5

I've figured out a solution.  It involves: 1) using the retain statement to first spread the period 1's span value to the rest of group, 2) sort based on the retained span value, 3) eliminate group with negative retained span, and then 4) retain only the first group based on retained span.  It may not be the most elegant, but fits the bill.  Thanks to all for trying.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 1977 views
  • 0 likes
  • 3 in conversation