Dear All,
I have the following cases:
var1 var2 var3 var4 var5
12040 10-K 33746 26098 29Dec2010
12040 10-K 32298 26098 29Dec2010
12040 10-K 33746 25206 04Jan2010
12040 10-K 32298 25206 04Jan2010
8504 10KSB 15831 11617 23Aug2006
8504 10KSB 14250 11617 23Aug2006
8504 10KSB 12775 11617 23Aug2006
8504 10KSB 12142 11617 23Aug2006
8504 10KSB 15831 10515 05Sep2006
8504 10KSB 14250 10515 05Sep2006
8504 10KSB 12775 10515 05Sep2006
8504 10KSB 12142 10515 05Sep2006
8504 10KSB 15831 10105 07Sep2006
8504 10KSB 14250 10105 07Sep2006
8504 10KSB 12775 10105 07Sep2006
8504 10KSB 12142 10105 07Sep2006
8328 10-K 60811 49348 28Jul2009
8328 10-K 39154 49348 28Jul2009
8328 10-K 60811 30424 15Jul2009
8328 10-K 39154 30424 15Jul2009
I hope these three cases are enough to give you the general picture of my problem. Now, I want to get the following dataset:
var1 var2 var3 var4 var5
12040 10-K 33746 26098 29Dec2010
12040 10-K 32298 25206 04Jan2010
8504 10KSB 15831 11617 23Aug2006
8504 10KSB 14250 10515 05Sep2006
8504 10KSB 12775 10105 07Sep2006
8328 10-K 60811 49348 28Jul2009
8328 10-K 39154 30424 15Jul2009
So, consider duplicates in the first case. I have two duplicates by var4/var5. So, I need to go to the first duplicate and select the row where the biggest value of var4 corresponds to the biggest value of var3 (i.e. 26098 is assigned to 33746), so I take the first row and drop the second row. Then, I go to the second duplicate and select the row where the SECOND biggest value of var4 corresponds to the SECOND biggest value of var3 (i.e. 25206 is assigned to 32298). Then, if needed, I go to the third duplicate case and select the row where the THIRD biggest value of var4 corresponds to the THIRD biggest value of var3. And so on.
The same principle is applied to the second and third cases mentioned above. I really hope my explanation is clear.
Your help would be very highly appreciated. I look forward to getting your reply.
Kind regards,
Ruslan
@Ruslan Good morning, see if the modified version with your modified sample below works. I haven't had a coffee yet and so please test thoroughly. I will get my head around it in a bit after some caffeine.
data have;
input (var1 var2 var3 var4 var5) (:$10.);
cards;
12040 10-K 56376 47611 28Dec2007
12040 10-K 33746 26098 29Dec2010
12040 10-K 32298 26098 29Dec2010
12040 10-K 33247 25722 27Dec2013
12040 10-K 33746 25206 04Jan2010
12040 10-K 32298 25206 04Jan2010
8504 10KSB 20618 16895 16Jul2001
8504 10KSB 15831 11617 23Aug2006
8504 10KSB 14250 11617 23Aug2006
8504 10KSB 12775 11617 23Aug2006
8504 10KSB 12142 11617 23Aug2006
8504 10KSB 15831 10515 05Sep2006
8504 10KSB 14250 10515 05Sep2006
8504 10KSB 12775 10515 05Sep2006
8504 10KSB 12142 10515 05Sep2006
8504 10KSB 15831 10105 07Sep2006
8504 10KSB 14250 10105 07Sep2006
8504 10KSB 12775 10105 07Sep2006
8504 10KSB 12142 10105 07Sep2006
8504 10KSB 8970 7027 30Jun2000
8328 10-K 60811 49348 28Jul2009
8328 10-K 39154 49348 28Jul2009
8328 10-K 48477 37048 27Mar2015
8328 10-K 43947 32737 14Mar2014
8328 10-K 60811 30424 15Jul2009
8328 10-K 39154 30424 15Jul2009
;
data temp;
set have;
by var1 var4 var5 notsorted ;
if first.var1 then grp=0;
if first.var4 and first.var5 and not last.var4 and not last.var5 then grp+1;
run;
data want;
do until(last.var4);
do _n_=1 by 1 until(last.var5);
set temp;
by var1 var4 var5 notsorted;
if _n_=grp then output;
else if _n_=1 and last.var5 then output;
end;
end;
drop grp;
run;
Dear All,
Please help me with this urgent issue. I very much rely on your assistance.
Kind regards,
Ruslan
I think that the following does what you want:
proc sort data=have;
by var1 var2 var4 var5 descending var3;
run;
proc sort data=have out=want nodupkey;
by var1 var2 var4 var5;
run;
Art, CEO, AnalystFinder.com
No, unfortunately, it does not. If I use your code, I would get this for my first case:
var1 var2 var3 var4 var5
12040 10-K 33746 26098 29Dec2010
12040 10-K 33746 25206 04Jan2010
But I need to get this:
var1 var2 var3 var4 var5
12040 10-K 33746 26098 29Dec2010
12040 10-K 32298 25206 04Jan2010
Please please help me, guys. Look forward to your answers.
Kind regards,
Ruslan
I hope i understood your req:
data have;
input (var1 var2 var3 var4 var5) (:$10.);
cards;
12040 10-K 33746 26098 29Dec2010
12040 10-K 32298 26098 29Dec2010
12040 10-K 33746 25206 04Jan2010
12040 10-K 32298 25206 04Jan2010
8504 10KSB 15831 11617 23Aug2006
8504 10KSB 14250 11617 23Aug2006
8504 10KSB 12775 11617 23Aug2006
8504 10KSB 12142 11617 23Aug2006
8504 10KSB 15831 10515 05Sep2006
8504 10KSB 14250 10515 05Sep2006
8504 10KSB 12775 10515 05Sep2006
8504 10KSB 12142 10515 05Sep2006
8504 10KSB 15831 10105 07Sep2006
8504 10KSB 14250 10105 07Sep2006
8504 10KSB 12775 10105 07Sep2006
8504 10KSB 12142 10105 07Sep2006
8328 10-K 60811 49348 28Jul2009
8328 10-K 39154 49348 28Jul2009
8328 10-K 60811 30424 15Jul2009
8328 10-K 39154 30424 15Jul2009
;
data temp;
set have;
by var1 var4 var5 notsorted ;
if first.var1 then grp=0;
if first.var4 and first.var5 then grp+1;
run;
data want;
do until(last.var4);
do _n_=1 by 1 until(last.var5);
set temp;
by var1 var4 var5 notsorted;
if _n_=grp then output;
end;
end;
drop grp;
run;
This ?
data have;
input case $ var1 var2 $ var3 var4 var5 :date9.;
format var5 date9.;
datalines;
case_1 12040 10-K 33746 26098 29Dec2010
case_1 12040 10-K 32298 26098 29Dec2010
case_1 12040 10-K 33746 25206 04Jan2010
case_1 12040 10-K 32298 25206 04Jan2010
case_2 8504 10KSB 15831 11617 23Aug2006
case_2 8504 10KSB 14250 11617 23Aug2006
case_2 8504 10KSB 12775 11617 23Aug2006
case_2 8504 10KSB 12142 11617 23Aug2006
case_2 8504 10KSB 15831 10515 05Sep2006
case_2 8504 10KSB 14250 10515 05Sep2006
case_2 8504 10KSB 12775 10515 05Sep2006
case_2 8504 10KSB 12142 10515 05Sep2006
case_2 8504 10KSB 15831 10105 07Sep2006
case_2 8504 10KSB 14250 10105 07Sep2006
case_2 8504 10KSB 12775 10105 07Sep2006
case_2 8504 10KSB 12142 10105 07Sep2006
case_3 8328 10-K 60811 49348 28Jul2009
case_3 8328 10-K 39154 49348 28Jul2009
case_3 8328 10-K 60811 30424 15Jul2009
case_3 8328 10-K 39154 30424 15Jul2009
;
data want;
set have; by case;
if var3 ne lag(var3) and var4 ne lag(var4) then do;
gr + 1;
row = 0;
end;
if first.case then do;
gr = 1;
row = 0;
end;
row + 1;
if gr = row then output;
drop gr row;
run;
proc print data=want noobs; by case; id case; run;
Hi All,
Many thanks for your replies.
@PGStats and @novinosrin, your codes work perfectly, however I have both good cases and bad cases combined together. So, how can I extract only bad cases (create var case as @PGStats did) like those I have mentioned in my message? So my full data look like this:
12040 10-K 56376 47611 28Dec2007
12040 10-K 33746 26098 29Dec2010
12040 10-K 32298 26098 29Dec2010
12040 10-K 33247 25722 27Dec2013
12040 10-K 33746 25206 04Jan2010
12040 10-K 32298 25206 04Jan2010
8504 10KSB 20618 16895 16Jul2001
8504 10KSB 15831 11617 23Aug2006
8504 10KSB 14250 11617 23Aug2006
8504 10KSB 12775 11617 23Aug2006
8504 10KSB 12142 11617 23Aug2006
8504 10KSB 15831 10515 05Sep2006
8504 10KSB 14250 10515 05Sep2006
8504 10KSB 12775 10515 05Sep2006
8504 10KSB 12142 10515 05Sep2006
8504 10KSB 15831 10105 07Sep2006
8504 10KSB 14250 10105 07Sep2006
8504 10KSB 12775 10105 07Sep2006
8504 10KSB 12142 10105 07Sep2006
8504 10KSB 8970 7027 30Jun2000
8328 10-K 60811 49348 28Jul2009
8328 10-K 39154 49348 28Jul2009
8328 10-K 48477 37048 27Mar2015
8328 10-K 43947 32737 14Mar2014
8328 10-K 60811 30424 15Jul2009
8328 10-K 39154 30424 15Jul2009
I need to get this:
12040 10-K 56376 47611 28Dec2007
12040 10-K 33746 26098 29Dec2010
12040 10-K 33247 25722 27Dec2013
12040 10-K 32298 25206 04Jan2010
8504 10KSB 20618 16895 16Jul2001
8504 10KSB 15831 11617 23Aug2006
8504 10KSB 14250 10515 05Sep2006
8504 10KSB 12775 10105 07Sep2006
8504 10KSB 8970 7027 30Jun2000
8328 10-K 60811 49348 28Jul2009
8328 10-K 48477 37048 27Mar2015
8328 10-K 43947 32737 14Mar2014
8328 10-K 39154 30424 15Jul2009
I very much look forward to getting your reply. Your help and advice would be highly appreciated.
Kind regards,
Ruslan
@Ruslan Good morning, see if the modified version with your modified sample below works. I haven't had a coffee yet and so please test thoroughly. I will get my head around it in a bit after some caffeine.
data have;
input (var1 var2 var3 var4 var5) (:$10.);
cards;
12040 10-K 56376 47611 28Dec2007
12040 10-K 33746 26098 29Dec2010
12040 10-K 32298 26098 29Dec2010
12040 10-K 33247 25722 27Dec2013
12040 10-K 33746 25206 04Jan2010
12040 10-K 32298 25206 04Jan2010
8504 10KSB 20618 16895 16Jul2001
8504 10KSB 15831 11617 23Aug2006
8504 10KSB 14250 11617 23Aug2006
8504 10KSB 12775 11617 23Aug2006
8504 10KSB 12142 11617 23Aug2006
8504 10KSB 15831 10515 05Sep2006
8504 10KSB 14250 10515 05Sep2006
8504 10KSB 12775 10515 05Sep2006
8504 10KSB 12142 10515 05Sep2006
8504 10KSB 15831 10105 07Sep2006
8504 10KSB 14250 10105 07Sep2006
8504 10KSB 12775 10105 07Sep2006
8504 10KSB 12142 10105 07Sep2006
8504 10KSB 8970 7027 30Jun2000
8328 10-K 60811 49348 28Jul2009
8328 10-K 39154 49348 28Jul2009
8328 10-K 48477 37048 27Mar2015
8328 10-K 43947 32737 14Mar2014
8328 10-K 60811 30424 15Jul2009
8328 10-K 39154 30424 15Jul2009
;
data temp;
set have;
by var1 var4 var5 notsorted ;
if first.var1 then grp=0;
if first.var4 and first.var5 and not last.var4 and not last.var5 then grp+1;
run;
data want;
do until(last.var4);
do _n_=1 by 1 until(last.var5);
set temp;
by var1 var4 var5 notsorted;
if _n_=grp then output;
else if _n_=1 and last.var5 then output;
end;
end;
drop grp;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.