Hi, I have a dataset as follows, the variable 'w' has already been sorted, and I want to group them according to the values of 'value' and generate a new variable 'nw', and 'nw' should be given as the last w whose value is less than or equal to 200, and for those values large than 200, stay the same. It's really hard to describe here.
w | value |
abate | 200 |
abbreviations | 50 |
abdomen | 20 |
abdominal | 110 |
abelcet | 100 |
aberrant | 100 |
abiraterone | 100 |
abl1 | 100 |
ablation | 210 |
abnormal | 30 |
abo | 780 |
abscission | 260 |
absence | 100 |
absenteeism | 140 |
absorption | 130 |
abstinence | 150 |
abstracting | 110 |
academia | 210 |
academic | 30 |
academy | 600 |
acceleration | 180 |
access | 130 |
accessibility | 100 |
accident | 200 |
accidents | 230 |
accountability | 50 |
accounting | 100 |
accreditation | 230 |
acculturation | 30 |
ace | 600 |
acetates | 920 |
acetazolamide | 310 |
acetic | 110 |
acetone | 210 |
acetyl | 50 |
acetylaminofluorene | 200 |
acetylatio | 230 |
acetylation | 200 |
bupropion | 550 |
burden | 560 |
burkitt's | 230 |
burn | 300 |
burns | 140 |
businesses | 550 |
butanoic | 800 |
butterflies | 500 |
butyrates | 100 |
butyric | 450 |
butyryl | 800 |
bypass | 330 |
bystander | 300 |
c | 300 |
c2 | 130 |
c2h2 | 150 |
c3h | 250 |
c57bl | 720 |
That is to say, I want to have something like:
w | value | nw |
abate | 200 | abl1 |
abbreviations | 50 | abl1 |
abdomen | 20 | abl1 |
abdominal | 110 | abl1 |
abelcet | 100 | abl1 |
aberrant | 100 | abl1 |
abiraterone | 100 | abl1 |
abl1 | 100 | abl1 |
ablation | 210 | ablation |
abnormal | 30 | abnormal |
abo | 780 | abo |
abscission | 260 | abscission |
absence | 100 | abstracting |
absenteeism | 140 | abstracting |
absorption | 130 | abstracting |
abstinence | 150 | abstracting |
abstracting | 110 | abstracting |
academia | 210 | academia |
academic | 30 | academic |
academy | 600 | academy |
acceleration | 180 | accident |
access | 130 | accident |
accessibility | 100 | accident |
accident | 200 | accident |
accidents | 230 | accidents |
accountability | 50 | accounting |
accounting | 100 | accounting |
accreditation | 230 | accreditation |
acculturation | 30 | acculturation |
ace | 600 | ace |
acetates | 920 | acetates |
acetazolamide | 310 | acetazolamide |
acetic | 110 | acetic |
acetone | 210 | acetone |
acetyl | 50 | acetylaminofluorene |
acetylaminofluorene | 200 | acetylaminofluorene |
acetylatio | 230 | acetylatio |
acetylation | 200 | acetylation |
bupropion | 550 | bupropion |
burden | 560 | burden |
burkitt's | 230 | burkitt's |
burn | 300 | burn |
burns | 140 | burns |
businesses | 550 | businesses |
butanoic | 800 | butanoic |
butterflies | 500 | butterflies |
butyrates | 100 | butyrates |
butyric | 450 | butyric |
butyryl | 800 | butyryl |
bypass | 330 | bypass |
bystander | 300 | bystander |
c | 300 | c |
c2 | 130 | c2h2 |
c2h2 | 150 | c2h2 |
c3h | 250 | c3h |
c57bl | 720 | c57bl |
All the things that came into my mind is to use sort but I don't think it worked. It seems like I cannot sort by the 'values' since I have sorted the data by'w', can someone help me out? Really appreciated.
This should get you close, I haven't fully verified it works though.
PS - in the future please post your same data as a data step - like the first step in my proc.
data have;
informat w $30. value 8.;
input w $ value;
cards;
abate 200
abbreviations 50
abdomen 20
abdominal 110
abelcet 100
aberrant 100
abiraterone 100
abl1 100
ablation 210
abnormal 30
abo 780
abscission 260
absence 100
absenteeism 140
absorption 130
abstinence 150
abstracting 110
academia 210
academic 30
academy 600
acceleration 180
access 130
accessibility 100
accident 200
accidents 230
accountability 50
accounting 100
accreditation 230
acculturation 30
ace 600
acetates 920
acetazolamide 310
acetic 110
acetone 210
acetyl 50
acetylaminofluorene 200
acetylatio 230
acetylation 200
bupropion 550
burden 560
burkitt's 230
burn 300
burns 140
businesses 550
butanoic 800
butterflies 500
butyrates 100
butyric 450
butyryl 800
bypass 330
bystander 300
c 300
c2 130
c2h2 150
c3h 250
c57bl 720
;
run;
data add_order;
set have;
order=_n_;
run;
proc sort data=add_order;
by descending order;
run;
data want;
set add_order;
retain prev_group group;
if value>=200 then prev_group=1;
if prev_group=1 and value<200 then do;
group = w;
prev_group=0;
end;
else if prev_group=1 and value>=200 then group=w;
run;
proc sort data=want;
by order;
run;
Why does w=accident/value=200 get nw=accident, yet w=abate/value=200 get nw=abl1?
Can you reverse the order of your data? So you're not looking forward but looking backward and reversing your logic. Its much easier and cleaner to lag in SAS.
I don't know how you created your order in the first place.
But if you reverse your entire data you can reverse your logic.
So it becomes if the value is over 200 then assign the value from column 1. Use retain to keep the value from row to row.
data want;
set have;
retain group;
if X > 200 then group=column1;
run;
Nevermind, I didn't understand your logic. You're handling the values greater than 200 differently than I thought.
You'd have to set a flag and then check the value of the flag so you need two retained variables.
This should get you close, I haven't fully verified it works though.
PS - in the future please post your same data as a data step - like the first step in my proc.
data have;
informat w $30. value 8.;
input w $ value;
cards;
abate 200
abbreviations 50
abdomen 20
abdominal 110
abelcet 100
aberrant 100
abiraterone 100
abl1 100
ablation 210
abnormal 30
abo 780
abscission 260
absence 100
absenteeism 140
absorption 130
abstinence 150
abstracting 110
academia 210
academic 30
academy 600
acceleration 180
access 130
accessibility 100
accident 200
accidents 230
accountability 50
accounting 100
accreditation 230
acculturation 30
ace 600
acetates 920
acetazolamide 310
acetic 110
acetone 210
acetyl 50
acetylaminofluorene 200
acetylatio 230
acetylation 200
bupropion 550
burden 560
burkitt's 230
burn 300
burns 140
businesses 550
butanoic 800
butterflies 500
butyrates 100
butyric 450
butyryl 800
bypass 330
bystander 300
c 300
c2 130
c2h2 150
c3h 250
c57bl 720
;
run;
data add_order;
set have;
order=_n_;
run;
proc sort data=add_order;
by descending order;
run;
data want;
set add_order;
retain prev_group group;
if value>=200 then prev_group=1;
if prev_group=1 and value<200 then do;
group = w;
prev_group=0;
end;
else if prev_group=1 and value>=200 then group=w;
run;
proc sort data=want;
by order;
run;
This is untested, but should be at least most of the way there:
data want;
do until (done1 or value > 200);
set have end=done1;
if value <= 200 then nw = w;
end;
do until (done2 or value > 200);
set have end=done2;
if value > 200 then nw = w;
output;
end;
run;
Sounds like something is a little off. While I can't see your log, try checking "done1" and "done2" to make sure they appear in the proper places.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.