BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
hx
Calcite | Level 5 hx
Calcite | Level 5

Hi, I have a dataset as follows, the variable 'w' has already been sorted, and I want to group them according to the values of 'value' and generate a new variable 'nw', and 'nw' should be given as the last w whose value is less than or equal to 200, and for those values large than 200, stay the same. It's really hard to describe here.

wvalue
abate200
abbreviations50
abdomen20
abdominal110
abelcet100
aberrant100
abiraterone100
abl1100
ablation210
abnormal30
abo780
abscission260
absence100
absenteeism140
absorption130
abstinence150
abstracting110
academia210
academic30
academy600
acceleration180
access130
accessibility100
accident200
accidents230
accountability50
accounting100
accreditation230
acculturation30
ace600
acetates920
acetazolamide310
acetic110
acetone210
acetyl50
acetylaminofluorene200
acetylatio230
acetylation200
bupropion550
burden560
burkitt's230
burn300
burns140
businesses550
butanoic800
butterflies500
butyrates100
butyric450
butyryl800
bypass330
bystander300
c300
c2130
c2h2150
c3h250
c57bl720


That is to say, I want to have something like:

wvaluenw
abate200abl1
abbreviations50abl1
abdomen20abl1
abdominal110abl1
abelcet100abl1
aberrant100abl1
abiraterone100abl1
abl1100abl1
ablation210ablation
abnormal30abnormal
abo780abo
abscission260abscission
absence100abstracting
absenteeism140abstracting
absorption130abstracting
abstinence150abstracting
abstracting110abstracting
academia210academia
academic30academic
academy600academy
acceleration180accident
access130accident
accessibility100accident
accident200accident
accidents230accidents
accountability50accounting
accounting100accounting
accreditation230accreditation
acculturation30acculturation
ace600ace
acetates920acetates
acetazolamide310acetazolamide
acetic110acetic
acetone210acetone
acetyl50acetylaminofluorene
acetylaminofluorene200acetylaminofluorene
acetylatio230acetylatio
acetylation200acetylation
bupropion550bupropion
burden560burden
burkitt's230burkitt's
burn300burn
burns140burns
businesses550businesses
butanoic800butanoic
butterflies500butterflies
butyrates100butyrates
butyric450butyric
butyryl800butyryl
bypass330bypass
bystander300bystander
c300c
c2130c2h2
c2h2150c2h2
c3h250c3h
c57bl720c57bl

All the things that came into my mind is to use sort but I don't think it worked. It seems like I cannot sort by the 'values' since I have sorted the data by'w', can someone help me out? Really appreciated.

 

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

This should get you close, I haven't fully verified it works though.

PS - in the future please post your same data as a data step - like the first step in my proc. 

 

data have;
informat w $30. value 8.;
input w	$ value;
cards;
abate   200
abbreviations   50
abdomen 20
abdominal   110
abelcet 100
aberrant    100
abiraterone 100
abl1    100
ablation    210
abnormal    30
abo 780
abscission  260
absence 100
absenteeism 140
absorption  130
abstinence  150
abstracting 110
academia    210
academic    30
academy 600
acceleration    180
access  130
accessibility   100
accident    200
accidents   230
accountability  50
accounting  100
accreditation   230
acculturation   30
ace 600
acetates    920
acetazolamide   310
acetic  110
acetone 210
acetyl  50
acetylaminofluorene 200
acetylatio  230
acetylation 200
bupropion   550
burden  560
burkitt's   230
burn    300
burns   140
businesses  550
butanoic    800
butterflies 500
butyrates   100
butyric 450
butyryl 800
bypass  330
bystander   300
c   300
c2  130
c2h2    150
c3h 250
c57bl   720
;
run;

data add_order;
set have;
order=_n_;
run;

proc sort data=add_order;
by descending order;
run;

data want;
set add_order;
retain prev_group group;
if value>=200 then prev_group=1;
if prev_group=1 and value<200 then do;
	group = w;
	prev_group=0;
end;
else if prev_group=1 and value>=200 then group=w;

run;

proc sort data=want;
by order;
run;

View solution in original post

22 REPLIES 22
mkeintz
PROC Star

Why does w=accident/value=200 get nw=accident,    yet   w=abate/value=200 get nw=abl1?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
hx
Calcite | Level 5 hx
Calcite | Level 5
Thank you for your reply, mkeintz!
w=abate/value=200 get nw=abl1 because I wanted to get the nw="the last w of the values that are <=200", for instance w=accident/value=200 get nw=accident because accidents/value=230 which is larger than 200. Does that make sense? sorry I didn't describe it clearly.
Reeza
Super User

Can you reverse the order of your data? So you're not looking forward but looking backward and reversing your logic. Its much easier and cleaner to lag in SAS. 

 

 

hx
Calcite | Level 5 hx
Calcite | Level 5
Do you mean reverse them by w alphabetically? But I still want to get the new variable based on w/values, how does that work by reversing?
Reeza
Super User

I don't know how you created your order in the first place.

 

But if you reverse your entire data you can reverse your logic. 

 

So it becomes if the value is over 200 then assign the value from column 1. Use retain to keep the value from row to row.

data want;
set have;
retain group;
if X > 200 then group=column1;
run;

 

Reeza
Super User

Nevermind, I didn't understand your logic. You're handling the values greater than 200 differently than I thought.

hx
Calcite | Level 5 hx
Calcite | Level 5
Yeah, but your code did give me some thinking, by using the group did give me something like what I want, yet still can't. Thank you anyway, I'm still trying
Reeza
Super User

You'd have to set a flag and then check the value of the flag so you need two retained variables.

 

 

hx
Calcite | Level 5 hx
Calcite | Level 5
How can I set the value of the flag? Should I use dummies or 'w'?
Reeza
Super User

This should get you close, I haven't fully verified it works though.

PS - in the future please post your same data as a data step - like the first step in my proc. 

 

data have;
informat w $30. value 8.;
input w	$ value;
cards;
abate   200
abbreviations   50
abdomen 20
abdominal   110
abelcet 100
aberrant    100
abiraterone 100
abl1    100
ablation    210
abnormal    30
abo 780
abscission  260
absence 100
absenteeism 140
absorption  130
abstinence  150
abstracting 110
academia    210
academic    30
academy 600
acceleration    180
access  130
accessibility   100
accident    200
accidents   230
accountability  50
accounting  100
accreditation   230
acculturation   30
ace 600
acetates    920
acetazolamide   310
acetic  110
acetone 210
acetyl  50
acetylaminofluorene 200
acetylatio  230
acetylation 200
bupropion   550
burden  560
burkitt's   230
burn    300
burns   140
businesses  550
butanoic    800
butterflies 500
butyrates   100
butyric 450
butyryl 800
bypass  330
bystander   300
c   300
c2  130
c2h2    150
c3h 250
c57bl   720
;
run;

data add_order;
set have;
order=_n_;
run;

proc sort data=add_order;
by descending order;
run;

data want;
set add_order;
retain prev_group group;
if value>=200 then prev_group=1;
if prev_group=1 and value<200 then do;
	group = w;
	prev_group=0;
end;
else if prev_group=1 and value>=200 then group=w;

run;

proc sort data=want;
by order;
run;
hx
Calcite | Level 5 hx
Calcite | Level 5
Thank you so much Reeza! it did work! I got to look deep into your code to get better understanding.
Astounding
PROC Star

This is untested, but should be at least most of the way there:

 

data want;

do until (done1 or value > 200);

   set have end=done1;

   if value <= 200 then nw = w;

end;

do until (done2 or value > 200);

   set have end=done2;

   if value > 200 then nw = w;

   output;

end;

run;

 

 

hx
Calcite | Level 5 hx
Calcite | Level 5
I tried but it seemed that new variable nw only has one value which is the last w in the set.
Astounding
PROC Star

Sounds like something is a little off.  While I can't see your log, try checking "done1" and "done2" to make sure they appear in the proper places.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 22 replies
  • 1402 views
  • 1 like
  • 5 in conversation