Hi Folks:
I'm creating a 'group' variable based on the specific values that 'idname' character variable takes. However, my simple if then statement is not working. It picks up almost randomly. Look how group=1 include city of Buk that i did not specify and the rest groups are even more inaccurate.
What am i doing wrong here?
Any pointers appreciated.
DATA all (keep =idname group group1); SET m.alldata_id;
retain group;
if IDNAME IN ('Bundang' 'Jungwon' 'Sujeong') THEN GROUP=1; ELSE
if IDNAME IN ('Gwonseon' 'Jangan' 'Paldal' 'Yeongtong') THEN GROUP=2; ELSE
if IDNAME IN ('Cheoin' 'Yongin' 'Cheoin' 'Giheung' 'Suji' 'Suyon') THEN GROUP=3; ELSE
if IDNAME IN ('Heungdeok' 'Cheongju' 'Heungdeok' 'Seowon' 'Sangdang' 'Cheongwon') THEN GROUP=4; ELSE
if IDNAME IN ('Changwon' 'Uichang' 'Seongsan' 'Masanhappo' 'Masanhoewon' 'Jinhae') THEN GROUP=5; ELSE
if IDNAME IN ('Deogyang' 'Ilsandong' 'Ilsanseogu') THEN GROUP=6;
else group+1;
run;
Try:
DATA all (keep =idname group group1); SET m.alldata_id; retain group; if IDNAME IN ('Bundang' 'Jungwon' 'Sujeong') THEN GROUP=1; ELSE if IDNAME IN ('Gwonseon' 'Jangan' 'Paldal' 'Yeongtong') THEN GROUP=2; ELSE if IDNAME IN ('Cheoin' 'Yongin' 'Cheoin' 'Giheung' 'Suji' 'Suyon') THEN GROUP=3; ELSE if IDNAME IN ('Heungdeok' 'Cheongju' 'Heungdeok' 'Seowon' 'Sangdang' 'Cheongwon') THEN GROUP=4; ELSE if IDNAME IN ('Changwon' 'Uichang' 'Seongsan' 'Masanhappo' 'Masanhoewon' 'Jinhae') THEN GROUP=5; ELSE if IDNAME IN ('Deogyang' 'Ilsandong' 'Ilsanseogu') THEN GROUP=6; else group=7 ; run;
Hi @Cruise
It is due to the following statement and the fact that you RETAIN the value of the group variable
else group+1;
-> at the first iteration, IDNAME = Buk -> as it isn't listed in the IF conditions, the GROUP+1 statement apply so GROUP is set to 1.
It is the same for "Andong" -> the. previous value was 2 (Yeongtong), and as it isn't listed in the IF conditions, the GROUP+1 statement apply so GROUP is set to 2+1 = 3
Best,
First record : Group+1 in the else => the value will be one for any "other" name.
Similar, EVERY time you encounter an unlisted value the number gets incremented from the last assigned value. So if one record assigned 3 to group and the next record has a not listed Idname value the result will be 4 because your else increments the LAST retained value.
You might describe the actual rule that you are attempting to implement. I suspect that you may have meant to assign the value 7 to any of the not listed values, is that actually the case? If you want each of the IDNAME values that does not appear on the list to have a different value then you need to reconsider this approach as a pseudo-random assignment like this would be dependent on the order the values were encountered. Which means that the group could change the next time the code was run on similar data.
If the intent was to assign 7 to all of the others then you should use a last "group =7". Otherwise provide example in the form of a data step and the expected output for that data.
Again: Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the <> icon or attached as text to show exactly what you have and that we can test code against.
PICTURES are hard to code from an I am not even going to try.
One strongly suspects that you sorted the data prior to display and not in the order the code execute.
BTW, you are duplicating some of the names in your "list" values. Inefficient at best, at worst what ever you do like that could end up with the same Idname in two different group values.
Thank you.
I'm trying to implement the code support kindly provided to me today. The approach in the forum solved uses the retain and group+1 steps . Could you please take a look at? the link is below.
As I said in the other thread, that code is only good for ONE specific situation. Trying to generalize it will only give you headaches.
@Cruise wrote:
Thank you.
I'm trying to implement the code support kindly provided to me today. The approach in the forum solved uses the retain and group+1 steps . Could you please take a look at? the link is below.
And as was referenced in that thread a couple of times:
A specific rule (or set of rules) needs to be explicitly stated. Which was not provided.
Do you intend each not listed IDname to get a different value? The same value? And should the same group value be assigned the next time that Id value is encountered in a raw data set or are you willing to accept changing "group" values for a given Idname?
If the idea is to assign the same group value to everything except those listed an Invalue can be created in Proc format to do that:
proc format; invalue grpno 'Bundang', 'Jungwon', 'Sujeong' =1 'Gwonseon', 'Jangan', 'Paldal', 'Yeongtong' =2 'Cheoin', 'Yongin', 'Giheung', 'Suji', 'Suyon' =3 'Heungdeok', 'Cheongju', 'Seowon', 'Sangdang', 'Cheongwon' =4 'Changwon', 'Uichang', 'Seongsan', 'Masanhappo', 'Masanhoewon', 'Jinhae' =5 'Deogyang', 'Ilsandong', 'Ilsanseogu' =6 ' ' = . other = 7 ; DATA all (keep =idname group group1); SET m.alldata_id; group =input(idname,grpno.); run;
If each ID value is supposed to get a separate "group" value then LIST them in Proc format. Or maybe something like this gets started to create a format. CAUTION: This is only to get started. Do not rerun this with other data as the IDnames could get assigned different groups with different data. If you need to re-use this group assignment code then you will need to either create and maintain proc format code to included added values or append data sets with new idname values and the NEW group numbers to the cntlin data set and then rerun the format.
data example; do id = 1 to 50; if id in (1, 5, 6, 13, 21) then group=1; else if id in (34, 26, 27, 40) then group=2; else if id in (15, 28, 31, 33, 42) then group=3; else group=.; output; end; run; /* not actually needed by something similar would be needed your IDNAME YOU Also ONLY WANT one record per IDNAME */ proc sort data=example nodupkey out=sorted; by id; run; data want; set sorted; /* the value 3 is the number of groups previously assigned above so the new values will start at one larger */ retain altgroup 3; if missing(group) then do; altgroup+1; group=altgroup; end; drop altgroup; run; /* once happy with the above*/ data control; set want; fmtname='Grpno'; type='I'; start=id; label=group; run; proc format cntlin=control out=work.cntlout; run;
Optionally you could use the WANT data set to use an SQL join to bring in group values. BUT again if your next data has different values for IDNAME then there may be missing groups values for some names and a similar issue about changing group values
Thank you. My apology for ambiguity.
Yes, the idea is to assign the same group value to everything except those listed to be grouped by. The rest of the data have no missing issue. No intervention is needed for the rest of the values.
Hardcoding this is insanity. At least it is sure to drive you insane in the near future. Warn your colleagues to call the ambulance as soon as they see foam on your lips 😉
Create a dataset that holds city names and group IDs, like this:
data city_groups;
input city :$30. group;
datalines;
Bundang 1
Jungwon 1
Sujeong 1
;
Then join this (or use a hash object) with your dataset, and code an alarm mechanism like
if lookup.find() ne 0
then do;
text = "ERR" !! "OR"; /* I do this so that the code itself does not cause EG to display a "failed" icon */
put text;
end;
For a batch job, you will want to code an ABORT statement with an error code that signals what went wrong.
(Whenever EG finds the word ERROR in the log returned from a node, it will display the "little red cross")
DATA all (keep =idname group group1); SET m.alldata_id;
retain group;
if IDNAME IN ('Bundang' 'Jungwon' 'Sujeong') THEN GROUP=1; ELSE
if IDNAME IN ('Gwonseon' 'Jangan' 'Paldal' 'Yeongtong') THEN GROUP=2; ELSE
if IDNAME IN ('Cheoin' 'Yongin' 'Cheoin' 'Giheung' 'Suji' 'Suyon') THEN GROUP=3; ELSE
if IDNAME IN ('Heungdeok' 'Cheongju' 'Heungdeok' 'Seowon' 'Sangdang' 'Cheongwon') THEN GROUP=4; ELSE
if IDNAME IN ('Changwon' 'Uichang' 'Seongsan' 'Masanhappo' 'Masanhoewon' 'Jinhae') THEN GROUP=5; ELSE
if IDNAME IN ('Deogyang' 'Ilsandong' 'Ilsanseogu') THEN GROUP=6;
else group=-99;
run;
Then filter out your -99 and fix those in a different process or update your IF/THEN logic. Or do a second pass to fill in the group numbers if you're relying on position to fill in your data.
Try:
DATA all (keep =idname group group1); SET m.alldata_id; retain group; if IDNAME IN ('Bundang' 'Jungwon' 'Sujeong') THEN GROUP=1; ELSE if IDNAME IN ('Gwonseon' 'Jangan' 'Paldal' 'Yeongtong') THEN GROUP=2; ELSE if IDNAME IN ('Cheoin' 'Yongin' 'Cheoin' 'Giheung' 'Suji' 'Suyon') THEN GROUP=3; ELSE if IDNAME IN ('Heungdeok' 'Cheongju' 'Heungdeok' 'Seowon' 'Sangdang' 'Cheongwon') THEN GROUP=4; ELSE if IDNAME IN ('Changwon' 'Uichang' 'Seongsan' 'Masanhappo' 'Masanhoewon' 'Jinhae') THEN GROUP=5; ELSE if IDNAME IN ('Deogyang' 'Ilsandong' 'Ilsanseogu') THEN GROUP=6; else group=7 ; run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.