BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Cruise
Ammonite | Level 13

Hi Folks:

 

I'm creating a 'group' variable based on the specific values that 'idname' character variable takes. However, my simple if then statement is not working. It picks up almost randomly. Look how group=1 include city of Buk that i did not specify and the rest groups are even more inaccurate. 

What am i doing wrong here? 

 

Any pointers appreciated.  

unspecified.png


DATA all (keep =idname group group1); SET m.alldata_id; 
retain group;
if IDNAME IN ('Bundang' 'Jungwon' 'Sujeong') THEN GROUP=1; ELSE  
if IDNAME IN ('Gwonseon' 'Jangan' 'Paldal' 'Yeongtong') THEN GROUP=2; ELSE 
if IDNAME IN ('Cheoin' 'Yongin' 'Cheoin' 'Giheung' 'Suji' 'Suyon') THEN GROUP=3; ELSE
if IDNAME IN ('Heungdeok' 'Cheongju' 'Heungdeok' 'Seowon' 'Sangdang' 'Cheongwon') THEN GROUP=4; ELSE 
if IDNAME IN ('Changwon' 'Uichang' 'Seongsan' 'Masanhappo' 'Masanhoewon' 'Jinhae') THEN GROUP=5; ELSE 
if IDNAME IN ('Deogyang' 'Ilsandong' 'Ilsanseogu') THEN GROUP=6;
else group+1;
run; 
1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

Try:

 

DATA all (keep =idname group group1); SET m.alldata_id; 
retain group;
if IDNAME IN ('Bundang' 'Jungwon' 'Sujeong') THEN GROUP=1; ELSE  
if IDNAME IN ('Gwonseon' 'Jangan' 'Paldal' 'Yeongtong') THEN GROUP=2; ELSE 
if IDNAME IN ('Cheoin' 'Yongin' 'Cheoin' 'Giheung' 'Suji' 'Suyon') THEN GROUP=3; ELSE
if IDNAME IN ('Heungdeok' 'Cheongju' 'Heungdeok' 'Seowon' 'Sangdang' 'Cheongwon') THEN GROUP=4; ELSE 
if IDNAME IN ('Changwon' 'Uichang' 'Seongsan' 'Masanhappo' 'Masanhoewon' 'Jinhae') THEN GROUP=5; ELSE 
if IDNAME IN ('Deogyang' 'Ilsandong' 'Ilsanseogu') THEN GROUP=6;
else group=7 ;
run; 

View solution in original post

9 REPLIES 9
ed_sas_member
Meteorite | Level 14

Hi @Cruise 

It is due to the following statement and the fact that you RETAIN the value of the group variable

else group+1;

 -> at the first iteration, IDNAME = Buk -> as it isn't listed in the IF conditions, the GROUP+1 statement apply so GROUP is set to 1.

It is the same for "Andong" -> the. previous value was 2 (Yeongtong), and as it isn't listed in the IF conditions, the GROUP+1 statement apply so GROUP is set to 2+1 = 3

Best,

ballardw
Super User

First record : Group+1 in the else => the value will be one for any "other" name.

Similar, EVERY time you encounter an unlisted value the number gets incremented from the last assigned value. So if one record assigned 3 to group and the next record has a not listed Idname value the result will be 4 because your else increments the LAST retained value.

 

You might describe the actual rule that you are attempting to implement. I suspect that you may have meant to assign the value 7 to any of the not listed values, is that actually the case? If you want each of the IDNAME values that does not appear on the list to have a different value then you need to reconsider this approach as a pseudo-random assignment like this would be dependent on the order the values were encountered. Which means that the group could change the next time the code was run on similar data.

If the intent was to assign 7 to all of the others then you should use a last "group =7". Otherwise provide example in the form of a data step and the expected output for that data.

 

Again: Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the <> icon or attached as text to show exactly what you have and that we can test code against.

 

PICTURES are hard to code from an I am not even going to try.

 

One strongly suspects that you sorted the data prior to display and not in the order the code execute.

 

BTW, you are duplicating some of the names in your "list" values. Inefficient at best, at worst what ever you do like that could end up with the same Idname in two different group values.

Cruise
Ammonite | Level 13

@ballardw 

Thank you. 

I'm trying to implement the code support kindly provided to me today. The approach in the forum solved uses the retain and group+1 steps . Could you please take a look at? the link is below. 

https://communities.sas.com/t5/SAS-Programming/Fill-missing-in-specific-rows-by-groups/m-p/645822#M1...

ballardw
Super User

@Cruise wrote:

@ballardw 

Thank you. 

I'm trying to implement the code support kindly provided to me today. The approach in the forum solved uses the retain and group+1 steps . Could you please take a look at? the link is below. 

https://communities.sas.com/t5/SAS-Programming/Fill-missing-in-specific-rows-by-groups/m-p/645822#M1...


And as was referenced in that thread a couple of times:

A specific rule (or set of rules) needs to be explicitly stated. Which was not provided.

Do you intend each not listed IDname to get a different value? The same value? And should the same group value be assigned the next time that Id value is encountered in a raw data set or are you willing to accept changing "group" values for a given Idname?

 

If the idea is to assign the same group value to everything except those listed an Invalue can be created in Proc format to do that:

proc format;
invalue grpno
'Bundang', 'Jungwon', 'Sujeong' =1   
'Gwonseon', 'Jangan', 'Paldal', 'Yeongtong' =2  
'Cheoin', 'Yongin',  'Giheung', 'Suji', 'Suyon' =3 
'Heungdeok', 'Cheongju', 'Seowon', 'Sangdang', 'Cheongwon' =4  
'Changwon', 'Uichang', 'Seongsan', 'Masanhappo', 'Masanhoewon', 'Jinhae' =5  
'Deogyang', 'Ilsandong', 'Ilsanseogu' =6
' ' = .
other = 7
;

DATA all (keep =idname group group1); 
   SET m.alldata_id; 
   group =input(idname,grpno.);
run;

If each ID value is supposed to get a separate "group" value then LIST them in Proc format. Or maybe something like this gets started to create a format. CAUTION: This is only to get started. Do not rerun this with other data as the IDnames could get assigned different groups with different data. If you need to re-use this group assignment code then you will need to either create and maintain proc format code to included added values or append data sets with new idname values and the NEW group numbers to the cntlin data set and then rerun the format.

data example;
   do id = 1 to 50;
      if id in (1, 5, 6, 13, 21) then group=1;
      else if id in (34, 26, 27, 40) then group=2;
      else if id in (15, 28, 31, 33, 42) then group=3;
      else group=.;
      output;
   end;
run;
/* not actually needed by something similar
   would be needed your IDNAME
   YOU Also ONLY WANT one record per IDNAME
*/
proc sort data=example nodupkey out=sorted;
   by id;
run;

data want;
   set sorted;
   /* the value 3 is the number 
      of groups previously assigned
      above so the new values will
      start at one larger
   */
   retain altgroup 3;
   if missing(group) then do;
      altgroup+1;
      group=altgroup;
   end;
   drop altgroup;
run;

/* once happy with the above*/
data control;
   set want;
   fmtname='Grpno';
   type='I';
   start=id;
   label=group;
run;


 proc format cntlin=control out=work.cntlout;
 run;

Optionally you could use the WANT data set to use an SQL join to bring in group values. BUT again if your next data has different values for IDNAME then there may be missing groups values for some names and a similar issue about changing group values

Cruise
Ammonite | Level 13

@ballardw 

Thank you. My apology for ambiguity. 

Yes, the idea is to assign the same group value to everything except those listed to be grouped by. The rest of the data have no missing issue. No intervention is needed for the rest of the values.

Kurt_Bremser
Super User

Hardcoding this is insanity. At least it is sure to drive you insane in the near future. Warn your colleagues to call the ambulance as soon as they see foam on your lips 😉

Create a dataset that holds city names and group IDs, like this:

data city_groups;
input city :$30. group;
datalines;
Bundang 1
Jungwon 1
Sujeong 1
;

Then join this (or use a hash object) with your dataset, and code an alarm mechanism like

if lookup.find() ne 0
then do;
  text = "ERR" !! "OR"; /* I do this so that the code itself does not cause EG to display a "failed" icon */
  put text;
end;

For a batch job, you will want to code an ABORT statement with an error code that signals what went wrong.

(Whenever EG finds the word ERROR in the log returned from a node, it will display the "little red cross")

Reeza
Super User
DATA all (keep =idname group group1); SET m.alldata_id; 
retain group;
if IDNAME IN ('Bundang' 'Jungwon' 'Sujeong') THEN GROUP=1; ELSE  
if IDNAME IN ('Gwonseon' 'Jangan' 'Paldal' 'Yeongtong') THEN GROUP=2; ELSE 
if IDNAME IN ('Cheoin' 'Yongin' 'Cheoin' 'Giheung' 'Suji' 'Suyon') THEN GROUP=3; ELSE
if IDNAME IN ('Heungdeok' 'Cheongju' 'Heungdeok' 'Seowon' 'Sangdang' 'Cheongwon') THEN GROUP=4; ELSE 
if IDNAME IN ('Changwon' 'Uichang' 'Seongsan' 'Masanhappo' 'Masanhoewon' 'Jinhae') THEN GROUP=5; ELSE 
if IDNAME IN ('Deogyang' 'Ilsandong' 'Ilsanseogu') THEN GROUP=6;
else group=-99;
run; 

Then filter out your -99 and fix those in a different process or update your IF/THEN logic. Or do a second pass to fill in the group numbers if you're relying on position to fill in your data.

Ksharp
Super User

Try:

 

DATA all (keep =idname group group1); SET m.alldata_id; 
retain group;
if IDNAME IN ('Bundang' 'Jungwon' 'Sujeong') THEN GROUP=1; ELSE  
if IDNAME IN ('Gwonseon' 'Jangan' 'Paldal' 'Yeongtong') THEN GROUP=2; ELSE 
if IDNAME IN ('Cheoin' 'Yongin' 'Cheoin' 'Giheung' 'Suji' 'Suyon') THEN GROUP=3; ELSE
if IDNAME IN ('Heungdeok' 'Cheongju' 'Heungdeok' 'Seowon' 'Sangdang' 'Cheongwon') THEN GROUP=4; ELSE 
if IDNAME IN ('Changwon' 'Uichang' 'Seongsan' 'Masanhappo' 'Masanhoewon' 'Jinhae') THEN GROUP=5; ELSE 
if IDNAME IN ('Deogyang' 'Ilsandong' 'Ilsanseogu') THEN GROUP=6;
else group=7 ;
run; 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 812 views
  • 7 likes
  • 6 in conversation