BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
sasecn
Quartz | Level 8

I have a data with ids and names. Within one ID, there may have more than one values in Name. My goal is to create another variable called new_name. New_name will have the same value of name if all names are the same within one id, e.g. id1 in the example; will have a value of "multi_name" if ther are more than one values in Name, e.g. id 2 and 3 in the example; will have the value of name that is not equal to "no_name" wihtin the ID if there are two values within ID (one of them is "no_name") e.g. id 4 in the example.

 

So id3 should have new_name=multi_name, but id4 should have new_name=g. The attached code can produce the sample data "have", and "want" is the one i am looking for. The proc sql is not finished, i don't know how to code it for id4.

 

Thanks for any help!

data have;
input id $ name $;
datalines;
1 a
1 a
2 b
2 c
3 e
3 f
3 no_name
4 g
4 no_name
;
run;

data want;
input id $1. name $8. new_name $12.;
datalines;
1 a       a
1 a       a
2 b       multi_name
2 c       multi_name
3 e       multi_name
3 f       multi_name
3 no_name multi_name
4 g       g
4 no_name g
;
run;

proc sql;
create table want_1 as
select *,
case 
when count(distinct name) = 1 then name
when count(distinct name) > 1 then "multi_name"
/* how to code new_name = g for id 4 ? */
end as new_name
from have
group by id
;
quit;
1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

Hi @sasecn  Here is another variant similar to Tom.  This actually nests IFC/IFN functions within a CASE WHEN expression taught by Guru Paul D @hashman  a while ago- 

 

data have;
input id $ name $;
datalines;
1 a
1 a
2 b
2 c
3 e
3 f
3 no_name
4 g
4 no_name
;
run;

proc sql;
 create table want as
 select *, 
 case 
  when count(distinct name)=2 and max(name='no_name') then max(ifc(name='no_name',' ',name))
  when count(distinct name)>1  then 'multi_name'
  else name 
 end as new_name length=12
 from have
 group by id;
quit;

proc print noobs;run;
id name new_name
1 a a
1 a a
2 b multi_name
2 c multi_name
3 no_name multi_name
3 f multi_name
3 e multi_name
4 no_name g
4 g g

 

 

 

 

 

 

View solution in original post

10 REPLIES 10
jimbarbour
Meteorite | Level 14

I'm not quite sure if I understand what you're trying to do.

 

Can you mock up what your final output should look like?  If you post what you want your results to look like, then maybe I can help out.

 

Jim

sasecn
Quartz | Level 8

The data "want" is the output I am looking for. It should be based on data "have" with a new created variable called new_name.

Thanks,

Tom
Super User Tom
Super User

Easy enough in PROC SQL since SAS will automatically remerge aggregate statistics like COUNT() and MIN() back onto all observations in a group.

 

First let's replace those "no_name" values with actual empty strings so that SAS will treat them as missing values.  Let's add a unique ID so it will be easier to compare the values.

data have;
  row+1;
  input id $ name $;
datalines;
1 a
1 a
2 b
2 c
3 e
3 f
3 .
4 g
4 .
;

data expect;
  row+1;
  input id $ name $ new_name :$12.;
datalines;
1 a       a
1 a       a
2 b       multi_name
2 c       multi_name
3 e       multi_name
3 f       multi_name
3 .       multi_name
4 g       g
4 .       g
;

Now just group by ID and depending on whether the number of non-missing values is more than 1 decide whether to use that value (note that min() or max() of one value is the same value).

proc sql ;
create table want as
  select a.*
       , case when (count(distinct name) > 1) then 'multi_name' 
         else min(name) end as new_name length=12
  from have a
  group by a.id 
  order by row
;
quit;

proc print data=want; run;
proc compare data=want compare=expect;
run;

Results:

SAS 9.4 on WINDOWS

Obs    row    id    name    new_name

 1      1     1      a      a
 2      2     1      a      a
 3      3     2      b      multi_name
 4      4     2      c      multi_name
 5      5     3      e      multi_name
 6      6     3      f      multi_name
 7      7     3             multi_name
 8      8     4      g      g
 9      9     4             g

SAS 9.4 on WINDOWS                                                                                     

The COMPARE Procedure
Comparison of WORK.WANT with WORK.EXPECT
(Method=EXACT)

Data Set Summary

Dataset               Created          Modified  NVar    NObs

WORK.WANT    25MAY21:15:06:21  25MAY21:15:06:21     4       9
WORK.EXPECT  25MAY21:15:06:21  25MAY21:15:06:21     4       9


Variables Summary

Number of Variables in Common: 4.


Observation Summary

Observation      Base  Compare

First Obs           1        1
Last  Obs           9        9

Number of Observations in Common: 9.
Total Number of Observations Read from WORK.WANT: 9.
Total Number of Observations Read from WORK.EXPECT: 9.

Number of Observations with Some Compared Variables Unequal: 0.
Number of Observations with All Compared Variables Equal: 9.

NOTE: No unequal values were found. All values compared are exactly equal.

 

novinosrin
Tourmaline | Level 20

Hi @sasecn  Here is another variant similar to Tom.  This actually nests IFC/IFN functions within a CASE WHEN expression taught by Guru Paul D @hashman  a while ago- 

 

data have;
input id $ name $;
datalines;
1 a
1 a
2 b
2 c
3 e
3 f
3 no_name
4 g
4 no_name
;
run;

proc sql;
 create table want as
 select *, 
 case 
  when count(distinct name)=2 and max(name='no_name') then max(ifc(name='no_name',' ',name))
  when count(distinct name)>1  then 'multi_name'
  else name 
 end as new_name length=12
 from have
 group by id;
quit;

proc print noobs;run;
id name new_name
1 a a
1 a a
2 b multi_name
2 c multi_name
3 no_name multi_name
3 f multi_name
3 e multi_name
4 no_name g
4 g g

 

 

 

 

 

 

sasecn
Quartz | Level 8

Thanks, it works. Can you explain the condition "and max(name='no_name')" ? I understand the first part. Does the whole condition mean: if there are two outcomes of Name, one is some value of name, the other one is 'no_name', then take the max value (ie. not the emapty one)?

novinosrin
Tourmaline | Level 20

Hi again @sasecn   the logic is fairly simple once we understand the "waterfall construct"-

 

1. Start with checking for 2 distinct values with one valid name and the other no_name viz. max(name='no_name') results in a boolean 1's or 0's true for no_name or false

2. 1st When fails, the 2nd when expression checks for any count of distinct names >1 and assigns the easy multiname

3. When 1 & 2 fails, there can only be one more options i.e. all names being same.

 

Hope that helps

sasecn
Quartz | Level 8

Thank you! Really helpful!

sasecn
Quartz | Level 8

Thanks, this code works well. I just don't understand the "else min(name)" part. For example, how it can take the value of "g" in id4 case? Does the SAS just ingore the row with missing value?

Tom
Super User Tom
Super User

MIN() of character strings ignores blank values the same as it ignores missing value for numeric variables.

COUNT() also ignores the blank values.

 

If your data really has "no_name" as the value of NAME instead of blanks then use CASE (or IFC()) to remove those.

count(distinct case when name='no_name' then ' ' else name end)
...
min(case when name='no_name' then ' ' else name end)
sasecn
Quartz | Level 8

Great, thanks for the answer1

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 7556 views
  • 4 likes
  • 4 in conversation