I have a data with ids and names. Within one ID, there may have more than one values in Name. My goal is to create another variable called new_name. New_name will have the same value of name if all names are the same within one id, e.g. id1 in the example; will have a value of "multi_name" if ther are more than one values in Name, e.g. id 2 and 3 in the example; will have the value of name that is not equal to "no_name" wihtin the ID if there are two values within ID (one of them is "no_name") e.g. id 4 in the example.
So id3 should have new_name=multi_name, but id4 should have new_name=g. The attached code can produce the sample data "have", and "want" is the one i am looking for. The proc sql is not finished, i don't know how to code it for id4.
Thanks for any help!
data have;
input id $ name $;
datalines;
1 a
1 a
2 b
2 c
3 e
3 f
3 no_name
4 g
4 no_name
;
run;
data want;
input id $1. name $8. new_name $12.;
datalines;
1 a a
1 a a
2 b multi_name
2 c multi_name
3 e multi_name
3 f multi_name
3 no_name multi_name
4 g g
4 no_name g
;
run;
proc sql;
create table want_1 as
select *,
case
when count(distinct name) = 1 then name
when count(distinct name) > 1 then "multi_name"
/* how to code new_name = g for id 4 ? */
end as new_name
from have
group by id
;
quit;
Hi @sasecn Here is another variant similar to Tom. This actually nests IFC/IFN functions within a CASE WHEN expression taught by Guru Paul D @hashman a while ago-
data have;
input id $ name $;
datalines;
1 a
1 a
2 b
2 c
3 e
3 f
3 no_name
4 g
4 no_name
;
run;
proc sql;
create table want as
select *,
case
when count(distinct name)=2 and max(name='no_name') then max(ifc(name='no_name',' ',name))
when count(distinct name)>1 then 'multi_name'
else name
end as new_name length=12
from have
group by id;
quit;
proc print noobs;run;
id | name | new_name |
---|---|---|
1 | a | a |
1 | a | a |
2 | b | multi_name |
2 | c | multi_name |
3 | no_name | multi_name |
3 | f | multi_name |
3 | e | multi_name |
4 | no_name | g |
4 | g | g |
I'm not quite sure if I understand what you're trying to do.
Can you mock up what your final output should look like? If you post what you want your results to look like, then maybe I can help out.
Jim
The data "want" is the output I am looking for. It should be based on data "have" with a new created variable called new_name.
Thanks,
Easy enough in PROC SQL since SAS will automatically remerge aggregate statistics like COUNT() and MIN() back onto all observations in a group.
First let's replace those "no_name" values with actual empty strings so that SAS will treat them as missing values. Let's add a unique ID so it will be easier to compare the values.
data have;
row+1;
input id $ name $;
datalines;
1 a
1 a
2 b
2 c
3 e
3 f
3 .
4 g
4 .
;
data expect;
row+1;
input id $ name $ new_name :$12.;
datalines;
1 a a
1 a a
2 b multi_name
2 c multi_name
3 e multi_name
3 f multi_name
3 . multi_name
4 g g
4 . g
;
Now just group by ID and depending on whether the number of non-missing values is more than 1 decide whether to use that value (note that min() or max() of one value is the same value).
proc sql ;
create table want as
select a.*
, case when (count(distinct name) > 1) then 'multi_name'
else min(name) end as new_name length=12
from have a
group by a.id
order by row
;
quit;
proc print data=want; run;
proc compare data=want compare=expect;
run;
Results:
SAS 9.4 on WINDOWS Obs row id name new_name 1 1 1 a a 2 2 1 a a 3 3 2 b multi_name 4 4 2 c multi_name 5 5 3 e multi_name 6 6 3 f multi_name 7 7 3 multi_name 8 8 4 g g 9 9 4 g SAS 9.4 on WINDOWS The COMPARE Procedure Comparison of WORK.WANT with WORK.EXPECT (Method=EXACT) Data Set Summary Dataset Created Modified NVar NObs WORK.WANT 25MAY21:15:06:21 25MAY21:15:06:21 4 9 WORK.EXPECT 25MAY21:15:06:21 25MAY21:15:06:21 4 9 Variables Summary Number of Variables in Common: 4. Observation Summary Observation Base Compare First Obs 1 1 Last Obs 9 9 Number of Observations in Common: 9. Total Number of Observations Read from WORK.WANT: 9. Total Number of Observations Read from WORK.EXPECT: 9. Number of Observations with Some Compared Variables Unequal: 0. Number of Observations with All Compared Variables Equal: 9. NOTE: No unequal values were found. All values compared are exactly equal.
Hi @sasecn Here is another variant similar to Tom. This actually nests IFC/IFN functions within a CASE WHEN expression taught by Guru Paul D @hashman a while ago-
data have;
input id $ name $;
datalines;
1 a
1 a
2 b
2 c
3 e
3 f
3 no_name
4 g
4 no_name
;
run;
proc sql;
create table want as
select *,
case
when count(distinct name)=2 and max(name='no_name') then max(ifc(name='no_name',' ',name))
when count(distinct name)>1 then 'multi_name'
else name
end as new_name length=12
from have
group by id;
quit;
proc print noobs;run;
id | name | new_name |
---|---|---|
1 | a | a |
1 | a | a |
2 | b | multi_name |
2 | c | multi_name |
3 | no_name | multi_name |
3 | f | multi_name |
3 | e | multi_name |
4 | no_name | g |
4 | g | g |
Thanks, it works. Can you explain the condition "and max(name='no_name')" ? I understand the first part. Does the whole condition mean: if there are two outcomes of Name, one is some value of name, the other one is 'no_name', then take the max value (ie. not the emapty one)?
Hi again @sasecn the logic is fairly simple once we understand the "waterfall construct"-
1. Start with checking for 2 distinct values with one valid name and the other no_name viz. max(name='no_name') results in a boolean 1's or 0's true for no_name or false
2. 1st When fails, the 2nd when expression checks for any count of distinct names >1 and assigns the easy multiname
3. When 1 & 2 fails, there can only be one more options i.e. all names being same.
Hope that helps
Thank you! Really helpful!
Thanks, this code works well. I just don't understand the "else min(name)" part. For example, how it can take the value of "g" in id4 case? Does the SAS just ingore the row with missing value?
MIN() of character strings ignores blank values the same as it ignores missing value for numeric variables.
COUNT() also ignores the blank values.
If your data really has "no_name" as the value of NAME instead of blanks then use CASE (or IFC()) to remove those.
count(distinct case when name='no_name' then ' ' else name end)
...
min(case when name='no_name' then ' ' else name end)
Great, thanks for the answer1
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.