Hi All,
I have the following table with people's contact information. for each by group I have their Home/Mobile/Email and Cellular no. I need to find the observation with contact_type = 'Home'. If the by group don't have a 'Home' contact type then the code will bring the contact_type = 'Mobile', if Mobile is missing then it will bring the contact_type = 'Cellular'.
ID Contact_Type Contact
101 Home 4434323454
101 Email abc@gmail.com
101 Mobile 8767876546
102 Mobile 8987609878
102 Email ghg@gmail.com
103 Cellular 6779998786
103 Email aaa@gmail.com
Output will look like this:
ID Contact_Type Contact
101 Home 4434323454
102 Mobile 8987609878
103 Cellular 6769998786
Can someone help me please. Thanks,
Hi @mlogan,
I can reassure you: "My" code (well, actually it's primarily @Reeza's with valuable input from @LinusH) works
even if there is no email address available in a particular BY group (but only some of the other three contact types).
Okay, so you don't want to see email addresses in your output dataset. This is no problem and you already took the first step in adapting the code correspondingly: You removed 'Email' from the arguments of WHICHC.
As a consequence, however, any existing observation with Contact_Type='Email' (or with any other value of Contact_Type not listed in the arguments of WHICHC, for that matter) will be assigned priority 0. This zero, in turn, will always end up as the minimum priority in the BY group and hence the observation will be selected (even more than one such observation, if any).
To prevent this, we can simply amend the WHERE clause to restrict the selection to observations with non-zero (and non-missing) priority:
where not missing(Contact) & calculated priority
With this small change you will get your expected output from the new input data you posted. To read the raw data correctly, we have to correct your suggested INPUT statement, though (I assume the "&" was a typo):
input ID (Contact_type Contact) (:$20.);
Please note that values of Contact_Type not listed in WHICHC(...) could still cause unexpected results, as described above. This includes values such as 'HOME', 'mobile', 'Cell' or ' Home' (the latter with leading blank).
To make the code more robust, we could modify the function call as follows:
whichc(put(left(Contact_Type),$upcase3.), 'HOM', 'MOB', 'CEL')
This would handle the abovementioned spellings of contact types as well as the ordinary ones.
Finally, it should be mentioned that an ID with two or more observations with "top priority" (e.g. two different 'Mobile' numbers and no 'Home') would occur with all these observations in the output dataset.
So basically you have a priority.
Code me the phone type to the priority level - you can use if/then, select, format or a merge/join.
Sort by id and priority and use first.id to obtain the highest priority record.
Untested but should get you started.
Proc format;
InValue $ phone_type_fmt
'Home' = 1
'Mobile' = 2
....
Rest of your definitions;
Proc sql;
Create table temp as
Select a.*, input(contact_type, $phone_type_fmt.) as priority
From have
Where not missing(phone_number)
Order by id, priority;
Quit;
Then use a data step with first.id to obtain the highest priority number.
Hi @mlogan,
To simplify the code further, you could use the WHICHC function rather than an informat to define variable PRIORITY:
proc sql;
create table want(drop=priority) as
select *, whichc(Contact_Type, 'Home', 'Mobile', 'Cellular', 'Email') as priority
from have
where not missing(Contact)
group by id
having priority=min(priority);
quit;
Thanks Reinhard, your code works, but it doesn't work when 101 by group don't have an email and 104 has only email (see the following below). I want Home, Mobile and Cellular as priority order. If all are missing, then I don't want that observation (104 in this case). Can you please help further. Thanks,
data have;
input ID (Contact_type Contact) ( & $20.);
cards;
101 Home 4434323454
101 Mobile 8767876546
102 Mobile 8987609878
102 Email ghg@gmail.com
103 Email aaa@gmail.com
103 Cellular 6779998786
104 Email sss@gmail.com
;
RUN
proc sql;
create table want(drop=priority) as
select *, whichc(Contact_Type, 'Home', 'Mobile', 'Cellular') as priority
from have
where not missing(Contact)
group by id
having priority=min(priority);
quit;
Folowing is my expected output:
101 Home 4434323454
102 Mobile 8987609878
103 Cellular 6779998786
Hi @mlogan,
I can reassure you: "My" code (well, actually it's primarily @Reeza's with valuable input from @LinusH) works
even if there is no email address available in a particular BY group (but only some of the other three contact types).
Okay, so you don't want to see email addresses in your output dataset. This is no problem and you already took the first step in adapting the code correspondingly: You removed 'Email' from the arguments of WHICHC.
As a consequence, however, any existing observation with Contact_Type='Email' (or with any other value of Contact_Type not listed in the arguments of WHICHC, for that matter) will be assigned priority 0. This zero, in turn, will always end up as the minimum priority in the BY group and hence the observation will be selected (even more than one such observation, if any).
To prevent this, we can simply amend the WHERE clause to restrict the selection to observations with non-zero (and non-missing) priority:
where not missing(Contact) & calculated priority
With this small change you will get your expected output from the new input data you posted. To read the raw data correctly, we have to correct your suggested INPUT statement, though (I assume the "&" was a typo):
input ID (Contact_type Contact) (:$20.);
Please note that values of Contact_Type not listed in WHICHC(...) could still cause unexpected results, as described above. This includes values such as 'HOME', 'mobile', 'Cell' or ' Home' (the latter with leading blank).
To make the code more robust, we could modify the function call as follows:
whichc(put(left(Contact_Type),$upcase3.), 'HOM', 'MOB', 'CEL')
This would handle the abovementioned spellings of contact types as well as the ordinary ones.
Finally, it should be mentioned that an ID with two or more observations with "top priority" (e.g. two different 'Mobile' numbers and no 'Home') would occur with all these observations in the output dataset.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.