Solved: Retrieving data within by group

mlogan · Posted 03-17-2016 10:44 PM

Hi All,

I have the following table with people's contact information. for each by group I have their Home/Mobile/Email and Cellular no. I need to find the observation with contact_type = 'Home'. If the by group don't have a 'Home' contact type then the code will bring the contact_type = 'Mobile', if Mobile is missing then it will bring the contact_type = 'Cellular'.

ID Contact_Type Contact

101 Home 4434323454

101 Email abc@gmail.com

101 Mobile 8767876546

102 Mobile 8987609878

102 Email ghg@gmail.com

103 Cellular 6779998786

103 Email aaa@gmail.com

Output will look like this:

ID Contact_Type Contact

101 Home 4434323454

102 Mobile 8987609878

103 Cellular 6769998786

Can someone help me please. Thanks,

FreelanceReinh · Posted 03-18-2016 04:10 PM

Hi @mlogan,

I can reassure you: "My" code (well, actually it's primarily @Reeza's with valuable input from @LinusH) works

even if there is no email address available in a particular BY group (but only some of the other three contact types).

Okay, so you don't want to see email addresses in your output dataset. This is no problem and you already took the first step in adapting the code correspondingly: You removed 'Email' from the arguments of WHICHC.

As a consequence, however, any existing observation with Contact_Type='Email' (or with any other value of Contact_Type not listed in the arguments of WHICHC, for that matter) will be assigned priority 0. This zero, in turn, will always end up as the minimum priority in the BY group and hence the observation will be selected (even more than one such observation, if any).

To prevent this, we can simply amend the WHERE clause to restrict the selection to observations with non-zero (and non-missing) priority:

where not missing(Contact) & calculated priority

With this small change you will get your expected output from the new input data you posted. To read the raw data correctly, we have to correct your suggested INPUT statement, though (I assume the "&" was a typo):

input ID (Contact_type Contact) (:$20.);

Please note that values of Contact_Type not listed in WHICHC(...) could still cause unexpected results, as described above. This includes values such as 'HOME', 'mobile', 'Cell' or ' Home' (the latter with leading blank).

To make the code more robust, we could modify the function call as follows:

whichc(put(left(Contact_Type),$upcase3.), 'HOM', 'MOB', 'CEL')

This would handle the abovementioned spellings of contact types as well as the ordinary ones.

Finally, it should be mentioned that an ID with two or more observations with "top priority" (e.g. two different 'Mobile' numbers and no 'Home') would occur with all these observations in the output dataset.

View solution in original post

Reeza · Posted 03-17-2016 11:29 PM

So basically you have a priority.

Code me the phone type to the priority level - you can use if/then, select, format or a merge/join.

Sort by id and priority and use first.id to obtain the highest priority record.

Untested but should get you started.

Proc format;
InValue $ phone_type_fmt
'Home' = 1
'Mobile' = 2
....
Rest of your definitions;

Proc sql;
Create table temp as
Select a.*, input(contact_type, $phone_type_fmt.) as priority
From have
Where not missing(phone_number)
Order by id, priority;
Quit;

Then use a data step with first.id to obtain the highest priority number.

mlogan · Posted 03-18-2016 01:09 PM

Hi Reeza, Would you please write the first.id for procedure for me. I want the output like:

101 Home 4434323454
102 Mobile 8987609878
103 Cellular 6779998786

Thanks,

data have;
input ID (Contact_type Contact) ( & $20.);
cards;

101 Home 4434323454
101 Mobile 8767876546
102 Mobile 8987609878
102 Email ghg@gmail.com
103 Email aaa@gmail.com
103 Cellular 6779998786
104 Email sss@gmail.com
;
RUN

Proc format;
InValue $ Contact_type_fmt
'Home' = 1
'Mobile' = 2
'Cellular' = 3
'Email'= 4;

Proc sql;
Create table temp as
Select have.*, input(contact_type, $contact_type_fmt.) as priority
From have
Where not missing(Contact)
Order by id, priority;
Quit;

LinusH · Posted 03-18-2016 01:23 AM

If you would like to do this in one step, just slightly modify @Reeza's SQL to include a group by, and a having priority = min(priority).

Data never sleeps

FreelanceReinh · Posted 03-18-2016 08:01 AM

Hi @mlogan,

To simplify the code further, you could use the WHICHC function rather than an informat to define variable PRIORITY:

proc sql;
create table want(drop=priority) as
select *, whichc(Contact_Type, 'Home', 'Mobile', 'Cellular', 'Email') as priority
from have
where not missing(Contact)
group by id
having priority=min(priority);
quit;

mlogan · Posted 03-18-2016 12:49 PM

Thanks Reinhard, your code works, but it doesn't work when 101 by group don't have an email and 104 has only email (see the following below). I want Home, Mobile and Cellular as priority order. If all are missing, then I don't want that observation (104 in this case). Can you please help further. Thanks,

data have;
input ID (Contact_type Contact) ( & $20.);
cards;

101 Home 4434323454
101 Mobile 8767876546
102 Mobile 8987609878
102 Email ghg@gmail.com
103 Email aaa@gmail.com
103 Cellular 6779998786
104 Email sss@gmail.com
;
RUN

proc sql;
create table want(drop=priority) as
select *, whichc(Contact_Type, 'Home', 'Mobile', 'Cellular') as priority
from have
where not missing(Contact)
group by id
having priority=min(priority);
quit;

Folowing is my expected output:

101 Home 4434323454
102 Mobile 8987609878
103 Cellular 6779998786

FreelanceReinh · Posted 03-18-2016 04:10 PM

Hi @mlogan,

I can reassure you: "My" code (well, actually it's primarily @Reeza's with valuable input from @LinusH) works

even if there is no email address available in a particular BY group (but only some of the other three contact types).

Okay, so you don't want to see email addresses in your output dataset. This is no problem and you already took the first step in adapting the code correspondingly: You removed 'Email' from the arguments of WHICHC.

As a consequence, however, any existing observation with Contact_Type='Email' (or with any other value of Contact_Type not listed in the arguments of WHICHC, for that matter) will be assigned priority 0. This zero, in turn, will always end up as the minimum priority in the BY group and hence the observation will be selected (even more than one such observation, if any).

To prevent this, we can simply amend the WHERE clause to restrict the selection to observations with non-zero (and non-missing) priority:

where not missing(Contact) & calculated priority

With this small change you will get your expected output from the new input data you posted. To read the raw data correctly, we have to correct your suggested INPUT statement, though (I assume the "&" was a typo):

input ID (Contact_type Contact) (:$20.);

Please note that values of Contact_Type not listed in WHICHC(...) could still cause unexpected results, as described above. This includes values such as 'HOME', 'mobile', 'Cell' or ' Home' (the latter with leading blank).

To make the code more robust, we could modify the function call as follows:

whichc(put(left(Contact_Type),$upcase3.), 'HOM', 'MOB', 'CEL')

This would handle the abovementioned spellings of contact types as well as the ordinary ones.

Finally, it should be mentioned that an ID with two or more observations with "top priority" (e.g. two different 'Mobile' numbers and no 'Home') would occur with all these observations in the output dataset.

mlogan · Posted 03-18-2016 10:11 PM

Hi Reinhard, Thanks for your help. it worked for me.
input ID (Contact_type Contact) (& $20.); actually worked for me and I don't know why, but I learned a second thing(:$20.);

whichc(put(left(Contact_Type),$upcase3.), 'HOM', 'MOB', 'CEL') is quite handy.

Thank you,

Retrieving data within by group

Re: Retrieving data within by group

Re: Retrieving data within by group

Re: Retrieving data within by group

Re: Retrieving data within by group

Re: Retrieving data within by group

Re: Retrieving data within by group

Re: Retrieving data within by group

Re: Retrieving data within by group

SAS Innovate 2025: Call for Content

Classroom Training Available!