DATA Step, Macro, Functions and more

Retrieving data within by group

Accepted Solution Solved
Reply
Regular Contributor
Posts: 218
Accepted Solution

Retrieving data within by group

Hi All,

I have the following table with people's contact information. for each by group I have their Home/Mobile/Email and Cellular no. I need to find the observation with contact_type = 'Home'. If the by group don't have a 'Home' contact type then the code will bring the contact_type = 'Mobile', if Mobile is missing then it will bring the contact_type = 'Cellular'. 

 

ID     Contact_Type   Contact

101  Home                 4434323454

101  Email                 abc@gmail.com

101  Mobile               8767876546

102  Mobile               8987609878

102  Email                 ghg@gmail.com

103  Cellular             6779998786

103  Email                aaa@gmail.com

 

Output will look like this:

 

ID     Contact_Type   Contact

101  Home                 4434323454

102  Mobile               8987609878

103  Cellular             6769998786

 

 

Can someone help me please. Thanks,


Accepted Solutions
Solution
‎03-18-2016 10:06 PM
Trusted Advisor
Posts: 1,117

Re: Retrieving data within by group

Hi @mlogan,

 

I can reassure you: "My" code (well, actually it's primarily @Reeza's with valuable input from @LinusH) works 

even if there is no email address available in a particular BY group (but only some of the other three contact types).

 

Okay, so you don't want to see email addresses in your output dataset. This is no problem and you already took the first step in adapting the code correspondingly: You removed 'Email' from the arguments of WHICHC.


As a consequence, however, any existing observation with Contact_Type='Email' (or with any other value of Contact_Type not listed in the arguments of WHICHC, for that matter) will be assigned priority 0. This zero, in turn, will always end up as the minimum priority in the BY group and hence the observation will be selected (even more than one such observation, if any).

 

To prevent this, we can simply amend the WHERE clause to restrict the selection to observations with non-zero (and non-missing) priority:

where not missing(Contact) & calculated priority

With this small change you will get your expected output from the new input data you posted. To read the raw data correctly, we have to correct your suggested INPUT statement, though (I assume the "&" was a typo):

input ID (Contact_type Contact) (:$20.);

 

Please note that values of Contact_Type not listed in WHICHC(...) could still cause unexpected results, as described above. This includes values such as 'HOME', 'mobile', 'Cell' or ' Home' (the latter with leading blank).

 

To make the code more robust, we could modify the function call as follows:

whichc(put(left(Contact_Type),$upcase3.), 'HOM', 'MOB', 'CEL')

This would handle the abovementioned spellings of contact types as well as the ordinary ones.

 

Finally, it should be mentioned that an ID with two or more observations with "top priority" (e.g. two different 'Mobile' numbers and no 'Home') would occur with all these observations in the output dataset.

View solution in original post


All Replies
Super User
Posts: 19,768

Re: Retrieving data within by group

So basically you have a priority. 

 

Code me the phone type to the priority level - you can use if/then, select, format or a merge/join. 

Sort by id and priority and use first.id to obtain the highest priority record. 

 

Untested but should get you started. 

Proc format;
InValue $ phone_type_fmt
'Home' = 1
'Mobile' = 2
....
Rest of your definitions;

Proc sql;
Create table temp as
Select a.*, input(contact_type, $phone_type_fmt.) as priority
From have
Where not missing(phone_number)
Order by id, priority;
Quit;

Then use a data step with first.id to obtain the highest priority number. 

 

Regular Contributor
Posts: 218

Re: Retrieving data within by group

Hi Reeza, Would you please write the first.id for procedure for me. I want the output like:

101 Home 4434323454
102 Mobile 8987609878
103 Cellular 6779998786

Thanks,


data have;
input ID (Contact_type Contact) ( & $20.);
cards;

101 Home 4434323454
101 Mobile 8767876546
102 Mobile 8987609878
102 Email ghg@gmail.com
103 Email aaa@gmail.com
103 Cellular 6779998786
104 Email sss@gmail.com
;
RUN

Proc format;
InValue $ Contact_type_fmt
'Home' = 1
'Mobile' = 2
'Cellular' = 3
'Email'= 4;

Proc sql;
Create table temp as
Select have.*, input(contact_type, $contact_type_fmt.) as priority
From have
Where not missing(Contact)
Order by id, priority;
Quit;
Super User
Posts: 5,424

Re: Retrieving data within by group

If you would like to do this in one step, just slightly modify @Reeza's SQL to include a group by, and a having priority = min(priority).
Data never sleeps
Trusted Advisor
Posts: 1,117

Re: Retrieving data within by group

Hi @mlogan,

 

To simplify the code further, you could use the WHICHC function rather than an informat to define variable PRIORITY:

proc sql;
create table want(drop=priority) as
select *, whichc(Contact_Type, 'Home', 'Mobile', 'Cellular', 'Email') as priority
from have
where not missing(Contact)
group by id
having priority=min(priority);
quit;
Regular Contributor
Posts: 218

Re: Retrieving data within by group

Posted in reply to FreelanceReinhard

Thanks Reinhard, your code works, but it doesn't work when 101 by group don't have an email and 104 has only email (see the following below). I want Home, Mobile and Cellular as priority order. If all are missing, then I don't want that observation (104 in this case).  Can you please help further. Thanks,

 

 

data have;
input ID (Contact_type Contact) ( & $20.);
cards;

101 Home 4434323454
101 Mobile 8767876546
102 Mobile 8987609878
102 Email ghg@gmail.com
103 Email aaa@gmail.com
103 Cellular 6779998786
104 Email sss@gmail.com
;
RUN


proc sql;
create table want(drop=priority) as
select *, whichc(Contact_Type, 'Home', 'Mobile', 'Cellular') as priority
from have
where not missing(Contact)
group by id
having priority=min(priority);
quit;

 

Folowing is my expected output:

 

101 Home 4434323454
102 Mobile 8987609878
103 Cellular 6779998786

Solution
‎03-18-2016 10:06 PM
Trusted Advisor
Posts: 1,117

Re: Retrieving data within by group

Hi @mlogan,

 

I can reassure you: "My" code (well, actually it's primarily @Reeza's with valuable input from @LinusH) works 

even if there is no email address available in a particular BY group (but only some of the other three contact types).

 

Okay, so you don't want to see email addresses in your output dataset. This is no problem and you already took the first step in adapting the code correspondingly: You removed 'Email' from the arguments of WHICHC.


As a consequence, however, any existing observation with Contact_Type='Email' (or with any other value of Contact_Type not listed in the arguments of WHICHC, for that matter) will be assigned priority 0. This zero, in turn, will always end up as the minimum priority in the BY group and hence the observation will be selected (even more than one such observation, if any).

 

To prevent this, we can simply amend the WHERE clause to restrict the selection to observations with non-zero (and non-missing) priority:

where not missing(Contact) & calculated priority

With this small change you will get your expected output from the new input data you posted. To read the raw data correctly, we have to correct your suggested INPUT statement, though (I assume the "&" was a typo):

input ID (Contact_type Contact) (:$20.);

 

Please note that values of Contact_Type not listed in WHICHC(...) could still cause unexpected results, as described above. This includes values such as 'HOME', 'mobile', 'Cell' or ' Home' (the latter with leading blank).

 

To make the code more robust, we could modify the function call as follows:

whichc(put(left(Contact_Type),$upcase3.), 'HOM', 'MOB', 'CEL')

This would handle the abovementioned spellings of contact types as well as the ordinary ones.

 

Finally, it should be mentioned that an ID with two or more observations with "top priority" (e.g. two different 'Mobile' numbers and no 'Home') would occur with all these observations in the output dataset.

Regular Contributor
Posts: 218

Re: Retrieving data within by group

Posted in reply to FreelanceReinhard
Hi Reinhard, Thanks for your help. it worked for me.
input ID (Contact_type Contact) (& $20.); actually worked for me and I don't know why, but I learned a second thing(:$20.);

whichc(put(left(Contact_Type),$upcase3.), 'HOM', 'MOB', 'CEL') is quite handy.

Thank you,
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 451 views
  • 9 likes
  • 4 in conversation