DATA Step, Macro, Functions and more

grouping by 2 variables

Accepted Solution Solved
Reply
Contributor
Posts: 26
Accepted Solution

grouping by 2 variables

Hi,

 

I have the following dataset

 

data have;

input account_id customer_id removed_date added_date;
informat removed_date added_date  date9.;
format removed_date added_date  date9.;
cards;
12345 6789 11jun2017 .
12345 6789 . 01may2017
;
run

what I need is for the data to be displayed on row. I have multiple customer and account ID's and each customer can have multiple account id's and vice versa.

 

 

Many Thanks

 

 

Adnan


Accepted Solutions
Solution
‎04-04-2018 03:51 AM
Respected Advisor
Posts: 3,293

Re: grouping by 2 variables

Posted in reply to Adnan_Razaq

Yes, thank you, it is much clearer. Also, thank you for showing us your data in SAS DATA step code.

 

Try this code:

 

proc sql;
	create table both as select coalesce(a.account_id,b.account_id) as account_id,
		coalesce(a.customer_id,b.customer_id) as customer_id,a.removed_date,
		b.added_date 
		from removed as a full join added as b
	    on a.account_id=b.account_id and a.customer_id=b.customer_id and a.removed_date=b.added_date;
quit;
--
Paige Miller

View solution in original post


All Replies
Respected Advisor
Posts: 3,293

Re: grouping by 2 variables

[ Edited ]
Posted in reply to Adnan_Razaq

@Adnan_Razaq wrote:

Hi,

 

I have the following dataset

 

data have;

input account_id customer_id removed_date added_date;
informat removed_date added_date  date9.;
format removed_date added_date  date9.;
cards;
12345 6789 11jun2017 .
12345 6789 . 01may2017
;
run

what I need is for the data to be displayed on row. I have multiple customer and account ID's and each customer can have multiple account id's and vice versa.

 

 

Many Thanks

 

 

Adnan


It's really not clear what you are asking. "Displayed on row"??? I don't really know what you mean by this. Can you show us an example of what you want as the final output? Because from the poor understanding I have right now, PROC PRINT ought to display your data "on row", whatever that means.

 

Also, you need to show us examples of "I have multiple customer and account ID's and each customer can have multiple account id's and vice versa". And then show us the output from this situation where you "...have multiple customer and account ID's and each customer can have multiple account id's and vice versa".

--
Paige Miller
Contributor
Posts: 26

Re: grouping by 2 variables

Posted in reply to PaigeMiller

Apologies for the confusion.

 

I have 2 tables which contain dates which customers are added and removed from accounts.

data removed;

input account_id customer_id removed_date;
informat removed_date  date9.;
format removed_date   date9.;
cards;
1111 6789 01JAN2017 
1111 6789 31dec2017
;
run;


data added;

input account_id customer_id added_date;
informat added_date  date9.;
format added_date   date9.;
cards;
1111 6789 01jan2017 
1111 6789 03feb2017
;
run;

what I want to do is merge the tables by account id and customer ID to give me following output

 

 

Account ID  Customer ID    Removed_date       Added_Date

1111            6789                 01JAN2017           01JAN2017

1111            6789                 .                             03FEB2017

1111            6789                 31DEC2017

 

So if the added and removed dates are on the same date I would like to see the records on the same row (there can only be one instance of this per account and customer id). If the dates do no match I want to see each date on a separate row.

 

I hope this is clearer.

 

I require the output to identify when a customer gets added or removed from an account.in the instances of the same dates they will be deleted from the final output.

 

Regards

 

Adnan

Solution
‎04-04-2018 03:51 AM
Respected Advisor
Posts: 3,293

Re: grouping by 2 variables

Posted in reply to Adnan_Razaq

Yes, thank you, it is much clearer. Also, thank you for showing us your data in SAS DATA step code.

 

Try this code:

 

proc sql;
	create table both as select coalesce(a.account_id,b.account_id) as account_id,
		coalesce(a.customer_id,b.customer_id) as customer_id,a.removed_date,
		b.added_date 
		from removed as a full join added as b
	    on a.account_id=b.account_id and a.customer_id=b.customer_id and a.removed_date=b.added_date;
quit;
--
Paige Miller
Super User
Posts: 13,950

Re: grouping by 2 variables

Posted in reply to Adnan_Razaq

Please show what your output should look like. "Displayed on row" is not very concise and could be taken several ways.

 

If the idea is to have multiple columns of removed or added date for each account_id and customer_id you need to think what you will actually do with the resulting data. Varying numbers of valid values in that manner often lead to very difficult to process and maintain code as SAS data sets are NOT spread sheets. It sounds like you are going to attempt to do something later in a spreadsheet type process which is generally not the best way to work with SAS data sets.

 

How will you actually use the resulting data?

Trusted Advisor
Posts: 1,400

Re: grouping by 2 variables

Posted in reply to Adnan_Razaq

Here's a way to record the information you want, but uses a common date variable DATE.  Instead of maintaining values for added_date and removed_date, it creates a new variable DATE_TYPE (='A' for Added_date, 'R' for removed_date, or 'B' for both).

 

But if you really want the data layout you describe, use the same MERGE statement with the rename parameter as below, but instead of creating DATE_TYPE,  recreate added_date and removed_date:

 

data want;
  merge added   (rename=(added_date=date) in=inA)
        removed (rename=(removed_date=date) in=inR);
  if inA=1 and inR=1 then date_type='B' ; /*for both*/
  else if inA then date_type='A';
  else date_type='R';
run;
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 128 views
  • 1 like
  • 4 in conversation