DATA Step, Macro, Functions and more

Create new variable based on first occurrence in data set

Reply
New Contributor
Posts: 2

Create new variable based on first occurrence in data set

New to SAS and this site

I have a data set that I need code to help pull back first relationships manager based on the account number.

Example data set:
Date ACCOUNT RM
1-1-18 123 Gabe
1-2-18. 123 Gabe
1-3-18. 123. Mary

New variable would pull back ‘Gabe’ in each row.

Data set is large and will account for each day of the year and 1000’s of accounts.

Thanks in advance
Super User
Posts: 23,224

Re: Create new variable based on first occurrence in data set

Posted in reply to ColonelSanders

What does 'pull back' mean?

 

What are you looking for as output?

If you have multiple accounts it would be helpful if you included more than one in your example.

 


@ColonelSanders wrote:
New to SAS and this site

I have a data set that I need code to help pull back first relationships manager based on the account number.

Example data set:
Date ACCOUNT RM
1-1-18 123 Gabe
1-2-18. 123 Gabe
1-3-18. 123. Mary

New variable would pull back ‘Gabe’ in each row.

Data set is large and will account for each day of the year and 1000’s of accounts.

Thanks in advance

 

New Contributor
Posts: 2

Re: Create new variable based on first occurrence in data set

Posted in reply to ColonelSanders
Wow that was fast and Sorry having to use my cell as site is blocked on work pc so made the ask to brief.

In the end I would like the following result. Keeping the data set as is with a new variable or ‘Original RM’ being added.

Date ACCOUNT RM Original RM
1-1-18 123 Gabe Gabe
1-2-18. 123 Gabe Gabe
1-3-18 123 Mary Gabe
1-1-18 234 Bob Bob
1-2-18 234 Bob Bob
1-3-18 234 Bob Bob
1-1-18 789 Sam Sam
1-2-18 789 Mike Sam
1-3-18 789 Betty Sam

I would be using this report to capture who added the account originaly not who manages the acccount today.
Super User
Posts: 23,224

Re: Create new variable based on first occurrence in data set

Posted in reply to ColonelSanders

 

data want;
set have;
by account date;

retain original_RM;

if first.account then original_RM = RM;
run;

 

You need a couple of different options here: 

1. BY group processing to identify the FIRST. of each account (first.account)

2. RETAIN to keep the value until it's reset. 

3. I included date in the BY to ensure the first date is used, but this also means you need to sort the data first.

 


@ColonelSanders wrote:
Wow that was fast and Sorry having to use my cell as site is blocked on work pc so made the ask to brief.

In the end I would like the following result. Keeping the data set as is with a new variable or ‘Original RM’ being added.

Date ACCOUNT RM Original RM
1-1-18 123 Gabe Gabe
1-2-18. 123 Gabe Gabe
1-3-18 123 Mary Gabe
1-1-18 234 Bob Bob
1-2-18 234 Bob Bob
1-3-18 234 Bob Bob
1-1-18 789 Sam Sam
1-2-18 789 Mike Sam
1-3-18 789 Betty Sam

I would be using this report to capture who added the account originaly not who manages the acccount today.


 

Super User
Super User
Posts: 9,384

Re: Create new variable based on first occurrence in data set

Posted in reply to ColonelSanders

Something like:

proc sort data=have out=first;
  by account date;
run;
data first (keep=account rm_first);
  set first (rename=(rm_original=rm_first));
  by account;
  if first.account;
run;
data want;
  merge have first;
  by account;
run;

Note, post test data in the form of a datastep in future, I am not here to type out test data, hence the above is not tested.

Trusted Advisor
Posts: 1,309

Re: Create new variable based on first occurrence in data set

Posted in reply to ColonelSanders

Is the data already sorted by ACCOUNT/DATE?  If so, then:

 

data want;
  set have (drop=rm);
  by account;
  if first.account then set have (keep=rm) point=_n_;
run;

 

 

  1. The first SET statement reads each incoming record from HAVE, but does not read in variable RM.  As a result RM IS NOT REPLACED by that SET statement.
  2. The BY ACCT statement tells sas (a) to expect the incoming data to be sorted, and (b) create a dummy variable (first.account and last.account) indicating whether the record-in-hand is the start or end of a BY-group.
  3. The IF first.account then set have reads in only one variable (RM) and does not read sequentially, but points to observation _N_ of HAVE.  And _N_ is an automatic variable that counts iterations of the DATA step, which in this case is also a count of observations in HAVE.

 

Take home message: Make the program-data-vector (better known as PDV) your friend.  PDV is a list of variables in the data step.  All but one of those variables are replaced with the first set.  And the other one (RM) is replaced only occasionally.

Ask a Question
Discussion stats
  • 5 replies
  • 108 views
  • 4 likes
  • 4 in conversation