BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
librasonali
Quartz | Level 8


I have to create a code using the dateset which is already in my sas library .

 

if 

Count ( account_number) > 1 & Distinct (email_address) & Distinct ( Timestamp)  

then V_timestamp = max (Timestamp) ;

else 

V_timestamp = ' ' ;

/*condition says that if an account is coming more than 1 with distinct email ids and timestamp then it should pick an account that have latest timestamp and remove the rest */

 

please help me as in my SAS catalog there is no format mentioned for field timestamp so i see blank in content file. 

Also shd we include email address format too?

can you pls help me built a logic on the above condition explained. 

HELP will be APPRECIATED 🙂  
FYI : I am a novice user.  

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

You misunderstand what the LAST. variable does. By using last.timestamp, you will get every distinct timestamp once in your output dataset, as you use timestamp as the lowest-level group variable here. Since you only want the last timestamp within an account, you need to use last.account on a dataset sorted by account and timestamp.

 

My impression is that, for any account, you want to only keep the latest email address as the valid one.

This can be done with SORT and DATA steps:

proc sort data=have;
by account timetstamp;
run;

data want;
set have;
by account;
if last.account;
run;

or, with SQL:

proc sql;
create table want as
  select distinct *
  from have
  group by account
  having timestamp = max(timestamp)
;
quit;

The DISTINCT can be omitted if there are no completely duplicate entries for any given account.

View solution in original post

9 REPLIES 9
Shmuel
Garnet | Level 18

You are asking for " help me built a logic on the above condition explained."

 

The logic should include next steps:

1) sort the input dataset by account_number email_address timestamp;

2) Subset from the sorted dataset the wanted observations using a code like:

data want;
 set sorted;
  by  account_number email_address timestamp;
      if last.email_address;
run;
librasonali
Quartz | Level 8
thanks for the reply !
data want;
set sorted;
by account_number email_address timestamp;
if last.timestamp; /* i changed this as i want accounts with latest timestamps*/
run;
Also i want this logic when an account count is gr 1 i.e when A1 account is coming more than 1 times with distinct (email ids and timestamps) then i want last.timestamp .


Kurt_Bremser
Super User

You misunderstand what the LAST. variable does. By using last.timestamp, you will get every distinct timestamp once in your output dataset, as you use timestamp as the lowest-level group variable here. Since you only want the last timestamp within an account, you need to use last.account on a dataset sorted by account and timestamp.

 

My impression is that, for any account, you want to only keep the latest email address as the valid one.

This can be done with SORT and DATA steps:

proc sort data=have;
by account timetstamp;
run;

data want;
set have;
by account;
if last.account;
run;

or, with SQL:

proc sql;
create table want as
  select distinct *
  from have
  group by account
  having timestamp = max(timestamp)
;
quit;

The DISTINCT can be omitted if there are no completely duplicate entries for any given account.

librasonali
Quartz | Level 8
hey !
i donot want the last i want the latest, eg if an Account A1 have 2 email ids i.e s e1 have timestamp 12/22/2020 12:00:00 and e2 is 11/20/2020 10:00:00 then it should pick E1 email id that is the latest one ,
Kurt_Bremser
Super User

When you sort by account and timestamp, last.account indicates the latest for that account. last.timestamp indicates the last for that particular timestamp, which would only make sense if you want to filter out duplicate timestamp entries.

librasonali
Quartz | Level 8
thank you 🙂 !
LinusH
Tourmaline | Level 20

And a SQL version:

proc sql;
   create table want as
      select account_number, email_address, timestamp
         group by account_number, email_address
         having timestamp = max(timestamp)
   ;
quit;
Data never sleeps
librasonali
Quartz | Level 8
thank you for the reply !
i have already applied this approach
libname a SAS library
proc sql;

select count(distinct(account_number) >1 , distinct(email_address), distinct(timestamp) from a.dataset
group by account_number, email_address
having timestamp = max(timestamp);
quit;

will this fine ? as i was using this approach was not getting expecting.. so was trying with SAS

sas-innovate-white.png

Special offer for SAS Communities members

Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.

 

View the full agenda.

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 9 replies
  • 1673 views
  • 3 likes
  • 4 in conversation