I have to create a code using the dateset which is already in my sas library .
if
Count ( account_number) > 1 & Distinct (email_address) & Distinct ( Timestamp)
then V_timestamp = max (Timestamp) ;
else
V_timestamp = ' ' ;
/*condition says that if an account is coming more than 1 with distinct email ids and timestamp then it should pick an account that have latest timestamp and remove the rest */
please help me as in my SAS catalog there is no format mentioned for field timestamp so i see blank in content file.
Also shd we include email address format too?
can you pls help me built a logic on the above condition explained.
HELP will be APPRECIATED 🙂
FYI : I am a novice user.
You misunderstand what the LAST. variable does. By using last.timestamp, you will get every distinct timestamp once in your output dataset, as you use timestamp as the lowest-level group variable here. Since you only want the last timestamp within an account, you need to use last.account on a dataset sorted by account and timestamp.
My impression is that, for any account, you want to only keep the latest email address as the valid one.
This can be done with SORT and DATA steps:
proc sort data=have;
by account timetstamp;
run;
data want;
set have;
by account;
if last.account;
run;
or, with SQL:
proc sql;
create table want as
select distinct *
from have
group by account
having timestamp = max(timestamp)
;
quit;
The DISTINCT can be omitted if there are no completely duplicate entries for any given account.
You are asking for " help me built a logic on the above condition explained."
The logic should include next steps:
1) sort the input dataset by account_number email_address timestamp;
2) Subset from the sorted dataset the wanted observations using a code like:
data want;
set sorted;
by account_number email_address timestamp;
if last.email_address;
run;
You misunderstand what the LAST. variable does. By using last.timestamp, you will get every distinct timestamp once in your output dataset, as you use timestamp as the lowest-level group variable here. Since you only want the last timestamp within an account, you need to use last.account on a dataset sorted by account and timestamp.
My impression is that, for any account, you want to only keep the latest email address as the valid one.
This can be done with SORT and DATA steps:
proc sort data=have;
by account timetstamp;
run;
data want;
set have;
by account;
if last.account;
run;
or, with SQL:
proc sql;
create table want as
select distinct *
from have
group by account
having timestamp = max(timestamp)
;
quit;
The DISTINCT can be omitted if there are no completely duplicate entries for any given account.
When you sort by account and timestamp, last.account indicates the latest for that account. last.timestamp indicates the last for that particular timestamp, which would only make sense if you want to filter out duplicate timestamp entries.
PS all my previous posts in this thread make the assumption that datetime values are stored as such, and not as strings; see your other thread.
And a SQL version:
proc sql;
create table want as
select account_number, email_address, timestamp
group by account_number, email_address
having timestamp = max(timestamp)
;
quit;
Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.