Using SAS 7.
I am quite new to SAS.
Please see attached file. I have a similar dateset.
For each unique name I want to extract all the records from the latest Amount Spent=X onwards, inclusive of the observation with the latest Amount Spent=X
Eg
John | 20160521 | 6 |
John | 20160101 | 50 |
John | 20150526 | X |
How do I do so?
Thank you
Hi,
Its a good idea to put test data as a datastep rather than attaching files. So this is just off the top of my head:
data want;
set have;
retain o;
by name;
if first.name then o=0; /* o is a flag 0=not output, 1 to output */
if value="X" then o=1;
if o then output;
run;
Hi
Thank you for the reply.
But I don't think you looked at the file.
Example Dataset
Name | Date | Amount Spent |
John | 20160521 | 6 |
John | 20160101 | 50 |
John | 20150526 | X |
John | 20150329 | 23 |
John | 20150215 | X |
John | 20141212 | 10 |
Peter | 20151203 | 150 |
Peter | 20150321 | X |
Peter | 20141123 | 20 |
Albert | 20150408 | X |
The output that I want
Name | Date | Amount Spent |
John | 20160521 | 6 |
John | 20160101 | 50 |
John | 20150526 | X |
Peter | 20151203 | 150 |
Peter | 20150321 | X |
Albert | 20150408 | X |
The code you provided will only extract the observations with X, and after X.
Thank you
Well, minor change to the code then (and no, I dont look at attachments - post test data in the form of a datastep):
data want; set have; retain o; by name; if first.name then o=1; if value="X" then o=0; /* If X is encountered, stop outputting */ if o then output; run;
Hi @apple,
Here is another approach:
/* Create test data, incl. a name without Amount_Spent=.X */
data have;
missing x;
input Name $ Date :yymmdd. Amount_Spent;
format date yymmddn8.;
cards;
John 20160521 6
John 20160101 50
John 20150526 X
John 20150329 23
John 20150215 X
John 20141212 10
Peter 20151203 150
Peter 20150321 X
Peter 20141123 20
Albert 20150408 X
Jack 20141024 50
;
/* Add sequential number in case that sort order is to be preserved */
data temp;
set have;
seqno=_n_;
run;
proc sort data=temp;
by name descending date;
run;
/* Create target dataset up to sort order */
data want0;
do until(last.name);
set temp;
by name;
if amount_spent=.X then flagX=1;
end;
do until(last.name);
set temp;
by name descending date;
if first.name then flag=1;
if flagX & flag then output;
if amount_spent=.X then flag=0;
end;
drop flag:;
run;
proc sort data=want0 out=want(drop=seqno);
by seqno;
run;
proc print data=want;
run;
Edit: Please note that I interpreted your "X" values as special missing values. If I were you, I would strongly prefer to define Amount_Spent as a numeric variable, because it contains numbers, not "codes" consisting of digits. Character variables are not suitable for doing calculations.
Is there a possibility that one person has spent two amounts on the same date, maybe one .X and one non-missing? What would be the selection criterion in this case?
Edit 2: Simplified definition of variable FLAG (if first.name ...).
data have;
infile cards dsd;
informat name $6. date yymmdd8. amount_spent 8.;
format name $6. date yymmddn8. amount_spent 8.;
input Name Date Amount_Spent;
cards;
John,20160521,6
John,20160101,50
John,20150526,.
John,20150329,23
John,20150215,.
John,20141212,10
Peter,20151203,150
Peter,20150321,.
Peter,20141123,20
Albert,20150408,.
;run;
proc sql;
create table want as
select *
from have where date >= (select max(date)
from have
where missing(amount_spent));
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.