Using SAS 7.
I am quite new to SAS.
Please see attached file. I have a similar dateset.
For each unique name I want to extract all the records from the latest Amount Spent=X onwards, inclusive of the observation with the latest Amount Spent=X
Eg
John | 20160521 | 6 |
John | 20160101 | 50 |
John | 20150526 | X |
How do I do so?
Thank you
Hi,
Its a good idea to put test data as a datastep rather than attaching files. So this is just off the top of my head:
data want;
set have;
retain o;
by name;
if first.name then o=0; /* o is a flag 0=not output, 1 to output */
if value="X" then o=1;
if o then output;
run;
Hi
Thank you for the reply.
But I don't think you looked at the file.
Example Dataset
Name | Date | Amount Spent |
John | 20160521 | 6 |
John | 20160101 | 50 |
John | 20150526 | X |
John | 20150329 | 23 |
John | 20150215 | X |
John | 20141212 | 10 |
Peter | 20151203 | 150 |
Peter | 20150321 | X |
Peter | 20141123 | 20 |
Albert | 20150408 | X |
The output that I want
Name | Date | Amount Spent |
John | 20160521 | 6 |
John | 20160101 | 50 |
John | 20150526 | X |
Peter | 20151203 | 150 |
Peter | 20150321 | X |
Albert | 20150408 | X |
The code you provided will only extract the observations with X, and after X.
Thank you
Well, minor change to the code then (and no, I dont look at attachments - post test data in the form of a datastep):
data want; set have; retain o; by name; if first.name then o=1; if value="X" then o=0; /* If X is encountered, stop outputting */ if o then output; run;
Hi @apple,
Here is another approach:
/* Create test data, incl. a name without Amount_Spent=.X */
data have;
missing x;
input Name $ Date :yymmdd. Amount_Spent;
format date yymmddn8.;
cards;
John 20160521 6
John 20160101 50
John 20150526 X
John 20150329 23
John 20150215 X
John 20141212 10
Peter 20151203 150
Peter 20150321 X
Peter 20141123 20
Albert 20150408 X
Jack 20141024 50
;
/* Add sequential number in case that sort order is to be preserved */
data temp;
set have;
seqno=_n_;
run;
proc sort data=temp;
by name descending date;
run;
/* Create target dataset up to sort order */
data want0;
do until(last.name);
set temp;
by name;
if amount_spent=.X then flagX=1;
end;
do until(last.name);
set temp;
by name descending date;
if first.name then flag=1;
if flagX & flag then output;
if amount_spent=.X then flag=0;
end;
drop flag:;
run;
proc sort data=want0 out=want(drop=seqno);
by seqno;
run;
proc print data=want;
run;
Edit: Please note that I interpreted your "X" values as special missing values. If I were you, I would strongly prefer to define Amount_Spent as a numeric variable, because it contains numbers, not "codes" consisting of digits. Character variables are not suitable for doing calculations.
Is there a possibility that one person has spent two amounts on the same date, maybe one .X and one non-missing? What would be the selection criterion in this case?
Edit 2: Simplified definition of variable FLAG (if first.name ...).
data have;
infile cards dsd;
informat name $6. date yymmdd8. amount_spent 8.;
format name $6. date yymmddn8. amount_spent 8.;
input Name Date Amount_Spent;
cards;
John,20160521,6
John,20160101,50
John,20150526,.
John,20150329,23
John,20150215,.
John,20141212,10
Peter,20151203,150
Peter,20150321,.
Peter,20141123,20
Albert,20150408,.
;run;
proc sql;
create table want as
select *
from have where date >= (select max(date)
from have
where missing(amount_spent));
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.