Hello all,
I would like to create subset of observations based on a variable limiting the count to 2 or 1 depending on the value in that variable.
data want;
infile want_file;
input
@ 001 ln_no 9.
@ 010 ln_status $1.
@ 011 ln_excep_stat $1
;
ln_status can have values of '1', 2', '3', '4' or '5' ;
ln_excep_stat can have values of '1', '2', '3' or '4';
If ln_excep_stat is '2', '3' or '4' and ln_stat is '2', '3' or '4', need to limit the observations to 2 obs if there are more than 2 per ln_no.
If ln_excep_stat is '1' or '5' and ln_stat is '1' or '5', need to limit to 1 if there are more than 2.
I really appreciate your time and effort on this.
thank you in advance,
Neal.
@saslovethemost wrote:
for the 1 record, I want the bottom one, for the two records, I want to have bottom two records.
Your example above was incorrect then or is there some other variable that indicates order?
Either way, here's one long winded approach, yes there are easier ways.
ata cobdla;
set cobdla;
by ln_no;
if first.ln_no then count=1;
else count+1;
run;
proc sort data=cobdla;
by ln_no descending count;
run;
data want;
set cobdla;
by ln_no;
retain max;
if first.ln_no then max=count;
if (ln_status in ('2','3','4') or ln_excep_stat in ('2','3','4')) and count >= (max - 1) then flag=1;
else if (ln_status in ('1', '5') or ln_excep_stat in ('1','5')) and count=max then flag=2;
else flag=3;
if flag in (1,2);
run;
Please provide the data, using these instructions: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...
Do not provide the data in any other form.
Also, show us the code you have tried already.
data combdla;
infile datalines dsd truncover;
input ln_no:9. ln_status:$1. ln_excep_stat:$1. name:$4.;
datalines;
999114964 2 3 NEAL
999114964 2 3 DAVE
999114964 2 3 PHIL
;
data combdlq;
set combdla;
if ln_status=('2' or '3' or '4') and
ln_excep_stat=('2' or '3' or '4') then
do;
noobs=2;
end;
else ln_status=('1' or '5') and
ln_excep_stat =('1' or '5')
do;
noobs=2;
end;
;
I have tried the above code, not getting the right results. is there a way that SAS can do it in a different way than structured language like COBOL?
sorry, in the second data statement should be like the following:
data combdlq;
set combdla;
if ln_status=('2' or '3' or '4') and
ln_excep_stat=('2' or '3' or '4') then
do;
noobs=2;
end;
else if ln_status=('1' or '5') and
ln_excep_stat =('1' or '5')
do;
noobs=1;
end;
else
do;
noobs =1;
end;
;
It's still not clear to me what the requirement "need to limit the observations to 2 obs if there are more than 2 per ln_no" means. Show us the desired output.
It's very tough to generalize from one example, so that code can be written that works in all cases. Are there rules that determine why you pick NEAL and DAVE in this case, and not two other records?
data cobdla;
infile datalines dsd truncover;
input ln_no:9. ln_status:$1. ln_excep_stat:$1. name:$4.;
datalines;
999114961 2 3 NEAL
999114961 2 3 DAVE
999114961 2 3 PHIL
999114962 3 4 JOHN
999114962 3 4 RICK
999114962 3 4 LILI
999114963 4 5 JOHN
999114963 4 5 RICK
999114963 4 5 LILI
999114964 1 5 JOHN
999114964 1 5 RICK
;
sorry for the confusion I created,
Here is what I want: last two records for each ln_no if the ln_status and ln_excep_stat is in '2' '3' or '4', if they are '1' and '5' I want only 1 record.
A hint for using in-line data in a data step. Do not indent the data. Figuring out what column to read what column is causing errors, editing etc. will be much harder. To make it easier also do not indent the DATALINES (or CARDS) statement that comes before that data or the semi-colon line (or four semi-colon line) that marks the end of the data (and the end of the data step).
For the 1 record, does it matter which one? I noticed for the loan status you said last two but in your example here you actually picked the first two. Can you confirm? Its much easier if it's the first two rather than last two - because you don't know how many there are until you're done.
for the 1 record, I want the bottom one, for the two records, I want to have bottom two records.
@saslovethemost wrote:
for the 1 record, I want the bottom one, for the two records, I want to have bottom two records.
Your example above was incorrect then or is there some other variable that indicates order?
Either way, here's one long winded approach, yes there are easier ways.
ata cobdla;
set cobdla;
by ln_no;
if first.ln_no then count=1;
else count+1;
run;
proc sort data=cobdla;
by ln_no descending count;
run;
data want;
set cobdla;
by ln_no;
retain max;
if first.ln_no then max=count;
if (ln_status in ('2','3','4') or ln_excep_stat in ('2','3','4')) and count >= (max - 1) then flag=1;
else if (ln_status in ('1', '5') or ln_excep_stat in ('1','5')) and count=max then flag=2;
else flag=3;
if flag in (1,2);
run;
Thanks Reeza for the reply., please see the reply I sent to Tom.
I pasted it below for your reference.
"Thanks Tom for the reply, sorry I misguided you with my code, but I do not want the no of obs values, I want the actual data mentioned in the datalines but subset of it. For a ln_no if there are more than 2 records, I want to retain last 2 records for ln_stat and ln_excep_status '2' or '3' or '4' . same way, I want to retain 1 record if ln_stat and ln_excep_status '1' or '5' if there are more than 1 record. If there is only one record, I want to retain that one record. sorry for all the confusion".
@saslovethemost wrote:
Thanks Reeza for the reply., please see the reply I sent to Tom.
I pasted it below for your reference.
"Thanks Tom for the reply, sorry I misguided you with my code, but I do not want the no of obs values, I want the actual data mentioned in the datalines but subset of it. For a ln_no if there are more than 2 records, I want to retain last 2 records for ln_stat and ln_excep_status '2' or '3' or '4' . same way, I want to retain 1 record if ln_stat and ln_excep_status '1' or '5' if there are more than 1 record. If there is only one record, I want to retain that one record. sorry for all the confusion".
That's exactly what my code does. Did you run it and check the output?
Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.
Register today!Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.