Hi
I am working in a bank.
We have a data set with scores information for every customer.
Score for each customer can be any integer between 0 to 11.(11 is a failure status).
I created a vector of last 12 scores for each customer(with comma between values).
I have 3 questions:
Question1-Calculate How many non-missing arguments exits in the vector.
For example:
7,7,9,10,9,11,11,11,11,10,11,9 Here there are 12
3,.,.,.,.,.,0,0,0,0,6,7 Here there are 7
Question2-Calculate How many missing+non-missing arguments exits in the vector.
For example:
7,7,9,10,9,11,11,11,11,10,11,9 Here there are 12
3,.,.,.,.,.,0,0,0,0,6,7 Here there are 12
Question3-Calculate How many times 11 appear
For example:
7,7,9,10,9,11,11,11,11,10,11,9 Here there are 5
3,.,.,.,.,.,0,0,0,0,6,7 Here there are 0
Question4-Calculate location of first failure(11)
For example:
7,7,9,10,9,11,11,11,11,10,11,9 Here we need to get:6
3,.,.,.,.,.,0,0,0,0,6,7 Here we need to get:0
Data tbl1;
input ID Score1 Score2 Score3 Score4 Score5 Score6
Score7 Score8 Score9 Score10 Score11 Score12;
cards;
1 7 8 9 10 9 11 11 11 11 10 11 9
2 8 8 7 7 7 7 6 7 7 8 7 8
3 3 3 3 3 2 2 2 3 3 3 2 2
3 . . . . . 0 0 0 0 6 7
;
run;
Data want;
length ScoreVector1 ScoreVector2 ScoreVector3 ScoreVector4 $32.;
set tbl1;
ScoreVector1=compress(Score1)||','|| compress(Score1)||','|| compress(Score3)
||','|| compress(Score4)||','|| compress(Score5)||','|| compress(Score6)
||','|| compress(Score7)||','|| compress(Score8)||','|| compress(Score9)
||','|| compress(Score10)||','|| compress(Score11)||','|| compress(Score12);
Run;
Hi
I am working in a bank.
We have a data set with scores information for every customer.
Score for each customer can be any integer between 0 to 11.(11 is a failure status).
I created a vector of last 12 scores for each customer(with comma between values).
I have 3 questions:
Question1-Calculate How many non-missing arguments exits in the vector.
For example:
7,7,9,10,9,11,11,11,11,10,11,9 Here there are 12
3,.,.,.,.,.,0,0,0,0,6,7 Here there are 7
Question2-Calculate How many missing+non-missing arguments exits in the vector.
For example:
7,7,9,10,9,11,11,11,11,10,11,9 Here there are 12
3,.,.,.,.,.,0,0,0,0,6,7 Here there are 12
Question3-Calculate How many times 11 appear
For example:
7,7,9,10,9,11,11,11,11,10,11,9 Here there are 5
3,.,.,.,.,.,0,0,0,0,6,7 Here there are 0
Question4-Calculate location of first failure(11)
For example:
7,7,9,10,9,11,11,11,11,10,11,9 Here we need to get:6
3,.,.,.,.,.,0,0,0,0,6,7 Here we need to get:0
Data tbl1;
input ID Score1 Score2 Score3 Score4 Score5 Score6
Score7 Score8 Score9 Score10 Score11 Score12;
cards;
1 7 8 9 10 9 11 11 11 11 10 11 9
2 8 8 7 7 7 7 6 7 7 8 7 8
3 3 3 3 3 2 2 2 3 3 3 2 2
3 . . . . . 0 0 0 0 6 7
;
run;
Data tbl2;
length ScoreVector $32.;
set tbl1;
ScoreVector=compress(Score1)||','|| compress(Score1)||','|| compress(Score3)
||','|| compress(Score4)||','|| compress(Score5)||','|| compress(Score6)
||','|| compress(Score7)||','|| compress(Score8)||','|| compress(Score9)
||','|| compress(Score10)||','|| compress(Score11)||','|| compress(Score12);
Run;
In general, it is better to keep data points in their own variables. In this instance it may actually help, but you could do the same thing with an array and catx, e.g:
data want; length scorevector1 $2000; set tbl1; scorevector1=catx(',',score1,score2,score3...score12); run;
Note I use consistent case, the code window (its the {i} above post area) and indentation.
For your questions:
1) Note, this assumes the variable are in order, score1, score2 etc. If not use and array and replace score1--score12 with of <arrayname>{*}.
data want; set tbl1; num_miss=nmiss(score1--score12); run;
2) I don't get this question, the answer is always 12?
3)
data want; set tbl1; num_11=count(catx(',',score1--score12),"11"); run;
4) Get this by using the index function.
These are all basic string manipulation functions which you can find in the SAS manual:
Make a point of reading the manual.
Hi and thank you so much for your reply.
In my question 2 we need to get 12 for all rows because in all rows there are 12 arguments (numbers).
How can we calculate it please?
num_arg=12;
Why would you want to calculate something that is fixed and known, just a waste of processing. If you still want to use the countw() function which you can find in the docs.
If working with a wide table structure then I'd be using proper SAS array processing for this as shown below.
Data tbl1_wide tbl1_long;
infile datalines dlm=' ' truncover;
input ID Score1 Score2 Score3 Score4 Score5 Score6
Score7 Score8 Score9 Score10 Score11 Score12;
cards;
1 7 8 9 10 9 11 11 11 11 10 11 9
2 8 8 7 7 7 7 6 7 7 8 7 8
3 3 3 3 3 . 2 2 3 3 3 2 2
3 . . . . . 0 0 0 0 6 7
;
run;
data want;
set tbl1_wide;
array scores {*} Score1 - Score12;
/*Question1-Calculate How many non-missing arguments exits in the vector.*/
non_miss=n(of scores[*]);
/*Question2-Calculate How many missing+non-missing arguments exits in the vector.*/
miss_non_miss=dim(scores);
do _i=1 to dim(scores);
/*Question3-Calculate How many times 11 appear*/
n_11=sum(0 ,n_11 ,scores[_i]=11);
/*Question4-Calculate location of first failure(11)*/
if missing(pos_11) then
do;
if scores[_i]=11 then pos_11=_i;
end;
end;
run;
Hi
INDEX function returns the position of the first occurrence of the string’s first character.
I need a different thing.
I need to know in which position (1 until 12) was first failure (value11)
1 7 8 9 10 9 11 11 11 11 10 11 9 (Here the answer is 7)
2 8 8 7 7 7 7 6 7 7 8 7 11 (Here the answer is 12)
3 3 3 3 3 2 2 2 3 3 3 2 2 (Here the answer is 0)
4 . . . . . . 0 0 0 0 6 7 (Here the answer is 0)
data want; set tbl1; num_11=findw(catx(',',score1--score12),"11"); run;
That should find first occurence of word.
@Ronein wrote:
Hi
INDEX function returns the position of the first occurrence of the string’s first character.
I need a different thing.
I need to know in which position (1 until 12) was first failure (value11)
1 7 8 9 10 9 11 11 11 11 10 11 9 (Here the answer is 7)
2 8 8 7 7 7 7 6 7 7 8 7 11 (Here the answer is 12)
3 3 3 3 3 2 2 2 3 3 3 2 2 (Here the answer is 0)
4 . . . . . . 0 0 0 0 6 7 (Here the answer is 0)
If these values are numeric and you will be numeric calculations later I strongly suggest looking at @Patrick's array approach.
An alternate of finding first value would be the WHICHN function (referencing Patricks array definition)
First11 = whichn(11, of scores(*));
which does return 0 when the value in the first position is not found in any of the array elements.
Quote:
"I created a vector of last 12 scores for each customer(with comma between values)."
Since you are in command, I strongly (and I mean STRONGLY) recommend to use the correct data structure for this, creating a long dataset with 12 observations with a single variable. Then most of your calculations can be done by simple counts in SQL or data steps or with proc summary.
If you add time-point variables to it, you gain by simply not writing a particular observation when the value is missing.
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.