DATA Step, Macro, Functions and more

Calculation from a string

Reply
Frequent Contributor
Posts: 143

Calculation from a string

Hi

I am working in a bank.

We have a data set with scores information for every customer.

Score for each customer can be any integer between 0 to 11.(11 is a failure status).

I created a vector of last 12 scores for each customer(with comma between values).

I have 3 questions:

 

Question1-Calculate How many non-missing arguments exits in the vector.

For example:

7,7,9,10,9,11,11,11,11,10,11,9     Here there are 12

3,.,.,.,.,.,0,0,0,0,6,7     Here there are 7

 

Question2-Calculate How many missing+non-missing arguments exits in the vector.

For example:

7,7,9,10,9,11,11,11,11,10,11,9     Here there are 12

3,.,.,.,.,.,0,0,0,0,6,7     Here there are 12

 

Question3-Calculate How many  times 11 appear

For example:

7,7,9,10,9,11,11,11,11,10,11,9     Here there are 5

3,.,.,.,.,.,0,0,0,0,6,7     Here there are 0

 

Question4-Calculate location of first failure(11)

For example:

7,7,9,10,9,11,11,11,11,10,11,9     Here we need to get:6

3,.,.,.,.,.,0,0,0,0,6,7   Here we need to get:0

 

 

Data tbl1;
input ID Score1 Score2 Score3 Score4 Score5 Score6
Score7 Score8 Score9 Score10 Score11 Score12;
cards;
1 7 8 9 10 9 11 11 11 11 10 11 9
2 8 8 7 7 7 7 6 7 7 8 7 8
3 3 3 3 3 2 2 2 3 3 3 2 2
3 . . . . . 0 0 0 0 6 7
;
run;

 

 

Data want;
length ScoreVector1 ScoreVector2 ScoreVector3 ScoreVector4 $32.;
set tbl1;
ScoreVector1=compress(Score1)||','|| compress(Score1)||','|| compress(Score3)
||','|| compress(Score4)||','|| compress(Score5)||','|| compress(Score6)
||','|| compress(Score7)||','|| compress(Score8)||','|| compress(Score9)
||','|| compress(Score10)||','|| compress(Score11)||','|| compress(Score12);
Run;

Frequent Contributor
Posts: 143

calculations from a string

Hi

I am working in a bank.

We have a data set with scores information for every customer.

Score for each customer can be any integer between 0 to 11.(11 is a failure status).

I created a vector of last 12 scores for each customer(with comma between values).

I have 3 questions:

 

Question1-Calculate How many non-missing arguments exits in the vector.

For example:

7,7,9,10,9,11,11,11,11,10,11,9     Here there are 12

3,.,.,.,.,.,0,0,0,0,6,7     Here there are 7

 

Question2-Calculate How many missing+non-missing arguments exits in the vector.

For example:

7,7,9,10,9,11,11,11,11,10,11,9     Here there are 12

3,.,.,.,.,.,0,0,0,0,6,7     Here there are 12

 

Question3-Calculate How many  times 11 appear

For example:

7,7,9,10,9,11,11,11,11,10,11,9     Here there are 5

3,.,.,.,.,.,0,0,0,0,6,7     Here there are 0

 

Question4-Calculate location of first failure(11)

For example:

7,7,9,10,9,11,11,11,11,10,11,9     Here we need to get:6

3,.,.,.,.,.,0,0,0,0,6,7   Here we need to get:0

 

 

Data tbl1;
input ID Score1 Score2 Score3 Score4 Score5 Score6
Score7 Score8 Score9 Score10 Score11 Score12;
cards;
1 7 8 9 10 9 11 11 11 11 10 11 9
2 8 8 7 7 7 7 6 7 7 8 7 8
3 3 3 3 3 2 2 2 3 3 3 2 2
3 . . . . . 0 0 0 0 6 7
;
run;

 

 

Data tbl2;
length  ScoreVector  $32.;
set tbl1;
ScoreVector=compress(Score1)||','|| compress(Score1)||','|| compress(Score3)
||','|| compress(Score4)||','|| compress(Score5)||','|| compress(Score6)
||','|| compress(Score7)||','|| compress(Score8)||','|| compress(Score9)
||','|| compress(Score10)||','|| compress(Score11)||','|| compress(Score12);
Run;

Super User
Super User
Posts: 9,599

Re: Calculation from a string

[ Edited ]

In general, it is better to keep data points in their own variables.  In this instance it may actually help, but you could do the same thing with an array and catx, e.g:

data want;
  length scorevector1 $2000;
  set tbl1;
  scorevector1=catx(',',score1,score2,score3...score12);
run;

Note I use consistent case, the code window (its the {i} above post area) and indentation.

For your questions:

1) Note, this assumes the variable are in order, score1, score2 etc.  If not use and array and replace score1--score12 with of <arrayname>{*}.

data want;
  set tbl1;
  num_miss=nmiss(score1--score12);
run;

2) I don't get this question, the answer is always 12?  

3) 

data want;
  set tbl1;
  num_11=count(catx(',',score1--score12),"11");
run;

4) Get this by using the index function.

 

These are all basic string manipulation functions which you can find in the SAS manual:

http://documentation.sas.com/?docsetId=lefunctionsref&docsetTarget=titlepage.htm&docsetVersion=9.4&l...

Make a point of reading the manual.

 

Frequent Contributor
Posts: 143

Re: Calculation from a string

Hi and thank you so much for your reply.

In my question 2 we need to get 12 for all rows because in all rows there are 12 arguments (numbers).

How can we calculate it please?

 

 

Super User
Super User
Posts: 9,599

Re: Calculation from a string

num_arg=12;

Why would you want to calculate something that is fixed and known, just a waste of processing.  If you still want to use the countw() function which you can find in the docs.

Respected Advisor
Posts: 4,743

Re: Calculation from a string

@Ronein

If working with a wide table structure then I'd be using proper SAS array processing for this as shown below.

Data tbl1_wide tbl1_long;
  infile datalines dlm=' ' truncover;
  input ID Score1 Score2 Score3 Score4 Score5 Score6
    Score7 Score8 Score9 Score10 Score11 Score12;
  cards;
1 7 8 9 10 9 11 11 11 11 10 11 9
2 8 8 7 7 7 7 6 7 7 8 7 8
3 3 3 3 3 . 2 2 3 3 3 2 2
3 . . . . . 0 0 0 0 6 7
;
run;

data want;
  set tbl1_wide;
  array scores {*} Score1 - Score12;

  /*Question1-Calculate How many non-missing arguments exits in the vector.*/
  non_miss=n(of scores[*]);

  /*Question2-Calculate How many missing+non-missing arguments exits in the vector.*/
  miss_non_miss=dim(scores);

  do _i=1 to dim(scores);
    /*Question3-Calculate How many  times 11 appear*/
    n_11=sum(0 ,n_11 ,scores[_i]=11);

    /*Question4-Calculate location of first failure(11)*/
    if missing(pos_11) then
      do;
        if scores[_i]=11 then pos_11=_i;
      end;
  end;

run;
Frequent Contributor
Posts: 143

Re: Calculation from a string

Hi

INDEX function returns the position of the first occurrence of the string’s first character.

I need a different thing.

I need to know in which position   (1 until 12) was first failure (value11)

1 7 8 9 10 9 11 11 11 11 10 11 9    (Here the answer is 7)
2 8 8 7 7 7 7 6 7 7 8 7 11              (Here the answer is 12)
3 3 3 3 3 2 2 2 3 3 3 2 2                (Here the answer is 0)
4 . . . . . . 0 0 0 0 6 7                      (Here the answer is 0)

Super User
Super User
Posts: 9,599

Re: Calculation from a string

data want;
  set tbl1;
  num_11=findw(catx(',',score1--score12),"11");
run;

That should find first occurence of word. 

Super User
Posts: 13,583

Re: Calculation from a string


@Ronein wrote:

Hi

INDEX function returns the position of the first occurrence of the string’s first character.

I need a different thing.

I need to know in which position   (1 until 12) was first failure (value11)

1 7 8 9 10 9 11 11 11 11 10 11 9    (Here the answer is 7)
2 8 8 7 7 7 7 6 7 7 8 7 11              (Here the answer is 12)
3 3 3 3 3 2 2 2 3 3 3 2 2                (Here the answer is 0)
4 . . . . . . 0 0 0 0 6 7                      (Here the answer is 0)


If these values are numeric and you will be numeric calculations later I strongly suggest looking at @Patrick's array approach.

An alternate of finding first value would be the WHICHN function (referencing Patricks array definition)

 

First11 = whichn(11, of scores(*));

which does return 0 when the value in the first position is not found in any of the array elements.

 

Super User
Posts: 10,280

Re: Calculation from a string

Quote:

 

"I created a vector of last 12 scores for each customer(with comma between values)."

 

Since you are in command, I strongly (and I mean STRONGLY) recommend to use the correct data structure for this, creating a long dataset with 12 observations with a single variable. Then most of your calculations can be done by simple counts in SQL or data steps or with proc summary.

If you add time-point variables to it, you gain by simply not writing a particular observation when the value is missing.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Ask a Question
Discussion stats
  • 9 replies
  • 176 views
  • 1 like
  • 5 in conversation