## first.statements with multiple variables

Solved
Frequent Contributor
Posts: 114

# first.statements with multiple variables

SAS Community,

Is anyone able to see what's wrong with my code below?  Essentially I would like to do the below logic, but I can't seem to figure out how to write this in proper SAS language.

proc sort data=have;

by Var_a Var_b;

data want;

set have;

by Var_a Var_b;

if first.Var_a and first.Var_b then Var_c+1;

run;

Thanks.

data have;

input Var_a Var_b;

0          1

0          2

0          3

1          2

1          3

2          3

3          3

4          1

4          2

4          3

run;

data want;

Var_a    Var_b    Var_c

0          1          1

0          2          1

0          3          1

1          2          2

1          3          2

2          3          3

3          3          4

4          1          2

4          2          3

4          3          5

Accepted Solutions
Solution
‎09-25-2012 03:20 PM
Super User
Posts: 8,127

## Re: first.statements with multiple variables

It looks to me like you just want to sort by var b and b2 and reset the counter when you see FIRST.B2.  Var_A is just an example of an extra variable that you want to carry along.  You can extent this logic to as many sort by variables as you want, you still just want to trigger the counter off of the change in the last variable in the list.

%let byvars=var_b var_b2;

proc sort data=have;

by &byvars;

run;

data want;

set have ;

by &byvars;

if first.%scan(&byvars,-1) then var_c=1;

else var_c+1;

run;

Obs    var_a    var_b    var_b2    want_c    var_c

1      0        1         1         1        1

2      4        1         1         2        2

3      0        1         2         1        1

4      4        1         2         2        2

5      0        2         1         1        1

6      1        2         1         2        2

7      4        2         1         3        3

8      0        3         1         1        1

9      1        3         1         2        2

10      2        3         1         3        3

11      3        3         2         1        1

12      0        3         5         1        1

13      3        3         5         2        2

14      4        3         5         3        3

15      0        3         6         1        1

All Replies
Posts: 3,167

## Re: first.statements with multiple variables

data want;

set have;

by var_a;

var_c+first.var_a;

run;

Haikuo

Posts: 1,318

## Re: first.statements with multiple variables

Maybe it is just me, but I do not understand what it is you are really trying to accomplish here.  Why do you believe your counter would reset to 2 on observation 8?

Frequent Contributor
Posts: 114

## Re: first.statements with multiple variables

The value is 2 because its only the 2nd time where Var_b=1 has been listed

Super User
Posts: 6,785

## Re: first.statements with multiple variables

Here's what I think you are after:

proc sort data=have;

by var_b var_a;

run;

data want;

set have;

by var_b;

if first.var_b then var_c=1;

else var_c + 1;

run;

Then re-sort if you would like to.

Good luck.

Posts: 1,318

## Re: first.statements with multiple variables

Okay, that was the piece of the explanation that was missing for me:

data have;

input Var_a Var_b;

cards;

0          1

0          2

0          3

1          2

1          3

2          3

3          3

4          1

4          2

4          3

;

run;

proc sort data=have;

by Var_b;

data want;

if 0 then set have; *to maintain column order;

do Var_c=1 by 1 until(last.Var_b);

set have;

by Var_b;

output;

end;

run;

proc sort data=want;

by Var_a Var_b;

run;

Frequent Contributor
Posts: 114

## Re: first.statements with multiple variables

This is very helpful, FriedEgg.  There's just one more piece that I didn't share as I didn't think it necessary and I was trying to keep things simple.  I see now I was wrong.

So there are different levels with Var_b.  See new dataset below.  How do I make sure that the Var_C does NOT count the different levels in Var_b_2.  (i.e. I only want the last.Var_b in each Var_a group to be counted towards Var_C.

Have I confused things more?

input Var_a Var_b Var_b_2;

cards;

0          1     1

0          1     2

0          2     1

0          3     1

0          3     5

0          3     6

1          2     1

1          3     1

2          3     1

3          3     2

3          3     5

4          1     1

4          1     2

4          2     1

4          3     5

Posts: 1,318

## Re: first.statements with multiple variables

proc sort data=have;

by Var_b Var_b_2;

run;

data want;

if 0 then set have;

do Var_c=1 by 1 until(last.Var_b);

do until(last.Var_b_2);

set have;

by Var_b Var_b_2;

output;

end;

end;

run;

Frequent Contributor
Posts: 114

## Re: first.statements with multiple variables

It's close but not exactly right as Var_C is changing based on Var_b_2 and Var_b does not impact Var_C.

So for example using the sample dataset posted previously:

data have;

input Var_a Var_b Var_b_2;

cards;

0          1     1

0          1     2

0          2     1

0          3     1

0          3     5

0          3     6

1          2     1

1          3     1

2          3     1

3          3     2

3          3     5

4          1     1

4          1     2

4          2     1

4          3     5

;

I should have the output below:

input Var_a Var_b Var_b_2 Var_c;

cards;

0          1     1    1

0          1     2    1

0          2     1     1

0          3     1    1

0          3     5    1

0          3     6    1

1          2     1    2

1          3     1    2

2          3     1    3

3          3     2    4

3          3     5    4

4          1     1     2

4          1     2    2

4          2     1    3

4          3     5    6

Frequent Contributor
Posts: 114

## Re: first.statements with multiple variables

*sorry I meant to type  *var_b_2 does not change

It's close but not exactly right as Var_C is changing based on Var_b_2 and *Var_b does not impact Var_C.

Posts: 3,167

## Re: first.statements with multiple variables

A minor tweak seems to be able to address your new problem:

data have;

input Var_a Var_b Var_b_2;

cards;

0          1     1

0          1     2

0          2     1

0          3     1

0          3     5

0          3     6

1          2     1

1          3     1

2          3     1

3          3     2

3          3     5

4          1     1

4          1     2

4          2     1

4          3     5

;

data want;

array t(3) _temporary_;

set have;

by var_a var_b;

do _n_=1 to dim(t);

t(_n_)+(var_b=_n_)*first.var_b;

end;

var_c=t(var_b);

run;

proc print;run;

Haikuo

Frequent Contributor
Posts: 114

## Re: first.statements with multiple variables

Haikuo,

Thank you for your help.  When I apply your code to my dataset, I receive the error message:   Array subscript out of range at line X column X.

Can you advise as to why? Is it related to the "array (t)3 _temporary" statement?

Thanks

Posts: 3,167

## Re: first.statements with multiple variables

Yes, that is the caveat of my code, which is not as robust as others' if not modified. There are two ways around it:

1. Artificially set a bigger range for array, say 100, and hoping the largest var_b is less than 100:

array t(100) _temporary_;

2. Run an additional pass to identify the largest possible var_b first;

proc sql;

select max(var_b) from have;

quit;

and then use that number as array range.

HTH,

Haikuo

Posts: 1,318

## Re: first.statements with multiple variables

Haikuo already stated this but I will just reiterate.  In his method the array must be declared to meet the specific case of the data (or larger), it also restricts the values, as written to only being integers>0.

Frequent Contributor
Posts: 114

## Re: first.statements with multiple variables

My largest var_b value is quite large - close to 600,000.  I believe the array code will still work however, it will just take lots of time to process.

Thanks again.

🔒 This topic is solved and locked.