SAS Community,
Is anyone able to see what's wrong with my code below? Essentially I would like to do the below logic, but I can't seem to figure out how to write this in proper SAS language.
proc sort data=have;
by Var_a Var_b;
data want;
set have;
by Var_a Var_b;
if first.Var_a and first.Var_b then Var_c+1;
run;
Thanks.
data have;
input Var_a Var_b;
0 1
0 2
0 3
1 2
1 3
2 3
3 3
4 1
4 2
4 3
run;
data want;
Var_a Var_b Var_c
0 1 1
0 2 1
0 3 1
1 2 2
1 3 2
2 3 3
3 3 4
4 1 2
4 2 3
4 3 5
It looks to me like you just want to sort by var b and b2 and reset the counter when you see FIRST.B2. Var_A is just an example of an extra variable that you want to carry along. You can extent this logic to as many sort by variables as you want, you still just want to trigger the counter off of the change in the last variable in the list.
%let byvars=var_b var_b2;
proc sort data=have;
by &byvars;
run;
data want;
set have ;
by &byvars;
if first.%scan(&byvars,-1) then var_c=1;
else var_c+1;
run;
Obs var_a var_b var_b2 want_c var_c
1 0 1 1 1 1
2 4 1 1 2 2
3 0 1 2 1 1
4 4 1 2 2 2
5 0 2 1 1 1
6 1 2 1 2 2
7 4 2 1 3 3
8 0 3 1 1 1
9 1 3 1 2 2
10 2 3 1 3 3
11 3 3 2 1 1
12 0 3 5 1 1
13 3 3 5 2 2
14 4 3 5 3 3
15 0 3 6 1 1
data want;
set have;
by var_a;
var_c+first.var_a;
run;
Haikuo
Maybe it is just me, but I do not understand what it is you are really trying to accomplish here. Why do you believe your counter would reset to 2 on observation 8?
The value is 2 because its only the 2nd time where Var_b=1 has been listed
Here's what I think you are after:
proc sort data=have;
by var_b var_a;
run;
data want;
set have;
by var_b;
if first.var_b then var_c=1;
else var_c + 1;
run;
Then re-sort if you would like to.
Good luck.
Okay, that was the piece of the explanation that was missing for me:
data have;
input Var_a Var_b;
cards;
0 1
0 2
0 3
1 2
1 3
2 3
3 3
4 1
4 2
4 3
;
run;
proc sort data=have;
by Var_b;
data want;
if 0 then set have; *to maintain column order;
do Var_c=1 by 1 until(last.Var_b);
set have;
by Var_b;
output;
end;
run;
proc sort data=want;
by Var_a Var_b;
run;
This is very helpful, FriedEgg. There's just one more piece that I didn't share as I didn't think it necessary and I was trying to keep things simple. I see now I was wrong.
So there are different levels with Var_b. See new dataset below. How do I make sure that the Var_C does NOT count the different levels in Var_b_2. (i.e. I only want the last.Var_b in each Var_a group to be counted towards Var_C.
Have I confused things more?
Thank you for your help!
input Var_a Var_b Var_b_2;
cards;
0 1 1
0 1 2
0 2 1
0 3 1
0 3 5
0 3 6
1 2 1
1 3 1
2 3 1
3 3 2
3 3 5
4 1 1
4 1 2
4 2 1
4 3 5
proc sort data=have;
by Var_b Var_b_2;
run;
data want;
if 0 then set have;
do Var_c=1 by 1 until(last.Var_b);
do until(last.Var_b_2);
set have;
by Var_b Var_b_2;
output;
end;
end;
run;
It's close but not exactly right as Var_C is changing based on Var_b_2 and Var_b does not impact Var_C.
So for example using the sample dataset posted previously:
data have;
input Var_a Var_b Var_b_2;
cards;
0 1 1
0 1 2
0 2 1
0 3 1
0 3 5
0 3 6
1 2 1
1 3 1
2 3 1
3 3 2
3 3 5
4 1 1
4 1 2
4 2 1
4 3 5
;
I should have the output below:
input Var_a Var_b Var_b_2 Var_c;
cards;
0 1 1 1
0 1 2 1
0 2 1 1
0 3 1 1
0 3 5 1
0 3 6 1
1 2 1 2
1 3 1 2
2 3 1 3
3 3 2 4
3 3 5 4
4 1 1 2
4 1 2 2
4 2 1 3
4 3 5 6
*sorry I meant to type *var_b_2 does not change
It's close but not exactly right as Var_C is changing based on Var_b_2 and *Var_b does not impact Var_C.
A minor tweak seems to be able to address your new problem:
data have;
input Var_a Var_b Var_b_2;
cards;
0 1 1
0 1 2
0 2 1
0 3 1
0 3 5
0 3 6
1 2 1
1 3 1
2 3 1
3 3 2
3 3 5
4 1 1
4 1 2
4 2 1
4 3 5
;
data want;
array t(3) _temporary_;
set have;
by var_a var_b;
do _n_=1 to dim(t);
t(_n_)+(var_b=_n_)*first.var_b;
end;
var_c=t(var_b);
run;
proc print;run;
Haikuo
Haikuo,
Thank you for your help. When I apply your code to my dataset, I receive the error message: Array subscript out of range at line X column X.
Can you advise as to why? Is it related to the "array (t)3 _temporary" statement?
Thanks
Yes, that is the caveat of my code, which is not as robust as others' if not modified. There are two ways around it:
1. Artificially set a bigger range for array, say 100, and hoping the largest var_b is less than 100:
array t(100) _temporary_;
2. Run an additional pass to identify the largest possible var_b first;
proc sql;
select max(var_b) from have;
quit;
and then use that number as array range.
HTH,
Haikuo
Haikuo already stated this but I will just reiterate. In his method the array must be declared to meet the specific case of the data (or larger), it also restricts the values, as written to only being integers>0.
My largest var_b value is quite large - close to 600,000. I believe the array code will still work however, it will just take lots of time to process.
Thanks again.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.