BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
sophia_SAS
Obsidian | Level 7

SAS Community,

Is anyone able to see what's wrong with my code below?  Essentially I would like to do the below logic, but I can't seem to figure out how to write this in proper SAS language.

proc sort data=have;

by Var_a Var_b;

data want;

set have;

by Var_a Var_b;

if first.Var_a and first.Var_b then Var_c+1;

run;

Thanks.

data have;

input Var_a Var_b;

          0          1

          0          2

          0          3

          1          2

          1          3

          2          3

          3          3

          4          1

          4          2

          4          3

run;

data want;

     Var_a    Var_b    Var_c

          0          1          1

          0          2          1

          0          3          1

          1          2          2

          1          3          2

          2          3          3

          3          3          4

          4          1          2

          4          2          3

          4          3          5

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

It looks to me like you just want to sort by var b and b2 and reset the counter when you see FIRST.B2.  Var_A is just an example of an extra variable that you want to carry along.  You can extent this logic to as many sort by variables as you want, you still just want to trigger the counter off of the change in the last variable in the list.

%let byvars=var_b var_b2;

proc sort data=have;

  by &byvars;

run;

data want;

  set have ;

  by &byvars;

  if first.%scan(&byvars,-1) then var_c=1;

  else var_c+1;

run;


Obs    var_a    var_b    var_b2    want_c    var_c

  1      0        1         1         1        1

  2      4        1         1         2        2

  3      0        1         2         1        1

  4      4        1         2         2        2

  5      0        2         1         1        1

  6      1        2         1         2        2

  7      4        2         1         3        3

  8      0        3         1         1        1

  9      1        3         1         2        2

10      2        3         1         3        3

11      3        3         2         1        1

12      0        3         5         1        1

13      3        3         5         2        2

14      4        3         5         3        3

15      0        3         6         1        1

View solution in original post

20 REPLIES 20
Haikuo
Onyx | Level 15

data want;

  set have;

    by var_a;

    var_c+first.var_a;

run;

Haikuo

FriedEgg
SAS Employee

Maybe it is just me, but I do not understand what it is you are really trying to accomplish here.  Why do you believe your counter would reset to 2 on observation 8?

sophia_SAS
Obsidian | Level 7

The value is 2 because its only the 2nd time where Var_b=1 has been listed

Astounding
PROC Star

Here's what I think you are after:

proc sort data=have;

   by var_b var_a;

run;

data want;

   set have;

   by var_b;

   if first.var_b then var_c=1;

   else var_c + 1;

run;

Then re-sort if you would like to.

Good luck.

FriedEgg
SAS Employee

Okay, that was the piece of the explanation that was missing for me:

data have;

input Var_a Var_b;

cards;

          0          1

          0          2

          0          3

          1          2

          1          3

          2          3

          3          3

          4          1

          4          2

          4          3

;

run;

proc sort data=have;

by Var_b;

data want;

if 0 then set have; *to maintain column order;

do Var_c=1 by 1 until(last.Var_b);

set have;

by Var_b;

output;

end;

run;

proc sort data=want;

by Var_a Var_b;

run;

sophia_SAS
Obsidian | Level 7

This is very helpful, FriedEgg.  There's just one more piece that I didn't share as I didn't think it necessary and I was trying to keep things simple.  I see now I was wrong.

So there are different levels with Var_b.  See new dataset below.  How do I make sure that the Var_C does NOT count the different levels in Var_b_2.  (i.e. I only want the last.Var_b in each Var_a group to be counted towards Var_C. 

Have I confused things more?

Thank you for your help!

input Var_a Var_b Var_b_2;

cards;

          0          1     1

          0          1     2

          0          2     1

          0          3     1

          0          3     5

          0          3     6

          1          2     1

          1          3     1

          2          3     1

          3          3     2

          3          3     5

          4          1     1

          4          1     2

          4          2     1

          4          3     5

FriedEgg
SAS Employee

proc sort data=have;

by Var_b Var_b_2;

run;

data want;

if 0 then set have;

do Var_c=1 by 1 until(last.Var_b);

  do until(last.Var_b_2);

   set have;

   by Var_b Var_b_2;

   output;

  end;

end;

run;

sophia_SAS
Obsidian | Level 7

It's close but not exactly right as Var_C is changing based on Var_b_2 and Var_b does not impact Var_C.

So for example using the sample dataset posted previously:

data have;

input Var_a Var_b Var_b_2;

cards;

          0          1     1

          0          1     2

         0          2     1

          0          3     1

          0          3     5

          0          3     6

          1          2     1

          1          3     1

          2          3     1

          3          3     2

          3          3     5

         4          1     1

          4          1     2

          4          2     1

          4          3     5

;

I should have the output below:

input Var_a Var_b Var_b_2 Var_c;

cards;

          0          1     1    1

          0          1     2    1

         0          2     1     1

          0          3     1    1

          0          3     5    1

          0          3     6    1

          1          2     1    2

          1          3     1    2

          2          3     1    3

          3          3     2    4

          3          3     5    4

         4          1     1     2

          4          1     2    2

          4          2     1    3

          4          3     5    6

sophia_SAS
Obsidian | Level 7

*sorry I meant to type  *var_b_2 does not change

It's close but not exactly right as Var_C is changing based on Var_b_2 and *Var_b does not impact Var_C.

Haikuo
Onyx | Level 15

A minor tweak seems to be able to address your new problem:

data have;

input Var_a Var_b Var_b_2;

cards;

          0          1     1

          0          1     2

         0          2     1

          0          3     1

          0          3     5

          0          3     6

          1          2     1

          1          3     1

          2          3     1

          3          3     2

          3          3     5

         4          1     1

          4          1     2

          4          2     1

          4          3     5

;

data want;

array t(3) _temporary_;

  set have;

    by var_a var_b;

    do _n_=1 to dim(t);

    t(_n_)+(var_b=_n_)*first.var_b;

    end;

    var_c=t(var_b);

run;

proc print;run;

Haikuo

sophia_SAS
Obsidian | Level 7

Haikuo,

Thank you for your help.  When I apply your code to my dataset, I receive the error message:   Array subscript out of range at line X column X.

Can you advise as to why? Is it related to the "array (t)3 _temporary" statement?

Thanks

Haikuo
Onyx | Level 15

Yes, that is the caveat of my code, which is not as robust as others' if not modified. There are two ways around it:

1. Artificially set a bigger range for array, say 100, and hoping the largest var_b is less than 100:

array t(100) _temporary_;

2. Run an additional pass to identify the largest possible var_b first;

proc sql;

  select max(var_b) from have;

quit;

and then use that number as array range.

HTH,

Haikuo

FriedEgg
SAS Employee

Haikuo already stated this but I will just reiterate.  In his method the array must be declared to meet the specific case of the data (or larger), it also restricts the values, as written to only being integers>0.

sophia_SAS
Obsidian | Level 7

My largest var_b value is quite large - close to 600,000.  I believe the array code will still work however, it will just take lots of time to process.

Thanks again.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 20 replies
  • 7627 views
  • 3 likes
  • 6 in conversation