Solved: Creating new variable, depending on other variables in different rows

Tamino · Posted 09-12-2024 08:59 AM

Hi

I have following dataset:

data dataset_have;
  infile datalines dlm='|' ;
  input id :$20. year count;
datalines;

x1 | 2020 | 1 
x2 | 2017 | 1 
x3 | 2019 | 1 
x4 | 2017 | 1 
x5 | 2015 | 1 
x6 | 2013 | 1 
x6 | 2014 | 2 
x6 | 2015 | 3 
x6 | 2018 | 4 
x7 | 2017 | 1 
x8 | 2020 | 1 
x9 | 2021 | 1 
x9 | 2021 | 2 
x10 | 2022 | 1 
x10 | 2022 | 2 
x11 | 2012 | 1 
x12 | 2022 | 1 
x13 | 2022 | 1 
x14 | 2021 | 1 
x15 | 2012 | 1 
x16 | 2020 | 1 
x16 | 2022 | 2 
x17 | 2019 | 1 
x18 | 2017 | 1 
x18 | 2017 | 2 
x19 | 2016 | 1 
x20 | 2014 | 1 
x21 | 2018 | 1 
x22 | 2017 | 1 
x22 | 2021 | 2 
x23 | 2020 | 1 
x24 | 2018 | 1 
x24 | 2021 | 2 
x25 | 2016 | 1 
x26 | 2022 | 1 
x26 | 2022 | 2 
x27 | 2013 | 1 
x28 | 2016 | 1 
x28 | 2016 | 2 
x29 | 2017 | 1 
x29 | 2021 | 2 
x30 | 2018 | 1 
x31 | 2012 | 1 
x31 | 2012 | 2 
x32 | 2018 | 1 
x33 | 2014 | 1 
x34 | 2021 | 1 
x35 | 2014 | 1 
x35 | 2014 | 2 
x36 | 2015 | 1

I would like a new variable N_COUNT, wich indicates the highest number of the variable COUNT within each ID. This variable (N_COUNT) should be expanded on each row of the corresponding ID, resulting as following dataset:

id | year | count | n_count
x1 | 2020 | 1     | 1
x2 | 2017 | 1     | 1
x3 | 2019 | 1     | 1
x4 | 2017 | 1     | 1
x5 | 2015 | 1     | 1
x6 | 2013 | 1     | 4
x6 | 2014 | 2     | 4
x6 | 2015 | 3     | 4
x6 | 2018 | 4     | 4
x7 | 2017 | 1     | 1
x8 | 2020 | 1     | 1
x9 | 2021 | 1     | 2
x9 | 2021 | 2     | 2
x10 | 2022 | 1    | 2
x10 | 2022 | 2    | 2
x11 | 2012 | 1    | 1
x12 | 2022 | 1    | 1
x13 | 2022 | 1    | 1
x14 | 2021 | 1    | 1
x15 | 2012 | 1    | 1
x16 | 2020 | 1    | 2
x16 | 2022 | 2    | 2
x17 | 2019 | 1    | 1
x18 | 2017 | 1    | 2
x18 | 2017 | 2    | 2
x19 | 2016 | 1    | 1
x20 | 2014 | 1    | 1
x21 | 2018 | 1    | 1
x22 | 2017 | 1    | 2
x22 | 2021 | 2    | 2
x23 | 2020 | 1    | 1
x24 | 2018 | 1    | 2
x24 | 2021 | 2    | 2
x25 | 2016 | 1    | 1
x26 | 2022 | 1    | 2
x26 | 2022 | 2    | 2
x27 | 2013 | 1    | 1
x28 | 2016 | 1    | 2
x28 | 2016 | 2    | 2
x29 | 2017 | 1    | 2
x29 | 2021 | 2    | 2
x30 | 2018 | 1    | 1
x31 | 2012 | 1    | 2
x31 | 2012 | 2    | 2
x32 | 2018 | 1    | 1
x33 | 2014 | 1    | 1
x34 | 2021 | 1    | 1
x35 | 2014 | 1    | 2
x35 | 2014 | 2    | 2
x36 | 2015 | 1    | 1

What command can I use?

Thank you!

A_Kh · Posted 09-12-2024 12:11 PM

SQL, Group BY clause.

proc sql;
	create table want as
		select*, max(count) as n_count
			from dataset_have
		group by id
	order by 1, 2;
quit;

View solution in original post

Quentin · Posted 09-12-2024 09:19 AM

It will help people help you if you can share the code you've tried, and describe how it went wrong. There are different ways to approach this. Are you looking for a DATA step approach? Are you familiar with PROC SQL?

Tamino · Posted 09-12-2024 09:24 AM

I have not tried any code so far. DATA step would be finde to keep the whole dataset, I would also be fine with a PROC SQL, with which I am only little familiar.

ballardw · Posted 09-12-2024 11:27 AM

One way:

proc sql;
   create table want as
 select a.id,a.year,a.count
       ,b.n_count
   from dataset_have as a
        left join
       (select id,  max(count) as n_count 
        from dataset_have
        group by id
       ) as b
        on a.id=b.id
   ;
quit;

In the future please make sure that your data step is complete. Your datalines did not end with a ; so does not run. It is also a very good idea not to have blank lines in datalines as results are sometimes not quite as expected.

The ID values will be default sort order for character values, i.e. x1 followed by x10.

A_Kh · Posted 09-12-2024 12:11 PM

SQL, Group BY clause.

proc sql;
	create table want as
		select*, max(count) as n_count
			from dataset_have
		group by id
	order by 1, 2;
quit;

FreelanceReinh · Posted 09-12-2024 12:33 PM

Hi @Tamino,

If you want to preserve the order of observations, you can use either of the suggested PROC SQL steps with an ORDER BY clause like this

order by input(compress(id,'x'),32.),year,count;

or use a DATA step:

data want;
do _n_=1 by 1 until(last.id);
  set dataset_have;
  by id notsorted;
  n_count=max(n_count,count);
end;
do _n_=1 to _n_;
  set dataset_have;
  output;
end;
run;

Note that your DATASET_HAVE is not sorted (but only grouped) by ID, which is why I used the NOTSORTED option in the BY statement above.

Creating new variable, depending on other variables in different rows

Re: Creating new variable, depending on other variables in different rows

Re: Creating new variable, depending on other variables in different rows

Re: Creating new variable, depending on other variables in different rows

Re: Creating new variable, depending on other variables in different rows

Re: Creating new variable, depending on other variables in different rows

Re: Creating new variable, depending on other variables in different rows

Creating new variable, depending on other variables in different rows

Re: Creating new variable, depending on other variables in different rows

Re: Creating new variable, depending on other variables in different rows

Re: Creating new variable, depending on other variables in different rows

Re: Creating new variable, depending on other variables in different rows

Re: Creating new variable, depending on other variables in different rows

Re: Creating new variable, depending on other variables in different rows

Registration is open

SAS Training: Just a Click Away