BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Tamino
Obsidian | Level 7

Hi

 

I have following dataset:

 

data dataset_have;
  infile datalines dlm='|' ;
  input id :$20. year count;
datalines;

x1 | 2020 | 1 
x2 | 2017 | 1 
x3 | 2019 | 1 
x4 | 2017 | 1 
x5 | 2015 | 1 
x6 | 2013 | 1 
x6 | 2014 | 2 
x6 | 2015 | 3 
x6 | 2018 | 4 
x7 | 2017 | 1 
x8 | 2020 | 1 
x9 | 2021 | 1 
x9 | 2021 | 2 
x10 | 2022 | 1 
x10 | 2022 | 2 
x11 | 2012 | 1 
x12 | 2022 | 1 
x13 | 2022 | 1 
x14 | 2021 | 1 
x15 | 2012 | 1 
x16 | 2020 | 1 
x16 | 2022 | 2 
x17 | 2019 | 1 
x18 | 2017 | 1 
x18 | 2017 | 2 
x19 | 2016 | 1 
x20 | 2014 | 1 
x21 | 2018 | 1 
x22 | 2017 | 1 
x22 | 2021 | 2 
x23 | 2020 | 1 
x24 | 2018 | 1 
x24 | 2021 | 2 
x25 | 2016 | 1 
x26 | 2022 | 1 
x26 | 2022 | 2 
x27 | 2013 | 1 
x28 | 2016 | 1 
x28 | 2016 | 2 
x29 | 2017 | 1 
x29 | 2021 | 2 
x30 | 2018 | 1 
x31 | 2012 | 1 
x31 | 2012 | 2 
x32 | 2018 | 1 
x33 | 2014 | 1 
x34 | 2021 | 1 
x35 | 2014 | 1 
x35 | 2014 | 2 
x36 | 2015 | 1 

I would like a new variable N_COUNT, wich indicates the highest number of the variable COUNT within each ID. This variable (N_COUNT) should be expanded on each row of the corresponding ID, resulting as following dataset:

id | year | count | n_count
x1 | 2020 | 1     | 1
x2 | 2017 | 1     | 1
x3 | 2019 | 1     | 1
x4 | 2017 | 1     | 1
x5 | 2015 | 1     | 1
x6 | 2013 | 1     | 4
x6 | 2014 | 2     | 4
x6 | 2015 | 3     | 4
x6 | 2018 | 4     | 4
x7 | 2017 | 1     | 1
x8 | 2020 | 1     | 1
x9 | 2021 | 1     | 2
x9 | 2021 | 2     | 2
x10 | 2022 | 1    | 2
x10 | 2022 | 2    | 2
x11 | 2012 | 1    | 1
x12 | 2022 | 1    | 1
x13 | 2022 | 1    | 1
x14 | 2021 | 1    | 1
x15 | 2012 | 1    | 1
x16 | 2020 | 1    | 2
x16 | 2022 | 2    | 2
x17 | 2019 | 1    | 1
x18 | 2017 | 1    | 2
x18 | 2017 | 2    | 2
x19 | 2016 | 1    | 1
x20 | 2014 | 1    | 1
x21 | 2018 | 1    | 1
x22 | 2017 | 1    | 2
x22 | 2021 | 2    | 2
x23 | 2020 | 1    | 1
x24 | 2018 | 1    | 2
x24 | 2021 | 2    | 2
x25 | 2016 | 1    | 1
x26 | 2022 | 1    | 2
x26 | 2022 | 2    | 2
x27 | 2013 | 1    | 1
x28 | 2016 | 1    | 2
x28 | 2016 | 2    | 2
x29 | 2017 | 1    | 2
x29 | 2021 | 2    | 2
x30 | 2018 | 1    | 1
x31 | 2012 | 1    | 2
x31 | 2012 | 2    | 2
x32 | 2018 | 1    | 1
x33 | 2014 | 1    | 1
x34 | 2021 | 1    | 1
x35 | 2014 | 1    | 2
x35 | 2014 | 2    | 2
x36 | 2015 | 1    | 1

What command can I use?

 

Thank you!

 

1 ACCEPTED SOLUTION

Accepted Solutions
A_Kh
Barite | Level 11

SQL, Group BY clause.

proc sql;
	create table want as
		select*, max(count) as n_count
			from dataset_have
		group by id
	order by 1, 2;
quit; 

View solution in original post

5 REPLIES 5
Quentin
Super User

It will help people help you if you can share the code you've tried, and describe how it went wrong.  There are different ways to approach this.  Are you looking for a DATA step approach?  Are you familiar with PROC SQL? 

Tamino
Obsidian | Level 7

I have not tried any code so far. DATA step would be finde to keep the whole dataset, I would also be fine with a PROC SQL, with which I am only little familiar.

ballardw
Super User

One way:

 

proc sql;
   create table want as
 select a.id,a.year,a.count
,b.n_count from dataset_have as a left join (select id, max(count) as n_count from dataset_have group by id ) as b on a.id=b.id ; quit;

In the future please make sure that your data step is complete. Your datalines did not end with a ; so does not run. It is also a very good idea not to have blank lines in datalines as results are sometimes not quite as expected.

 

The ID values will be default sort order for character values, i.e. x1 followed by x10.

A_Kh
Barite | Level 11

SQL, Group BY clause.

proc sql;
	create table want as
		select*, max(count) as n_count
			from dataset_have
		group by id
	order by 1, 2;
quit; 
FreelanceReinh
Jade | Level 19

Hi @Tamino,

 

If you want to preserve the order of observations, you can use either of the suggested PROC SQL steps with an ORDER BY clause like this

order by input(compress(id,'x'),32.),year,count;

or use a DATA step:

data want;
do _n_=1 by 1 until(last.id);
  set dataset_have;
  by id notsorted;
  n_count=max(n_count,count);
end;
do _n_=1 to _n_;
  set dataset_have;
  output;
end;
run;

Note that your DATASET_HAVE is not sorted (but only grouped) by ID, which is why I used the NOTSORTED option in the BY statement above.

 

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1118 views
  • 2 likes
  • 5 in conversation