BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Tamino
Obsidian | Level 7

Hi

 

I have following dataset:

 

data dataset_have;
  infile datalines dlm='|' ;
  input id :$20. year count;
datalines;

x1 | 2020 | 1 
x2 | 2017 | 1 
x3 | 2019 | 1 
x4 | 2017 | 1 
x5 | 2015 | 1 
x6 | 2013 | 1 
x6 | 2014 | 2 
x6 | 2015 | 3 
x6 | 2018 | 4 
x7 | 2017 | 1 
x8 | 2020 | 1 
x9 | 2021 | 1 
x9 | 2021 | 2 
x10 | 2022 | 1 
x10 | 2022 | 2 
x11 | 2012 | 1 
x12 | 2022 | 1 
x13 | 2022 | 1 
x14 | 2021 | 1 
x15 | 2012 | 1 
x16 | 2020 | 1 
x16 | 2022 | 2 
x17 | 2019 | 1 
x18 | 2017 | 1 
x18 | 2017 | 2 
x19 | 2016 | 1 
x20 | 2014 | 1 
x21 | 2018 | 1 
x22 | 2017 | 1 
x22 | 2021 | 2 
x23 | 2020 | 1 
x24 | 2018 | 1 
x24 | 2021 | 2 
x25 | 2016 | 1 
x26 | 2022 | 1 
x26 | 2022 | 2 
x27 | 2013 | 1 
x28 | 2016 | 1 
x28 | 2016 | 2 
x29 | 2017 | 1 
x29 | 2021 | 2 
x30 | 2018 | 1 
x31 | 2012 | 1 
x31 | 2012 | 2 
x32 | 2018 | 1 
x33 | 2014 | 1 
x34 | 2021 | 1 
x35 | 2014 | 1 
x35 | 2014 | 2 
x36 | 2015 | 1 

I would like a new variable N_COUNT, wich indicates the highest number of the variable COUNT within each ID. This variable (N_COUNT) should be expanded on each row of the corresponding ID, resulting as following dataset:

id | year | count | n_count
x1 | 2020 | 1     | 1
x2 | 2017 | 1     | 1
x3 | 2019 | 1     | 1
x4 | 2017 | 1     | 1
x5 | 2015 | 1     | 1
x6 | 2013 | 1     | 4
x6 | 2014 | 2     | 4
x6 | 2015 | 3     | 4
x6 | 2018 | 4     | 4
x7 | 2017 | 1     | 1
x8 | 2020 | 1     | 1
x9 | 2021 | 1     | 2
x9 | 2021 | 2     | 2
x10 | 2022 | 1    | 2
x10 | 2022 | 2    | 2
x11 | 2012 | 1    | 1
x12 | 2022 | 1    | 1
x13 | 2022 | 1    | 1
x14 | 2021 | 1    | 1
x15 | 2012 | 1    | 1
x16 | 2020 | 1    | 2
x16 | 2022 | 2    | 2
x17 | 2019 | 1    | 1
x18 | 2017 | 1    | 2
x18 | 2017 | 2    | 2
x19 | 2016 | 1    | 1
x20 | 2014 | 1    | 1
x21 | 2018 | 1    | 1
x22 | 2017 | 1    | 2
x22 | 2021 | 2    | 2
x23 | 2020 | 1    | 1
x24 | 2018 | 1    | 2
x24 | 2021 | 2    | 2
x25 | 2016 | 1    | 1
x26 | 2022 | 1    | 2
x26 | 2022 | 2    | 2
x27 | 2013 | 1    | 1
x28 | 2016 | 1    | 2
x28 | 2016 | 2    | 2
x29 | 2017 | 1    | 2
x29 | 2021 | 2    | 2
x30 | 2018 | 1    | 1
x31 | 2012 | 1    | 2
x31 | 2012 | 2    | 2
x32 | 2018 | 1    | 1
x33 | 2014 | 1    | 1
x34 | 2021 | 1    | 1
x35 | 2014 | 1    | 2
x35 | 2014 | 2    | 2
x36 | 2015 | 1    | 1

What command can I use?

 

Thank you!

 

1 ACCEPTED SOLUTION

Accepted Solutions
A_Kh
Barite | Level 11

SQL, Group BY clause.

proc sql;
	create table want as
		select*, max(count) as n_count
			from dataset_have
		group by id
	order by 1, 2;
quit; 

View solution in original post

5 REPLIES 5
Quentin
Super User

It will help people help you if you can share the code you've tried, and describe how it went wrong.  There are different ways to approach this.  Are you looking for a DATA step approach?  Are you familiar with PROC SQL? 

The Boston Area SAS Users Group (BASUG) is hosting an in person Meeting & Training on June 27!
Full details and registration info at https://www.basug.org/events.
Tamino
Obsidian | Level 7

I have not tried any code so far. DATA step would be finde to keep the whole dataset, I would also be fine with a PROC SQL, with which I am only little familiar.

ballardw
Super User

One way:

 

proc sql;
   create table want as
 select a.id,a.year,a.count
,b.n_count from dataset_have as a left join (select id, max(count) as n_count from dataset_have group by id ) as b on a.id=b.id ; quit;

In the future please make sure that your data step is complete. Your datalines did not end with a ; so does not run. It is also a very good idea not to have blank lines in datalines as results are sometimes not quite as expected.

 

The ID values will be default sort order for character values, i.e. x1 followed by x10.

A_Kh
Barite | Level 11

SQL, Group BY clause.

proc sql;
	create table want as
		select*, max(count) as n_count
			from dataset_have
		group by id
	order by 1, 2;
quit; 
FreelanceReinh
Jade | Level 19

Hi @Tamino,

 

If you want to preserve the order of observations, you can use either of the suggested PROC SQL steps with an ORDER BY clause like this

order by input(compress(id,'x'),32.),year,count;

or use a DATA step:

data want;
do _n_=1 by 1 until(last.id);
  set dataset_have;
  by id notsorted;
  n_count=max(n_count,count);
end;
do _n_=1 to _n_;
  set dataset_have;
  output;
end;
run;

Note that your DATASET_HAVE is not sorted (but only grouped) by ID, which is why I used the NOTSORTED option in the BY statement above.

 

sas-innovate-white.png

Missed SAS Innovate in Orlando?

Catch the best of SAS Innovate 2025 — anytime, anywhere. Stream powerful keynotes, real-world demos, and game-changing insights from the world’s leading data and AI minds.

 

Register now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 922 views
  • 2 likes
  • 5 in conversation