BookmarkSubscribeRSS Feed
ybz12003
Rhodochrosite | Level 12

Hi experts:

 

I have the test dataset list below.   I would like to do three things:

 

1.  Assign the count number like the "Count" column below.

2. Create a column 'i' the same as the column 'Count'.

3. Separate those two datasets if the ID and season are duplicates.  For example, dataset one is including IDs '100', 200', '305', and '410'.     The other dataset is the rest IDs.

 

Thanks!

 

data test;
      infile datalines dsd;
  input ID : $10. Season  Count   i ;
datalines;
100, 9,  1,  ,
100, 9,  2,  ,
100, 10,  1,  ,
112, 14,  1,  ,
150, 14,  1,  ,
180, 14,  1,  ,
181, 15,  1,  ,
200, 15,  1,  ,
200, 15,  2,   ,
200, 15,  3,   ,
200, 17,  1,   ,
210, 8,  1,   ,
270, 10,  1,   ,
280, 11,  1,   ,
305, 11,   1,  ,
305, 11,   2,  ,
322, 16,   1,  ,
377, 16,   1,  ,
410, 8,   1,  ,
410, 8,   2,  ,
421, 10,   1,  ,
422, 10,   1,  ,
;

 

And the datasets I want are  

 

data One;
      infile datalines dsd;
  input ID : $10. Season  Count   i ;
datalines;
100, 10,  1,  1,
112, 14,  1,  1,
150, 14,  1,  1,
180, 14,  1,  1,
181, 15,  1,  1,
200, 17,  1,   1,
210, 8,  1,   1,
270, 10,  1,   1,
280, 11,  1,   1,
322, 16,   1,  1,
377, 16,   1,  1,
421, 10,   1,  1,
422, 10,   1,  1,
;

data two;
      infile datalines dsd;
  input ID : $10. Season  Count   i ;
datalines;
100, 9,  1,  1,
100, 9,  2,  2,
200, 15,  1,  1,
200, 15,  2,   2,
200, 15,  3,   3,
305, 11,   1,  1,
305, 11,   2,  2,
410, 8,   1,  1,
410, 8,   2,  2,
;

.

 

17 REPLIES 17
novinosrin
Tourmaline | Level 20

data test;
      infile datalines dsd;
  input ID : $10. Season  Count   i ;
  drop count;
datalines;
100, 9,  1,  ,
100, 9,  2,  ,
100, 10,  1,  ,
112, 14,  1,  ,
150, 14,  1,  ,
180, 14,  1,  ,
181, 15,  1,  ,
200, 15,  1,  ,
200, 15,  2,   ,
200, 15,  3,   ,
200, 17,  1,   ,
210, 8,  1,   ,
270, 10,  1,   ,
280, 11,  1,   ,
305, 11,   1,  ,
305, 11,   2,  ,
322, 16,   1,  ,
377, 16,   1,  ,
410, 8,   1,  ,
410, 8,   2,  ,
421, 10,   1,  ,
422, 10,   1,  ,
;

data one two;
do until(last.id);
do count=1 by 1 until(last.season);
set test;
by id season;
end;
if  first.id and last.id then output two;
else output one;
end;
run;
novinosrin
Tourmaline | Level 20

or perhaps this--?

 

data one two;
do count=1 by 1 until(last.season);
set test;
by id season;
if  first.id and last.id then output two;
else output one;
end;
run;

 

Reeza
Super User

For #1 try this:

https://stats.idre.ucla.edu/sas/faq/how-can-i-create-an-enumeration-variable-by-groups/

 

For #2, PROC SORT has a newer option, NOUNIQUEKEY that makes this trivial.

But ID 100 has two entries for seasons, for season 9 there are duplicates, for season 10 there are no duplicates. So you want all ID 100 records to go into a different data set or just the duplicates?

 


@ybz12003 wrote:

Hi experts:

 

I have the test dataset list below.   I would like to do two things:

 

1.  Assign the count number like the "Count" column below

2. Separate those two datasets if the ID and season are duplicate.  For example, dataset one is including IDs '100', 200', '305', and '410'.     The other dataset is the rest IDs.

 

Thanks!

 

data test;
      infile datalines dsd;
  input ID : $10. Season  Count   i ;
datalines;
100, 9,  1,  ,
100, 9,  2,  ,
100, 10,  1,  ,
112, 14,  1,  ,
150, 14,  1,  ,
180, 14,  1,  ,
181, 15,  1,  ,
200, 15,  1,  ,
200, 15,  2,   ,
200, 15,  3,   ,
200, 17,  1,   ,
210, 8,  1,   ,
270, 10,  1,   ,
280, 11,  1,   ,
305, 11,   1,  ,
305, 11,   2,  ,
322, 16,   1,  ,
377, 16,   1,  ,
410, 8,   1,  ,
410, 8,   2,  ,
421, 10,   1,  ,
422, 10,   1,  ,
;

 

 

PeterClemmensen
Tourmaline | Level 20
data test;
      infile datalines dsd;
  input ID : $10. Season;
datalines;
100, 9
100, 9
100, 10
112, 14
150, 14
180, 14
181, 15
200, 15
200, 15
200, 15
200, 17
210, 8
270, 10
280, 11
305, 11
305, 11
322, 16
377, 16
410, 8
410, 8
421, 10
422, 10
;

data temp;
   set test;
   by ID Season;
   count+1;
   if first.Season then count=1;
run;

data singles duplicates;
   
   if _N_ = 1 then do;
   declare hash h(dataset:'temp(where=(count>1))');
      h.defineKey('ID');
      h.defineDone();
   end;

   set temp;

   if h.check()=0 then output duplicates;
   else output singles;
run;
novinosrin
Tourmaline | Level 20

Personally, I feel the requirement is much easier than having to use hash. However, I appreciate your enthusiam in offering various solutions. That's so cool. Love that attitude!

 

Even with hash, you could consider just one step

 


data test;
      infile datalines dsd;
  input ID : $10. Season  Count   i ;
  drop count;
datalines;
100, 9,  1,  ,
100, 9,  2,  ,
100, 10,  1,  ,
112, 14,  1,  ,
150, 14,  1,  ,
180, 14,  1,  ,
181, 15,  1,  ,
200, 15,  1,  ,
200, 15,  2,   ,
200, 15,  3,   ,
200, 17,  1,   ,
210, 8,  1,   ,
270, 10,  1,   ,
280, 11,  1,   ,
305, 11,   1,  ,
305, 11,   2,  ,
322, 16,   1,  ,
377, 16,   1,  ,
410, 8,   1,  ,
410, 8,   2,  ,
421, 10,   1,  ,
422, 10,   1,  ,
;

data one two;
if _n_=1 then do;
   dcl hash H (dataset:'test',multidata:'y') ;
   h.definekey  ("id") ;
   h.definedata ("id") ;
   h.definedone () ;
 end;
 set test;
 by id season;
if  first.season then count=1;
else count+1;
if first.id then do;rc=h.check(); h.has_next(result:_iorc_);end;
if _iorc_ then output one;else output two;
run;
novinosrin
Tourmaline | Level 20

If the DOW loop was little confusing, the requirement is as simple as this

 

Just a matter of diligent use of automativ by group variables first and last



data test;
      infile datalines dsd;
  input ID : $10. Season  Count   i ;
  drop count;
datalines;
100, 9,  1,  ,
100, 9,  2,  ,
100, 10,  1,  ,
112, 14,  1,  ,
150, 14,  1,  ,
180, 14,  1,  ,
181, 15,  1,  ,
200, 15,  1,  ,
200, 15,  2,   ,
200, 15,  3,   ,
200, 17,  1,   ,
210, 8,  1,   ,
270, 10,  1,   ,
280, 11,  1,   ,
305, 11,   1,  ,
305, 11,   2,  ,
322, 16,   1,  ,
377, 16,   1,  ,
410, 8,   1,  ,
410, 8,   2,  ,
421, 10,   1,  ,
422, 10,   1,  ,
;


data one two;
set test;
by id season;
if  first.season then count=1;
else count+1;
if first.id and last.id then output two;
else output one;
run;
ybz12003
Rhodochrosite | Level 12

Hi all:

 

Unfortunately, the result I am looking for is not what you created.   Please see my edited first message.   Sorry for the confusion. 

Reeza
Super User
proc sort data=have uniqueout=unique nouniquekey out=duplicates;
by ID season;
run;

@ybz12003 wrote:

Hi all:

 

Unfortunately, the result I am looking for is not what you created.   Please see my edited first message.   Sorry for the confusion.

 


Please don't edit your posts, it makes it harder to follow later on. Use a new response. 

ybz12003
Rhodochrosite | Level 12

I have used the same code, it did not work.   Please see the result I want. 

novinosrin
Tourmaline | Level 20

OK here you go

 

data one two;
set test;
by id season;
if  first.season then count=1;
else count+1;
if first.season and last.season then output one;
else output two;
run;
novinosrin
Tourmaline | Level 20

@ybz12003 The change is very minor 

 

we use first.season and last.season, instead of first.id and last.id pairs

ybz12003
Rhodochrosite | Level 12

Hi:

 

The code you created was good.  However, I also need the column 'i' is exactly the same as column 'count' because I need to use it for further transpose.  The column 'count' you have is not the one I want.

data one two;
set test;
by id season;
if  first.season then count=1;
else count+1;
if first.season and last.season then output one;
else output two;
run;

 

Reeza
Super User

@ybz12003 wrote:

Hi:

 

The code you created was good.  However, I also need the column 'i' is exactly the same as column 'count' because I need to use it for further transpose.  The column 'count' you have is not the one I want.

data one two;
set test;
by id season;
if  first.season then count=1;
else count+1;
if first.season and last.season then output one;
else output two;
run;

 


I is blank in your sample data and if you want it exactly as count, why not just assign it? If count already exists on the data set the RETAIN won't work correctly. You do not need to add a counter to this automatically when transposing, SAS does that automatically. 

Reeza
Super User

I tested it, it works exactly as requested given your sample data. The rest is up to you now.


@ybz12003 wrote:

I have used the same code, it did not work.   Please see the result I want. 


 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 17 replies
  • 3100 views
  • 1 like
  • 5 in conversation