Re: Count and Seperate dataset?

ybz12003 · Posted 01-14-2019 03:06 PM

Hi experts:

I have the test dataset list below. I would like to do three things:

1. Assign the count number like the "Count" column below.

2. Create a column 'i' the same as the column 'Count'.

3. Separate those two datasets if the ID and season are duplicates. For example, dataset one is including IDs '100', 200', '305', and '410'. The other dataset is the rest IDs.

Thanks!

data test;
      infile datalines dsd;
  input ID : $10. Season  Count   i ;
datalines;
100, 9,  1,  ,
100, 9,  2,  ,
100, 10,  1,  ,
112, 14,  1,  ,
150, 14,  1,  ,
180, 14,  1,  ,
181, 15,  1,  ,
200, 15,  1,  ,
200, 15,  2,   ,
200, 15,  3,   ,
200, 17,  1,   ,
210, 8,  1,   ,
270, 10,  1,   ,
280, 11,  1,   ,
305, 11,   1,  ,
305, 11,   2,  ,
322, 16,   1,  ,
377, 16,   1,  ,
410, 8,   1,  ,
410, 8,   2,  ,
421, 10,   1,  ,
422, 10,   1,  ,
;

And the datasets I want are

data One;
      infile datalines dsd;
  input ID : $10. Season  Count   i ;
datalines;
100, 10,  1,  1,
112, 14,  1,  1,
150, 14,  1,  1,
180, 14,  1,  1,
181, 15,  1,  1,
200, 17,  1,   1,
210, 8,  1,   1,
270, 10,  1,   1,
280, 11,  1,   1,
322, 16,   1,  1,
377, 16,   1,  1,
421, 10,   1,  1,
422, 10,   1,  1,
;

data two;
      infile datalines dsd;
  input ID : $10. Season  Count   i ;
datalines;
100, 9,  1,  1,
100, 9,  2,  2,
200, 15,  1,  1,
200, 15,  2,   2,
200, 15,  3,   3,
305, 11,   1,  1,
305, 11,   2,  2,
410, 8,   1,  1,
410, 8,   2,  2,
;

.

novinosrin · Posted 01-14-2019 03:14 PM


data test;
      infile datalines dsd;
  input ID : $10. Season  Count   i ;
  drop count;
datalines;
100, 9,  1,  ,
100, 9,  2,  ,
100, 10,  1,  ,
112, 14,  1,  ,
150, 14,  1,  ,
180, 14,  1,  ,
181, 15,  1,  ,
200, 15,  1,  ,
200, 15,  2,   ,
200, 15,  3,   ,
200, 17,  1,   ,
210, 8,  1,   ,
270, 10,  1,   ,
280, 11,  1,   ,
305, 11,   1,  ,
305, 11,   2,  ,
322, 16,   1,  ,
377, 16,   1,  ,
410, 8,   1,  ,
410, 8,   2,  ,
421, 10,   1,  ,
422, 10,   1,  ,
;

data one two;
do until(last.id);
do count=1 by 1 until(last.season);
set test;
by id season;
end;
if  first.id and last.id then output two;
else output one;
end;
run;

novinosrin · Posted 01-14-2019 03:17 PM

or perhaps this--?

data one two;
do count=1 by 1 until(last.season);
set test;
by id season;
if  first.id and last.id then output two;
else output one;
end;
run;

Reeza · Posted 01-14-2019 03:14 PM

For #1 try this:

https://stats.idre.ucla.edu/sas/faq/how-can-i-create-an-enumeration-variable-by-groups/

For #2, PROC SORT has a newer option, NOUNIQUEKEY that makes this trivial.

But ID 100 has two entries for seasons, for season 9 there are duplicates, for season 10 there are no duplicates. So you want all ID 100 records to go into a different data set or just the duplicates?

@ybz12003 wrote:

Hi experts:

I have the test dataset list below. I would like to do two things:

1. Assign the count number like the "Count" column below

2. Separate those two datasets if the ID and season are duplicate. For example, dataset one is including IDs '100', 200', '305', and '410'. The other dataset is the rest IDs.

Thanks!
data test;
      infile datalines dsd;
  input ID : $10. Season  Count   i ;
datalines;
100, 9,  1,  ,
100, 9,  2,  ,
100, 10,  1,  ,
112, 14,  1,  ,
150, 14,  1,  ,
180, 14,  1,  ,
181, 15,  1,  ,
200, 15,  1,  ,
200, 15,  2,   ,
200, 15,  3,   ,
200, 17,  1,   ,
210, 8,  1,   ,
270, 10,  1,   ,
280, 11,  1,   ,
305, 11,   1,  ,
305, 11,   2,  ,
322, 16,   1,  ,
377, 16,   1,  ,
410, 8,   1,  ,
410, 8,   2,  ,
421, 10,   1,  ,
422, 10,   1,  ,
;

PeterClemmensen · Posted 01-14-2019 03:17 PM

data test;
      infile datalines dsd;
  input ID : $10. Season;
datalines;
100, 9
100, 9
100, 10
112, 14
150, 14
180, 14
181, 15
200, 15
200, 15
200, 15
200, 17
210, 8
270, 10
280, 11
305, 11
305, 11
322, 16
377, 16
410, 8
410, 8
421, 10
422, 10
;

data temp;
   set test;
   by ID Season;
   count+1;
   if first.Season then count=1;
run;

data singles duplicates;
   
   if _N_ = 1 then do;
   declare hash h(dataset:'temp(where=(count>1))');
      h.defineKey('ID');
      h.defineDone();
   end;

   set temp;

   if h.check()=0 then output duplicates;
   else output singles;
run;

The DATA to DATA Step Macro
Blog: SASnrd

novinosrin · Posted 01-14-2019 04:50 PM

Personally, I feel the requirement is much easier than having to use hash. However, I appreciate your enthusiam in offering various solutions. That's so cool. Love that attitude!

Even with hash, you could consider just one step


data test;
      infile datalines dsd;
  input ID : $10. Season  Count   i ;
  drop count;
datalines;
100, 9,  1,  ,
100, 9,  2,  ,
100, 10,  1,  ,
112, 14,  1,  ,
150, 14,  1,  ,
180, 14,  1,  ,
181, 15,  1,  ,
200, 15,  1,  ,
200, 15,  2,   ,
200, 15,  3,   ,
200, 17,  1,   ,
210, 8,  1,   ,
270, 10,  1,   ,
280, 11,  1,   ,
305, 11,   1,  ,
305, 11,   2,  ,
322, 16,   1,  ,
377, 16,   1,  ,
410, 8,   1,  ,
410, 8,   2,  ,
421, 10,   1,  ,
422, 10,   1,  ,
;

data one two;
if _n_=1 then do;
   dcl hash H (dataset:'test',multidata:'y') ;
   h.definekey  ("id") ;
   h.definedata ("id") ;
   h.definedone () ;
 end;
 set test;
 by id season;
if  first.season then count=1;
else count+1;
if first.id then do;rc=h.check(); h.has_next(result:_iorc_);end;
if _iorc_ then output one;else output two;
run;

novinosrin · Posted 01-14-2019 04:33 PM

If the DOW loop was little confusing, the requirement is as simple as this

Just a matter of diligent use of automativ by group variables first and last



data test;
      infile datalines dsd;
  input ID : $10. Season  Count   i ;
  drop count;
datalines;
100, 9,  1,  ,
100, 9,  2,  ,
100, 10,  1,  ,
112, 14,  1,  ,
150, 14,  1,  ,
180, 14,  1,  ,
181, 15,  1,  ,
200, 15,  1,  ,
200, 15,  2,   ,
200, 15,  3,   ,
200, 17,  1,   ,
210, 8,  1,   ,
270, 10,  1,   ,
280, 11,  1,   ,
305, 11,   1,  ,
305, 11,   2,  ,
322, 16,   1,  ,
377, 16,   1,  ,
410, 8,   1,  ,
410, 8,   2,  ,
421, 10,   1,  ,
422, 10,   1,  ,
;


data one two;
set test;
by id season;
if  first.season then count=1;
else count+1;
if first.id and last.id then output two;
else output one;
run;

ybz12003 · Posted 01-14-2019 05:11 PM

Hi all:

Unfortunately, the result I am looking for is not what you created. Please see my edited first message. Sorry for the confusion.

Reeza · Posted 01-14-2019 05:23 PM

proc sort data=have uniqueout=unique nouniquekey out=duplicates;
by ID season;
run;

@ybz12003 wrote:

Hi all:

Unfortunately, the result I am looking for is not what you created. Please see my edited first message. Sorry for the confusion.

Please don't edit your posts, it makes it harder to follow later on. Use a new response.

ybz12003 · Posted 01-14-2019 05:27 PM

I have used the same code, it did not work. Please see the result I want.

novinosrin · Posted 01-14-2019 05:29 PM

OK here you go

data one two;
set test;
by id season;
if  first.season then count=1;
else count+1;
if first.season and last.season then output one;
else output two;
run;

novinosrin · Posted 01-14-2019 05:31 PM

@ybz12003 The change is very minor

we use first.season and last.season, instead of first.id and last.id pairs

ybz12003 · Posted 01-14-2019 08:04 PM

Hi:

The code you created was good. However, I also need the column 'i' is exactly the same as column 'count' because I need to use it for further transpose. The column 'count' you have is not the one I want.

data one two;
set test;
by id season;
if  first.season then count=1;
else count+1;
if first.season and last.season then output one;
else output two;
run;

Reeza · Posted 01-14-2019 08:37 PM

@ybz12003 wrote:

Hi:

The code you created was good. However, I also need the column 'i' is exactly the same as column 'count' because I need to use it for further transpose. The column 'count' you have is not the one I want.
data one two;
set test;
by id season;
if  first.season then count=1;
else count+1;
if first.season and last.season then output one;
else output two;
run;

I is blank in your sample data and if you want it exactly as count, why not just assign it? If count already exists on the data set the RETAIN won't work correctly. You do not need to add a counter to this automatically when transposing, SAS does that automatically.

Reeza · Posted 01-14-2019 05:41 PM

I tested it, it works exactly as requested given your sample data. The rest is up to you now.

@ybz12003 wrote:

I have used the same code, it did not work. Please see the result I want.

Classroom Training Available!