Hi experts:
I have the test dataset list below. I would like to do three things:
1. Assign the count number like the "Count" column below.
2. Create a column 'i' the same as the column 'Count'.
3. Separate those two datasets if the ID and season are duplicates. For example, dataset one is including IDs '100', 200', '305', and '410'. The other dataset is the rest IDs.
Thanks!
data test;
infile datalines dsd;
input ID : $10. Season Count i ;
datalines;
100, 9, 1, ,
100, 9, 2, ,
100, 10, 1, ,
112, 14, 1, ,
150, 14, 1, ,
180, 14, 1, ,
181, 15, 1, ,
200, 15, 1, ,
200, 15, 2, ,
200, 15, 3, ,
200, 17, 1, ,
210, 8, 1, ,
270, 10, 1, ,
280, 11, 1, ,
305, 11, 1, ,
305, 11, 2, ,
322, 16, 1, ,
377, 16, 1, ,
410, 8, 1, ,
410, 8, 2, ,
421, 10, 1, ,
422, 10, 1, ,
;
And the datasets I want are
data One;
infile datalines dsd;
input ID : $10. Season Count i ;
datalines;
100, 10, 1, 1,
112, 14, 1, 1,
150, 14, 1, 1,
180, 14, 1, 1,
181, 15, 1, 1,
200, 17, 1, 1,
210, 8, 1, 1,
270, 10, 1, 1,
280, 11, 1, 1,
322, 16, 1, 1,
377, 16, 1, 1,
421, 10, 1, 1,
422, 10, 1, 1,
;
data two;
infile datalines dsd;
input ID : $10. Season Count i ;
datalines;
100, 9, 1, 1,
100, 9, 2, 2,
200, 15, 1, 1,
200, 15, 2, 2,
200, 15, 3, 3,
305, 11, 1, 1,
305, 11, 2, 2,
410, 8, 1, 1,
410, 8, 2, 2,
;
.
data test;
infile datalines dsd;
input ID : $10. Season Count i ;
drop count;
datalines;
100, 9, 1, ,
100, 9, 2, ,
100, 10, 1, ,
112, 14, 1, ,
150, 14, 1, ,
180, 14, 1, ,
181, 15, 1, ,
200, 15, 1, ,
200, 15, 2, ,
200, 15, 3, ,
200, 17, 1, ,
210, 8, 1, ,
270, 10, 1, ,
280, 11, 1, ,
305, 11, 1, ,
305, 11, 2, ,
322, 16, 1, ,
377, 16, 1, ,
410, 8, 1, ,
410, 8, 2, ,
421, 10, 1, ,
422, 10, 1, ,
;
data one two;
do until(last.id);
do count=1 by 1 until(last.season);
set test;
by id season;
end;
if first.id and last.id then output two;
else output one;
end;
run;
or perhaps this--?
data one two;
do count=1 by 1 until(last.season);
set test;
by id season;
if first.id and last.id then output two;
else output one;
end;
run;
For #1 try this:
https://stats.idre.ucla.edu/sas/faq/how-can-i-create-an-enumeration-variable-by-groups/
For #2, PROC SORT has a newer option, NOUNIQUEKEY that makes this trivial.
But ID 100 has two entries for seasons, for season 9 there are duplicates, for season 10 there are no duplicates. So you want all ID 100 records to go into a different data set or just the duplicates?
@ybz12003 wrote:
Hi experts:
I have the test dataset list below. I would like to do two things:
1. Assign the count number like the "Count" column below
2. Separate those two datasets if the ID and season are duplicate. For example, dataset one is including IDs '100', 200', '305', and '410'. The other dataset is the rest IDs.
Thanks!
data test; infile datalines dsd; input ID : $10. Season Count i ; datalines; 100, 9, 1, , 100, 9, 2, , 100, 10, 1, , 112, 14, 1, , 150, 14, 1, , 180, 14, 1, , 181, 15, 1, , 200, 15, 1, , 200, 15, 2, , 200, 15, 3, , 200, 17, 1, , 210, 8, 1, , 270, 10, 1, , 280, 11, 1, , 305, 11, 1, , 305, 11, 2, , 322, 16, 1, , 377, 16, 1, , 410, 8, 1, , 410, 8, 2, , 421, 10, 1, , 422, 10, 1, , ;
data test;
infile datalines dsd;
input ID : $10. Season;
datalines;
100, 9
100, 9
100, 10
112, 14
150, 14
180, 14
181, 15
200, 15
200, 15
200, 15
200, 17
210, 8
270, 10
280, 11
305, 11
305, 11
322, 16
377, 16
410, 8
410, 8
421, 10
422, 10
;
data temp;
set test;
by ID Season;
count+1;
if first.Season then count=1;
run;
data singles duplicates;
if _N_ = 1 then do;
declare hash h(dataset:'temp(where=(count>1))');
h.defineKey('ID');
h.defineDone();
end;
set temp;
if h.check()=0 then output duplicates;
else output singles;
run;
Personally, I feel the requirement is much easier than having to use hash. However, I appreciate your enthusiam in offering various solutions. That's so cool. Love that attitude!
Even with hash, you could consider just one step
data test;
infile datalines dsd;
input ID : $10. Season Count i ;
drop count;
datalines;
100, 9, 1, ,
100, 9, 2, ,
100, 10, 1, ,
112, 14, 1, ,
150, 14, 1, ,
180, 14, 1, ,
181, 15, 1, ,
200, 15, 1, ,
200, 15, 2, ,
200, 15, 3, ,
200, 17, 1, ,
210, 8, 1, ,
270, 10, 1, ,
280, 11, 1, ,
305, 11, 1, ,
305, 11, 2, ,
322, 16, 1, ,
377, 16, 1, ,
410, 8, 1, ,
410, 8, 2, ,
421, 10, 1, ,
422, 10, 1, ,
;
data one two;
if _n_=1 then do;
dcl hash H (dataset:'test',multidata:'y') ;
h.definekey ("id") ;
h.definedata ("id") ;
h.definedone () ;
end;
set test;
by id season;
if first.season then count=1;
else count+1;
if first.id then do;rc=h.check(); h.has_next(result:_iorc_);end;
if _iorc_ then output one;else output two;
run;
If the DOW loop was little confusing, the requirement is as simple as this
Just a matter of diligent use of automativ by group variables first and last
data test;
infile datalines dsd;
input ID : $10. Season Count i ;
drop count;
datalines;
100, 9, 1, ,
100, 9, 2, ,
100, 10, 1, ,
112, 14, 1, ,
150, 14, 1, ,
180, 14, 1, ,
181, 15, 1, ,
200, 15, 1, ,
200, 15, 2, ,
200, 15, 3, ,
200, 17, 1, ,
210, 8, 1, ,
270, 10, 1, ,
280, 11, 1, ,
305, 11, 1, ,
305, 11, 2, ,
322, 16, 1, ,
377, 16, 1, ,
410, 8, 1, ,
410, 8, 2, ,
421, 10, 1, ,
422, 10, 1, ,
;
data one two;
set test;
by id season;
if first.season then count=1;
else count+1;
if first.id and last.id then output two;
else output one;
run;
Hi all:
Unfortunately, the result I am looking for is not what you created. Please see my edited first message. Sorry for the confusion.
proc sort data=have uniqueout=unique nouniquekey out=duplicates; by ID season; run;
@ybz12003 wrote:
Hi all:
Unfortunately, the result I am looking for is not what you created. Please see my edited first message. Sorry for the confusion.
Please don't edit your posts, it makes it harder to follow later on. Use a new response.
I have used the same code, it did not work. Please see the result I want.
OK here you go
data one two;
set test;
by id season;
if first.season then count=1;
else count+1;
if first.season and last.season then output one;
else output two;
run;
@ybz12003 The change is very minor
we use first.season and last.season, instead of first.id and last.id pairs
Hi:
The code you created was good. However, I also need the column 'i' is exactly the same as column 'count' because I need to use it for further transpose. The column 'count' you have is not the one I want.
data one two;
set test;
by id season;
if first.season then count=1;
else count+1;
if first.season and last.season then output one;
else output two;
run;
@ybz12003 wrote:
Hi:
The code you created was good. However, I also need the column 'i' is exactly the same as column 'count' because I need to use it for further transpose. The column 'count' you have is not the one I want.
data one two; set test; by id season; if first.season then count=1; else count+1; if first.season and last.season then output one; else output two; run;
I is blank in your sample data and if you want it exactly as count, why not just assign it? If count already exists on the data set the RETAIN won't work correctly. You do not need to add a counter to this automatically when transposing, SAS does that automatically.
I tested it, it works exactly as requested given your sample data. The rest is up to you now.
@ybz12003 wrote:
I have used the same code, it did not work. Please see the result I want.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.