For example, I have one dataset that has player_id 1=555, player_id_2=600, etc. through player_id_10
I have another dataset that contains a list of 1,000 player_id's. I want to check, for each observation in data set 1, if each player_id is contained in the list in dataset 2.
so if 555 is on the list, I want it to tell me so (maybe create a new variable that says player_id_1_check=1). If 600 isnt, maybe the new variable says 0.
Is that clear? Thanks for any help.
Given the structure of your data, my instinct is to create a format from your list of player_id's and then read across your main dataset to see if the formatted value exists. Here's an example using 3 player_id's..
data have1;
input player_id1-player_id3;
cards;
555 600 740
;
run;
data have2;
input player_id;
cards;
555
650
700
740
;
run;
data id_fmt;
set have2 (rename=(player_id=start));
retain fmtname 'id_fmt' type 'N';
label='check';
run;
proc format cntlin=id_fmt;
run;
data want;
set have1;
array pl{*} player_id1 - player_id3;
array plchk{*} player_check_id1 - player_check_id3;
do i=1 to dim(pl);
plchk{i}=put(pl{i},id_fmt.)='check';
end;
drop i;
run;
There are so many ways to do the lookup, besides Keith's format look up, you could also do the followings, of course, 'merge' will require presorting.
data h1;
do id=1 to 10;
output;
end;
run;
data h2;
do id=1 by 2 to 20;
output;
end;
run;
/*SQL*/
proc sql;
create table want as
select id, case when id in (select id from h2) then 1 else 0 end as flag from h1;
quit;
/*Merge*/
data want;
merge h1(in=h1) h2(in=h2);
by id;
if h1;
flag=h1*h2;
run;
/*hash*/
data want;
if _n_=1 then do;
if 0 then set h2;
declare hash h2(dataset:'h2');
h2.definekey('id');
h2.definedone();
end;
set h1;
flag=not(h2.find());
run;
Haikuo
You are right. If using non-transposed data as you have presented, then that will still leave Hash() viable:
data have1;
input player_id1-player_id3;
cards;
555 600 740
;
data have2;
input player_id;
cards;
555
650
700
740
;
/*hash*/
data want (drop=player_id);
if _n_=1 then do;
if 0 then set have2;
declare hash h2(dataset:'have2');
h2.definekey('player_id');
h2.definedone();
end;
set have1;
array h1 player_id1-player_id3;
array flag player_check_id1 - player_check_id3;
do over h1;
flag=not(h2.check(key: h1));
end;
run;
Haikuo
And, another way to do the lookup would be to use the in function. e.g.:
data have1;
input player_id1-player_id3;
cards;
555 600 740
;
run;
data have2;
input player_id;
cards;
555
650
700
740
;
run;
data want;
if _n_ eq 1 then set have1;
array players(*) player_id1-player_id3;
retain players:;
do until (eof);
set have2 end=eof;
if player_id in players then output;
end;
run;
I am having trouble seeing what this does
if _n_ eq 1 then set have1;
obviously I know the set, but what use is the if _n_ eq1 then
In this case it didn't really serve any purpose other than my own programming habits. I like to be explicitly reminded that the first set statement is only going to be called once.
Do you think there is a way to do this in reverse, IE I need to know when each player appears
player
1
2
3
game player1id player2id player3id-player10id
1 1 5 7
so whereever player 1 appears, I ge tthe game #
player game1-gameX
1 1 3 4 10 (so he appears in games 1,3, 4, and 10)
does that make sense? I already have a data set checking which players are in each game, I want to check now what games each player is in. Since there are 1,000+ games, and 10 players in each game, will the array type function still work, with the format as Keith suggested in the original problem?
Depend on what you want. If you want a field created that contains the list, its length will depend upon the maximum number of games a particular player could be in. The following assumes that the maximum could be 10 games, but it could be set to handle all 1,000.
data have;
input game player1-player10;
cards;
1 1 5 6 8 7 10 12 14 16 18
2 1 2 3 4 5 6 7 8 9 10
3 1 2 5 7 9 11 13 15 17 19
;
data need (keep=game player);
set have;
array players(*) player1-player10;
do i=1 to 10;
player=players(i);
output;
end;
run;
proc sort data=need;
by player game;
run;
data want (keep=player games);
set need;
length games $70;
retain games;
by player;
if first.player then games=game;
else games=catx(",",games,game);
if last.player then output;
run;
Just in case you already have two tables, one keyed by game, with player1-play10; another table only has players. So I stole Art's data as well as some of his ideas:
data h1;
input game player1-player10;
cards;
1 1 5 6 8 7 10 12 14 16 18
2 1 2 3 4 5 6 7 8 9 10
3 1 2 5 7 9 11 13 15 17 19
;
data h2;
do player=1 to 19; output;end;
run;
data want;
set h2;
length games $100;
do i=1 to nobs;
set h1 point=i nobs=nobs;
array players player1-player10;
if player in players then games=catx(',', games,game);
end;
keep player games;
run;
Haikuo
Alright, this worked, but I realized one small thing I need to add. In addition to the list of players there is a list of positions next to each.
So, from above:
data h1;
input game player1-player10;
cards;
1 1 5 6 8 7 10 12 14 16 18
2 1 2 3 4 5 6 7 8 9 10
3 1 2 5 7 9 11 13 15 17 19
;
the input would actually look like:
data h1;
input game player1-player10;
input game player1 pos1 player2 pos2...and so on
cards;
1 1 a 5 b 6 c 8 d 7 e 10 f 12 g 14 h 16 i 18 j
2 1 a 2 b 3 c 4 d 5 e 6 f 7 g 8 h 9 i 10 k
3 1 a 2 b 5 c 7 d 9 e 11 f 13 g 15 h 17 i 19 k
;
I am trying to simply add another array and put said array in the catx function to no success.
More clarification is needed. What does the final outcome look like? Some samples please.
Haikuo
Similar to above,
player game1-gameX
1 1 3 4 10 (so he appears in games 1,3, 4, and 10)
I would need it to be somewhat like
player game1 pos1 game2 pos2.....
1 1 a 3 c 4 d 10 g.... and so forth.
it wouldnt matter if the variables are combined (for example, instead of game1 and pos1, you simply have game1 that is 1a and then game 2 is 3c and so forth. I basically need the same thing done, but the positions have to be connected to the games it found before.)
data h1;
input game
player1 pos1 $
player2 pos2 $
player3 pos3 $
player4 pos4 $
player5 pos5 $
player6 pos6 $
player7 pos7 $
player8 pos8 $
player9 pos9 $
player10 pos10 $;
cards;
1 1 a 5 b 6 c 8 d 7 e 10 f 12 g 14 h 16 i 18 j
2 1 a 2 b 3 c 4 d 5 e 6 f 7 g 8 h 9 i 10 k
3 1 a 2 b 5 c 7 d 9 e 11 f 13 g 15 h 17 i 19 k
;
data need (keep=game player pos);
set h1;
array players(*) player1-player10;
array position(*) $ pos1-pos10;
do i=1 to 10;
player=players(i);
pos=position(i);
output;
end;
run;
proc sort data=need;
by player game;
run;
data want (keep=player games);
set need;
length games $70;
retain games;
by player;
game_and_pos=catx(" ",game,pos);
if first.player then games=game_and_pos;
else games=catx(",",games,game_and_pos);
if last.player then output;
run;
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.