I have game stats from the 2018-2019 NBA season, but some games are double counted, with the two team names swapped between the variables. This is what the data looks like. Not all observations have this issue; this is just what it looks like at certain points in the data:
22 | UTA | 10/22/2018 | MEM |
23 | OKC | 10/22/2018 | GSW |
24 | MEM | 10/22/2018 | UTA |
25 | GSW | 10/22/2018 | UTA |
If this were my data, I would be tempted to put the two team names in each row into alphabetical order and then sort the data, removing duplicates:
Something like:
data have;
  input num teama $ date :mmddyy10. teamb $;
  format date mmddyy10.;
  datalines;
22 UTA 10/22/2018 MEM
23 OKC 10/22/2018 GSW
24 MEM 10/22/2018 UTA
25 GSW 10/22/2018 UTA
;

data need;
  set have;
  /* put the two team names in alphabetical order within each row */
  array t(*) teama teamb;
  call sortc(of t(*));
run;

/* keep one row per date and (sorted) pair of teams */
proc sort data=need out=want nodupkey;
  by date teama teamb;
run;
You should provide example data in the form of a DATA step, as above. That way we do not have to guess variable names or properties.
If the ORDER of the team names is important, such as winner / loser, then you would need to save the original order into new variables in the NEED data step before the names are sorted.
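A minimal sketch of that idea, assuming the order as typed in HAVE is the meaningful one: copy both names into new variables before CALL SORTC reorders them. The variable names `orig_teama` and `orig_teamb` and the DUPOUT= data set name `dropped` are hypothetical choices for illustration.

```sas
data have;
  input num teama $ date :mmddyy10. teamb $;
  format date mmddyy10.;
  datalines;
22 UTA 10/22/2018 MEM
23 OKC 10/22/2018 GSW
24 MEM 10/22/2018 UTA
25 GSW 10/22/2018 UTA
;

data need;
  set have;
  /* hypothetical variables that keep the original left/right order */
  length orig_teama orig_teamb $8;
  orig_teama = teama;
  orig_teamb = teamb;
  /* sort the pair alphabetically so A-vs-B and B-vs-A match */
  array t(*) teama teamb;
  call sortc(of t(*));
run;

/* DUPOUT= writes the discarded copies to a separate data set */
proc sort data=need out=want nodupkey dupout=dropped;
  by date teama teamb;
run;
```

Because NODUPKEY keeps the first observation of each BY group, the preserved order in WANT comes from whichever copy of a game appears first in NEED. The discarded copies land in DROPPED, which is a quick way to confirm that only true duplicates were removed.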
Could you please provide sample data and explain, with an example, where the values are doubled up?