I have a dataset with about 1000 variables. There are many variables that are coded the same but names are different, see an example, those variables have the same information: nHits and fewfe, also those variables have the same values except League contain some missing data: League dfgwe_23.
How can I find out those variables? So, I wanted to get the list of variables (see below attached) with the same values. In this example fake data, the following variables have the same values except some missing (should ignore missing). Thank you.
Want the following list:
CrAtBat = ab
CrBB =fkae
CrHits =vrgsd
CrHome =dfwef
CrRbi =vewef
CrRuns =fff
Div =wsfde_d
Division = vdfgew
League =dfgwe_23
Position = fgerg
Salary =vregf
Team =fre
YrMajor = rewa
logSalary = fwef
nAssts =cefwe
nAtBat =efqwef
nBB =qfqewfq
nError =qfefe
nHits =fewfe
What do you mean by "coded the same" exactly?
i am not that fond of writing a data step to read a file with 1k variables, please share the code you have used to read the file.
To assist you in finding possible duplicate columns, you can do this:
proc import
out=baseball
datafile="/folders/myfolders/baseball2.csv"
dbms=csv
replace
;
run;
proc transpose data=baseball out=long;
by name;
var _numeric_;
run;
proc sort data=long;
by name col1;
run;
That looks like a bad joke. There are several columns with identical names, there's no logical order, the header line can't be used for valid SAS names, .....
If it had been something like
player_name position_1986 salary_1986 at_bats_1986 position_1987 salary_1987 at_bats_1987
and so on, then you could rather easily transpose it into something useful, but this?
"Player's Name","Team at the End of 1986","Times at Bat in 1986","Hits in 1986","Home Runs in 1986","Runs in 1986","RBIs in 1986","Walks in 1986","Years in the Major Leagues","Career Times at Bat","Career Hits","Career Home Runs","Career Runs","Career RBIs","Career Walks","League at the End of 1986","Division at the End of 1986","Position(s) in 1986","Put Outs in 1986","Assists in 1986","Errors in 1986","1987 Salary in $ Thousands","League and Division","Log Salary","Team at the End of 1986","Times at Bat in 1986","Hits in 1986","Walks in 1986","Years in the Major Leagues","Career Times at Bat","Career Hits","Career Home Runs","Career Runs","Career RBIs","Career Walks","League at the End of 1986","Division at the End of 1986","Position(s) in 1986","Assists in 1986","Errors in 1986","1987 Salary in $ Thousands","League and Division","Log Salary"
1) Are those duplicates all only numeric type or also character type?
2) Can it be that for some observations their values are the same but in other observation,
those same variables have different value?
3) Assume you found the duplicate values, what do you want to do with that information?
What kind of output you want?
Mainly, theoretically, you can define an array, sort it and then compare values easily.
You have not answer the second question:
2) Can it be that for some observations their values are the same but in other observation,
those same variables have different value? (neglecting missing values).
See solution in next link - a very similar question:
https://communities.sas.com/t5/SAS-Programming/Values-the-same/m-p/663839/highlight/false#M198244
That's completely different from your initial question. See Maxim 42.
What do you mean by "one may have more ID" - is ID unique at one or both datasets?
Suppose you have unique ID in table1 bot more than one observations of same id in table 2 - what do you want to compare? and how to treat the many to one matching ?
I suggest post a sample of each table with upto 10 observations and choose a sample of variables you noticed as same vale and few of different value to serve as test date, then post the output you expect regarding those samples.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.