Hello community experts,
Please assist me in understanding how to drop columns from my final output dataset table if the column has all equal test results.
I have many SAS datasets that I use proc compare to find none equals and looking over them to find the columns of interest is tedious.
(1 of table containing 881 variables and 3.2 million records, So the examples below have been modified for simplicity).
proc compare base=&mydata. compare=&mydata_new. out=&mydata._&mydata_new.(drop=_TYPE_ _obs_) outnoequal;
by &var1. &var2. ;
run;
After running proc compare has completed I obtain information much like the below example.
Want 1: In this case example, I want the final dataset to only contain columns Anumber, Bnumber and V87 since V87 was the only column found to have differences. I want to drop columns V1 - V8 because they are all equal.
Anumber Bnumber V1 V2 V3 V4 V5 V6 V7 V8 V87 1 A12 E E E E E E E E -1 2 A21 E E E E E E E E -5 3 B12 E E E E E E E E -1 4 B21 E E E E E E E E 1 5 B23 E E E E E E E E -1 6 C24 E E E E E E E E -1 7 C26 E E E E E E E E 1 8 C61 E E E E E E E E -1 9 D14 E E E E E E E E -1 10 D19 E E E E E E E E -5 11 D21 E E E E E E E E -5 12 D47 E E E E E E E E 1 13 D81 E E E E E E E E 1 14 D97 E E E E E E E E -1 15 D99 E E E E E E E E -1
Want 2: While in this 2nd case I want the final dataset to only contain columns Anumber, Bnumber, V3, V6 and V87 since these columns were found to have differences. I want to drop columns V1, V2, V4, V5, V7 and V8 because they are all equal. .
Anumber Bnumber V1 V2 V3 V4 V5 V6 V7 V8 V87 1 A12 E E 2 E E E E E -1 2 A21 E E E E E E E E -5 3 B12 E E E E E 7 E E -1 4 B21 E E E E E E E E 1 5 B23 E E E E E E E E -1 6 C24 E E E E E E E E -1 7 C26 E E E E E E E E 1 8 C61 E E E E E E E E -1 9 D14 E E E E E E E E -1 10 D19 E E E E E E E E -5 11 D21 E E E E E E E E -5 12 D47 E E E E E E E E 1 13 D81 E E E E E E E E 1 14 D97 E E E E E E E E -1 15 D99 E E E E E E E E -1
As a note the true variable names are not in the form of V1, V87, but are more like first_name, Last_name, DOB, Gender...…
How can I achieve my wants using SAS?
I would start by following the proc compare code with something like this:
ods select nlevels; ods output nlevels= work.levelset; proc freq data=&mydata._&mydata_new. nlevels;run;
The output set will contain a variable TableVar with the name of the variable, Nlevels the number of levels of values for the variable, NMisslevels number of missing levels and NNonMISSLevels.
The variables with Nlevels =1 and NMissLevels=1 have all the values as equal. That would give you a data set to extract variable names from that do not match
proc sql; select Tablevar into :KeepList separated by ' ' from work.levelset where nlevels ne 1 or NmissLevels ne 1 ; quit; data want; set &mydata._&mydata_new. (keep=&keeplist.); run;
I would start by following the proc compare code with something like this:
ods select nlevels; ods output nlevels= work.levelset; proc freq data=&mydata._&mydata_new. nlevels;run;
The output set will contain a variable TableVar with the name of the variable, Nlevels the number of levels of values for the variable, NMisslevels number of missing levels and NNonMISSLevels.
The variables with Nlevels =1 and NMissLevels=1 have all the values as equal. That would give you a data set to extract variable names from that do not match
proc sql; select Tablevar into :KeepList separated by ' ' from work.levelset where nlevels ne 1 or NmissLevels ne 1 ; quit; data want; set &mydata._&mydata_new. (keep=&keeplist.); run;
Thank you @ballardw this provide what I needed to achieve in my examples.
1 more Question is there an option for testing $ variables and if so what would that option be? I have a few columns left over that are equal represented with dot dot (..).
@VDD wrote:
Thank you @ballardw this provide what I needed to achieve in my examples.
1 more Question is there an option for testing $ variables and if so what would that option be? I have a few columns left over that are equal represented with dot dot (..).
The process works to identify variables with at least two values in the data. If all of the character values are exactly the same the nlevels value should be 1 and the NNonMissLevels should be 1. The difference is that the E that you see for numeric in the comparison output set is a special missing and for character number of . indicate the positions of matching characters, mismatched characters have an X
Just for fun.
data have;
infile cards expandtabs truncover;
input (Anumber Bnumber V1 V2 V3 V4 V5 V6 V7 V8 V87) ($);
cards;
1 A12 E E E E E E E E -1
2 A21 E E E E E E E E -5
3 B12 E E E E E E E E -1
4 B21 E E E E E E E E 1
5 B23 E E E E E E E E -1
6 C24 E E E E E E E E -1
7 C26 E E E E E E E E 1
8 C61 E E E E E E E E -1
9 D14 E E E E E E E E -1
10 D19 E E E E E E E E -5
11 D21 E E E E E E E E -5
12 D47 E E E E E E E E 1
13 D81 E E E E E E E E 1
14 D97 E E E E E E E E -1
15 D99 E E E E E E E E -1
;
run;
data _null_;
set sashelp.vcolumn(where=(libname='WORK' and memname='HAVE')) end=last;
if _n_=1 then call execute('proc sql;create table temp as select ');
call execute(catx(' ','min(',name,')=max(',name,') as ',name));
if last then call execute('from have;quit;');
else call execute(',');
run;
proc transpose data=temp out=temp1;
run;
proc sql noprint;
select _name_ into :keep separated by ' '
from temp1
where col1=0;
quit;
data want;
set have;
keep &keep;
run;
Thank you @Ksharp this also removes the $ columns with all equal.
I should have asked about the $ to begin with but that was an oversight on my part.
I want to thank everyone for assisting with this task as it will make my work easier.
This was also a learning experience for me learning about the nlevels thanks @ballardw and all that contribute to the community links.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.