Hi all,
I am very, very new to SAS. I have a dataset with about 300 variables with either a 1 or a 0 if that person does or does not meet those qualifcations.
So something like
Person | Var1 | Var2 | Var3 | Var4 |
52 | 1 | 0 | 1 | 0 |
32 | 1 | 0 | 1 | 0 |
100 | 0 | 0 | 1 | 0 |
80 | 1 | 0 | 1 | 1 |
should be
Person | Var1 | Var3 | Var4 |
52 | 1 | 1 | 0 |
32 | 1 | 1 | 0 |
100 | 0 | 1 | 0 |
80 | 1 | 1 | 1 |
Let me know if you have any ideas!! I am very new to SAS so have no idea where to even begin, but this is needed for my work : )
Thanks
Run a proc summary, transpose the result, and create a drop statement dynamically from that:
data have;
input person $ var1 var2 var3;
datalines;
52 1 0 1 0
32 1 0 1 0
100 0 0 1 0
80 1 0 1 1
;
run;
proc summary data=have;
var _numeric_;
output out=check (drop=_type_ _freq_) sum=;
run;
proc transpose data=check out=check2;
var _all_;
run;
data _null_;
set check2 end=eof;
if _n_ = 1 then call execute('data want; set have; drop');
if col1 = 0 then call execute(' ' !! trim(_name_));
if eof then call execute('; run;');
run;
Note that I posted example data in a data step with datalines; this makes it easy for everyone to recreate the exact same dataset with just copy/paste and submit.
Here's my solution.
Do nothing to var2.
In any code you write, do not mention or use var2.
Run a proc summary, transpose the result, and create a drop statement dynamically from that:
data have;
input person $ var1 var2 var3;
datalines;
52 1 0 1 0
32 1 0 1 0
100 0 0 1 0
80 1 0 1 1
;
run;
proc summary data=have;
var _numeric_;
output out=check (drop=_type_ _freq_) sum=;
run;
proc transpose data=check out=check2;
var _all_;
run;
data _null_;
set check2 end=eof;
if _n_ = 1 then call execute('data want; set have; drop');
if col1 = 0 then call execute(' ' !! trim(_name_));
if eof then call execute('; run;');
run;
Note that I posted example data in a data step with datalines; this makes it easy for everyone to recreate the exact same dataset with just copy/paste and submit.
Yes, Kurt that does seem to solve the problem, but I prefer my solution better, especially since I strive to be a "maximally efficient programmer".
I guess the OP wanted a solution for automatically cleaning out irrelevant columns from datasets. And it tickled me to create something completely data-driven 😉
Thanks for the response! This looks very easy to follow and recreate. Though I tried running this, and it just gave me my table back with the 0's still there, the table I am trying to filter is "my_table". What went wrong??
proc summary data=my_table;
var _numeric_;
output out=check (drop=_type_ _freq_) sum=;
run;
proc transpose data=check out=check2;
var _all_;
run;
data _null_;
set check2 end=eof;
if _n_ = 1 then call execute('data want; set unix.my_table; drop');
if col1 = 0 then call execute(' ' !! trim(_name_));
if eof then call execute('; run;');
run;
Nevermind, it's because my variables were stored as text! Thanks a lot for the help!
@scouttier wrote:
Hi all,
I am very, very new to SAS. I have a dataset with about 300 variables with either a 1 or a 0 if that person does or does not meet those qualifcations.
Let me know if you have any ideas!! I am very new to SAS so have no idea where to even begin, but this is needed for my work : )
Thanks
Could you please describe why removing variables with values of all 0 are actually needed.
If I were to encounter such data I would first report back to the data source/collector/compiler/designer or whoever is in charge of the project about the behavior of variables with all any value (1, 0 or 99999). Sometimes invariant data occurs due to any of number of issues such as transcription errors, errors in extraction code or creation of a text file used for import or even an incorrect format assignment in SAS (the values by default may look like 0 but actually contain other values such as display format is F1.0 for instance and values are less than 0.5).
It may be that this is part of a multiple phase collection effort and one or more phases may all be known that "in phase XXX we expect all 0 values for variables X, Y and Z" in which case removing them would really be the wrong thing to do.
I have actually received data with a similar issue values of only 1 and 0 where we expected 1 to 50. The actual issue was the program creating a fixed column text file was writing another variable over the columns for the specific field.
Hello.
I created the table in Excel to describe if a person has a qualification or not, and I later need to modify it by joining it to another table / transposing it. It's taking way too long with all the unnecessary variables : )
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.