Hi everyone!
I have a SAS dataset with more than 1000 variables distributed like this:
ID Auto Auto_2 Auto_3 City City_2 Sport Sport_2 Sport_3 Sport_4 ... and so on (many different variables repeated different times).
I need to:
1) keep only those rows that have missing values in all the variables ending with "_number", (or, equally, drop those rows that have values in any of the variables ending with "_number").
For example:?
ID Auto Auto_2 Auto_3 City City_2
01 Fiat Ford Milan ---> Drop row
02 Fiat Milan Rome ---> Drop row
03 Fiat Ford Ford Milan Rome ---> Drop row
04 Fiat Milan ---> KEEP ROW
If it can help, if Variable_n is missing, then Variable_(n+1) is missing too.
2) keep only the columns without the "_number".
In the previous example only ID Auto City must be kept.
The problem in both cases is the same: I can't write all the variables in a drop/keep statement or in a IF condition because they're too many.
Is there a smart way to manage variables like these? Maybe something like lists...? I don't know.
Thanks a lot
Irene
data have;
input (ID Auto Auto_2 Auto_3 City City_2 ) ($);
cards;
01 Fiat Ford . Milan .
02 Fiat . . Milan Rome
03 Fiat Ford Ford Milan Rome
04 Fiat . . Milan .
;
run;
proc transpose data=have(obs=0) out=temp;
var _all_;
run;
proc sql;
select _name_ into : list separated by ',' from temp where prxmatch('/_\d+$/',strip(_name_));
select count(*) into : n from temp where prxmatch('/_\d+$/',strip(_name_));
quit;
data want;
set have;
if cmiss(&list)=&n;
run;
You can only use short cut references for variable lists when you have prefixes
Otherwise you can create a macro variable with the list of variables by querying the SASHELP.VCOLUMN table.
I've never used the SASHELP.VCOLUMN table, I'll have a look!
Saying shortcut references do you mean, for example "Var:" to use all variables starting with "Var"? I've aIready tried this method in a DROP statement, but I think I can't use it in a IF statement.. Am I wrong?
Any ideas for the IF statement?
Thanks
Irene
Well, you could put the _numbers in an array, then loop over the array:
data want;
set have;
array elem{*} auto-city2;
do i=2 to dim(elem{*});
if elem{i}="" and elem{i-1}="" then flag="Y";
end;
run;
Then drop based on flag (note untested code!).
Personally, in these instances you may be better off normalising your data, i.e. have on row per data element:
ID SECTION NUMBER VALUE
1 AUTO 1 Fiat
1 AUTO 2 Ford
...
Note that you have a special condition in your data: If Variable_N is missing, then Variable_N+1 is also missing. This means that you don't have to process all your variables. You could process only the "_2" variables and ignore the rest.
Here's a beginning to a solution. See if you need more than that.
proc contents data=have noprint out=_contents_ (keep=name);
run;
*** Just so you know what you are working with here ...;
proc print data=_contents_ (obs=10);
run;
data keeplist _2_list;
set _contents_;
last_part = scan(name, -1, '_');
if last_part='2' then output _2_list;
else if input(last_part, ??5.) < 2 then output keeplist;
run;
*** Again, just so you know what you are working with here ...;
proc print data=keeplist (obs=10);
proc print data=_2_list (obs=10);
run;
proc sql;
select trim(name) into : keeplist separated by ' ' from keeplist;
select trim(name) into : _2_list separated by ' ' from _2_list;
quit;
%put &keeplist;
%put &_2_list;
The rest will be (by comparison) easier, but see if this much makes sense.
Good luck.
data have;
input (ID Auto Auto_2 Auto_3 City City_2 ) ($);
cards;
01 Fiat Ford . Milan .
02 Fiat . . Milan Rome
03 Fiat Ford Ford Milan Rome
04 Fiat . . Milan .
;
run;
proc transpose data=have(obs=0) out=temp;
var _all_;
run;
proc sql;
select _name_ into : list separated by ',' from temp where prxmatch('/_\d+$/',strip(_name_));
select count(*) into : n from temp where prxmatch('/_\d+$/',strip(_name_));
quit;
data want;
set have;
if cmiss(&list)=&n;
run;
Dictionary.columns would cost some time to query these information.
If you just want get variable's name ,proc transpose is good and fast.
For your second Q.
data have;
input (ID Auto Auto_2 Auto_3 City City_2 ) ($);
cards;
01 Fiat Ford . Milan .
02 Fiat . . Milan Rome
03 Fiat Ford Ford Milan Rome
04 Fiat . . Milan .
;
run;
proc transpose data=have(obs=0) out=temp;
var _all_;
run;
proc sql;
select _name_ into : list separated by ' ' from temp where not prxmatch('/_\d+$/',strip(_name_));
quit;
data want;
set have;
keep &list;
run;
Thank you so much!
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.