Hi everyone!
I have a SAS dataset with more than 1000 variables distributed like this:
ID Auto Auto_2 Auto_3 City City_2 Sport Sport_2 Sport_3 Sport_4 ... and so on (many different variables repeated different times).
I need to:
1) keep only those rows that have missing values in all the variables ending with "_number", (or, equally, drop those rows that have values in any of the variables ending with "_number").
For example:?
ID Auto Auto_2 Auto_3 City City_2
01 Fiat Ford Milan ---> Drop row
02 Fiat Milan Rome ---> Drop row
03 Fiat Ford Ford Milan Rome ---> Drop row
04 Fiat Milan ---> KEEP ROW
If it can help, if Variable_n is missing, then Variable_(n+1) is missing too.
2) keep only the columns without the "_number".
In the previous example only ID Auto City must be kept.
The problem in both cases is the same: I can't write all the variables in a drop/keep statement or in a IF condition because they're too many.
Is there a smart way to manage variables like these? Maybe something like lists...? I don't know.
Thanks a lot
Irene
data have;
input (ID Auto Auto_2 Auto_3 City City_2 ) ($);
cards;
01 Fiat Ford . Milan .
02 Fiat . . Milan Rome
03 Fiat Ford Ford Milan Rome
04 Fiat . . Milan .
;
run;
proc transpose data=have(obs=0) out=temp;
var _all_;
run;
proc sql;
select _name_ into : list separated by ',' from temp where prxmatch('/_\d+$/',strip(_name_));
select count(*) into : n from temp where prxmatch('/_\d+$/',strip(_name_));
quit;
data want;
set have;
if cmiss(&list)=&n;
run;
You can only use short cut references for variable lists when you have prefixes
Otherwise you can create a macro variable with the list of variables by querying the SASHELP.VCOLUMN table.
I've never used the SASHELP.VCOLUMN table, I'll have a look!
Saying shortcut references do you mean, for example "Var:" to use all variables starting with "Var"? I've aIready tried this method in a DROP statement, but I think I can't use it in a IF statement.. Am I wrong?
Any ideas for the IF statement?
Thanks
Irene
Well, you could put the _numbers in an array, then loop over the array:
data want;
set have;
array elem{*} auto-city2;
do i=2 to dim(elem{*});
if elem{i}="" and elem{i-1}="" then flag="Y";
end;
run;
Then drop based on flag (note untested code!).
Personally, in these instances you may be better off normalising your data, i.e. have on row per data element:
ID SECTION NUMBER VALUE
1 AUTO 1 Fiat
1 AUTO 2 Ford
...
Note that you have a special condition in your data: If Variable_N is missing, then Variable_N+1 is also missing. This means that you don't have to process all your variables. You could process only the "_2" variables and ignore the rest.
Here's a beginning to a solution. See if you need more than that.
proc contents data=have noprint out=_contents_ (keep=name);
run;
*** Just so you know what you are working with here ...;
proc print data=_contents_ (obs=10);
run;
data keeplist _2_list;
set _contents_;
last_part = scan(name, -1, '_');
if last_part='2' then output _2_list;
else if input(last_part, ??5.) < 2 then output keeplist;
run;
*** Again, just so you know what you are working with here ...;
proc print data=keeplist (obs=10);
proc print data=_2_list (obs=10);
run;
proc sql;
select trim(name) into : keeplist separated by ' ' from keeplist;
select trim(name) into : _2_list separated by ' ' from _2_list;
quit;
%put &keeplist;
%put &_2_list;
The rest will be (by comparison) easier, but see if this much makes sense.
Good luck.
data have;
input (ID Auto Auto_2 Auto_3 City City_2 ) ($);
cards;
01 Fiat Ford . Milan .
02 Fiat . . Milan Rome
03 Fiat Ford Ford Milan Rome
04 Fiat . . Milan .
;
run;
proc transpose data=have(obs=0) out=temp;
var _all_;
run;
proc sql;
select _name_ into : list separated by ',' from temp where prxmatch('/_\d+$/',strip(_name_));
select count(*) into : n from temp where prxmatch('/_\d+$/',strip(_name_));
quit;
data want;
set have;
if cmiss(&list)=&n;
run;
Dictionary.columns would cost some time to query these information.
If you just want get variable's name ,proc transpose is good and fast.
For your second Q.
data have;
input (ID Auto Auto_2 Auto_3 City City_2 ) ($);
cards;
01 Fiat Ford . Milan .
02 Fiat . . Milan Rome
03 Fiat Ford Ford Milan Rome
04 Fiat . . Milan .
;
run;
proc transpose data=have(obs=0) out=temp;
var _all_;
run;
proc sql;
select _name_ into : list separated by ' ' from temp where not prxmatch('/_\d+$/',strip(_name_));
quit;
data want;
set have;
keep &list;
run;
Thank you so much!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.