- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
I have a data of 100+ variables.
I want to apply the simple operation to all variable: If variable_value >10000 then variable_value=.;
I dont want to repeat this line for 100+ line.
So is there any way to array all variables without listing their name so I can get it done?
Also, if I have the first 3 variable that should be excluded from the rule , how can I array all except for them. I can list their 3 name for excluding.
Thank you for your help.
HC
data have;
input var1 a b c;
cards;
1 10 2 1000000
2 10 3 1
3 10 4 0
4 . 100000 1000000
5 100000 1 2
6 10 2 300000
run;
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If they really are the first three variables then just index over the array starting with the index set to 4.
data want;
set have;
array _all _numeric_;
do _n_=4 to dim(_all);
if _all(_n_) > 10000 then _all(_n_)=.;
end;
run;
If they aren't the first three, you can make them the first three by adding another SET statement with a KEEP= option.
data want;
set have(keep=var1 var2 var3);
set have;
array _all _numeric_;
do _n_=4 to dim(_all);
if _all(_n_) > 10000 then _all(_n_)=.;
end;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Firstvar--lastvar;
Other options include querying the SAS help table vcolumn to get the variable list. Thats been illustrated on the forums quite a bit.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I am not sure if it is possible for us to do the arrays without the variable names. We need the variable names in arrays. However, alternatively what we could do is we could get the list of all other variable names except the names which are not required (3 variable what you mentioned) into a macro variable.
Something like below
after you create the dataset with all the variables and data
write
proc sql;
select distinct name into : vars separated by ' ' from dictionary.columns where memname='dataset name' and name not in ('x1','x2','x3');
select count(distinct name) into : cnt from dictionary.columns where memname='dataset name' and name not in ('x1','x2','x3');
quit;
we get list of all variable names excluding the three variables into vars macro variable separated by blank space. In teh cnt macro variable we get the number of variable count.
then write the arrays as below
data arrays;
set have;
array vr(&cnt) &vars;
do i = 1 to &cnt;
If vr(i) >10000 then vr(i)=.;
end;
run;
Jag
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If the list of variables is fairly long and the list of excluded variables is short, it would be simpler to list the latter explicitly. For example, to exclude var1 and b :
data want;
set have;
array _v{*} _numeric_;
do _i = 1 to dim(_v);
if upcase(vname(_v{_i})) not in ("VAR1", "B") then
if _v{_i} > 10000 then call missing(_v{_i});
end;
drop _i;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If they really are the first three variables then just index over the array starting with the index set to 4.
data want;
set have;
array _all _numeric_;
do _n_=4 to dim(_all);
if _all(_n_) > 10000 then _all(_n_)=.;
end;
run;
If they aren't the first three, you can make them the first three by adding another SET statement with a KEEP= option.
data want;
set have(keep=var1 var2 var3);
set have;
array _all _numeric_;
do _n_=4 to dim(_all);
if _all(_n_) > 10000 then _all(_n_)=.;
end;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
These codes are awesome!
Very easy to follow.
So much thanks.
HC
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If it is important to keep the variables in order as in @PGStats program does you can alternatevly built the variable list explicitly.
data have;
input var1 a b c;
var3 = _n_*10;
d = a;
var2 = var1*10;
cards;
1 10 2 1000000
2 10 3 1
3 10 4 0
4 . 100000 1000000
5 100000 1 2
6 10 2 300000
run;
proc print;
run;
proc transpose data=have(obs=0 drop=var: keep=_numeric_) out=vars;
run;
proc sql noprint;
select nliteral(_name_) into :varlist separated by ' ' from vars;
quit;
%put NOTE: &=varlist;
data change;
set have;
array _x[*] &varlist;
do _n_ = 1 to dim(_x);
if _x[_n_] ge 1e5 then _x[_n_]=.;
end;
run;
proc print;
run;