BookmarkSubscribeRSS Feed
MichaelD
Calcite | Level 5
Ok, I have a lot variables and I need to delete more then half of them.

Unfortunately, very few are sequential (ie X1-X10) very few are grouped together (211=varnum(dataset,varnameA), 212=varnum(dataset,varnameB,... 236=varnum(dataset,varnameZ)) and then there are just a bunch all over the place.

I wrote a little macro (see end of post) that works for the cases where the variables are grouped so when I have a few separate clusters a can delete them pretty easily. Of course it can also keep lists of variables just as easily.

I'm hoping to make some changes so I can delete the variables all over the place easily.
The current macro is:

%macro VarList(DATASET,SEP,i=,n=);

proc sql;
create table VarListVars as
select varnum, name
from dictionary.columns
where memname = &DATASET.;
quit;

/* maybe this can be done inside the proc sql also*/
data VarListTemp;
set VarListVars(firstobs=&i obs=&n);
run;

%global VarList;

proc sql noprint;
select name into :VarList separated by &SEP
from VarListTemp;
quit;

/*EDIT:I guess I should add something to delete my temp datasets but I'm not too worried about that at this point*/

%mend VarList;

%VarList('dataset',' ',i=211,n=236);

Then I use
data datasetB;
set dataset;
drop/keep/(if with sep='=0 | ')/whatever &VarList;
run;

I'd like to be able to use a vector of varnums. Like change the macro to accept v={1,10,38...,all integers from 211 to 236,... 525 to 538,...} instead of i,n

Thanks Message was edited by: MichaelD
15 REPLIES 15
Cynthia_sas
SAS Super FREQ
Hi:
FIRSTOBS and OBS are not references to variable values or variable names. FIRSTOBS and OBS allow you to select observations or rows from the data. So, for example, if you do this:
[pre]
%let i = 15;
%let n = 17;
proc print data=sashelp.class(firstobs=&i obs=&n);
title "Firstobs=&i and obs=&n";
run;
[/pre]

Then, PROC PRINT will start selecting rows or observations with ROW 15 and will continue to select observations until the OBS= value is reached (in this case, ROW 17). This means that PROC PRINT will disply rows 15, 16 and 17 from SASHELP.CLASS.

Knowing this behavior of FIRSTOBS and OBS, I am confused about how you think your usage is going to result in the deletion of "all integers from 211 to 236" unless your choice of those same values for &I and &N was coincidental.

When you talk about deleting "integers", to me that seems like you're saying you want to delete variable VALUES, not the variables themselves.

You indicate that you are trying to delete the variables by their variable number or position in the data, but that seems an unnecessary complication or distinction. If you know that you want to delete VARNAMEA, VARNAMEB and VARNAMEZ, then you only need to list those variable names in your DROP statement (or list the variables you want to keep in your KEEP statement).

In addition, I find it hard to undestand how your PROC SQL step is working. MEMNAME in dictionary.columns is a character variable, so the usual construction is:
[pre]
where memname = "&DATASET" OR
where upcase(memname) = "%upcase(&DATASET)"
[/pre]

For some more ideas about how to use DICTIONARY.COLUMNS, you might search through previous forum postings for some of the many examples that have been previously posted here.

In addition, these papersand reference documentation might prove useful:
http://support.sas.com/documentation/cdl/en/sqlproc/62086/HTML/default/a001385596.htm
http://support.sas.com/resources/papers/proceedings09/053-2009.pdf
http://www2.sas.com/proceedings/sugi30/057-30.pdf
http://www2.sas.com/proceedings/sugi30/070-30.pdf
http://www2.sas.com/proceedings/sugi29/237-29.pdf
http://www.codecraftersinc.com/pdf/DictionaryTablesRefCard.pdf
http://www.lexjansen.com/pharmasug/2006/tutorials/tu03.pdf
http://www.qsl.net/kd6ttl/sas/sqlutilov.pdf

cynthia
MichaelD
Calcite | Level 5
Well the output of:
proc sql;
create table VarListVars as
select varnum, name
from dictionary.columns
where memname = &DATASET.;
quit;

is a 2 column dataset with variables varnum and name of the variables from &DATASET.
So observation i in the new data set VarListVar is varnum=i name='whatever the ith variable of &DATASET is named'

My problem is I have hundreds of variables and of those I have about 100-200 I need to keep. Its impractical to type out all the variable names I want to keep or delete since this will lead to errors. Its not as error prone for me to type out a vector of variable numbers I want to keep though. And that's what I'd like to change my macro to be able to do.

Since the output VarListVars is a list of all variables and their corresponding varnum I can (semi) easily scan the list for the variables I want to delete or keep
Flip
Fluorite | Level 6
Since you are creating the valistvars table, why not edit that table removing the variables you want to drop, then create a macro variable containing the remaining names with an into clause then using that macro variable in a keep statement?

select name into :keepnam from varlistvars ...

data ;
set original;
keep ( &keepnam);
MichaelD
Calcite | Level 5
I actually didn't think of that, I'm a very hands off user. It definitely works. Thanks.

But I would still like to implement using a vector of variable numbers if anyone else comes up with an idea. The thing is deleting by hand is not very repeatable. Sure, I can export the dataset so I can reuse the list of variables it but if don't have access to that file I'd be stuck recreating what I need.

Or worse, I may pass on my final code (not this macro) and someone might not know how to use the files which give the VarList strings
ArtC
Rhodochrosite | Level 12
Here is a way to retrieve variable names based on variable numbers in a DATA step. The OPEN, VARNAME, and CLOSE functions adapt very well to the macro language as well.

[pre]
data varnums;
number = 1; output varnums;
number = 4; output varnums;
run;

data _null_;
length namelist $500;
dsid = open('sashelp.class');
do until (done);
set varnums end=done;
vname = varname(dsid,number);
put vname=;
namelist = catx(' ',namelist,vname);
end;
call symputx('namelist',namelist);
rc = close(dsid);
stop;
run;
%put namelist &namelist;
[/pre]
Peter_C
Rhodochrosite | Level 12
MichaelD
to make a solution "repeatable" would be quite simple --- if --- you have a simple rule for inclusion or exclusion of the columns
--- or you have a data set model (having just the columns you want on your final table)

Are you in either of these situations?

PeterC
data_null__
Jade | Level 19
I believe the IN operator provide the functionally you seek.

[pre] if varnum in(1 3 4:12 50:60 80:100 19); [/pre]


[pre]
*** Create data;
proc plan seed=477976307;
factors name = 100 prefix=1 of 4 / noprint;
output out=sample prefix cvals=('A' 'B' 'C' 'D');
run;
data sample;
set sample;
length _name_ $32;
_name_ = cats(prefix,name);
run;
proc transpose data=sample out=sample(drop=_:);
var _name_;
id _name_;
run;

*** var num list;
%let dataset=sample;
proc sql;
create table VarListVars as
select varnum, name
from dictionary.columns
where libname eq 'WORK' and memname = "%upcase(&DATASET.)";
quit;
run;

*** use this data set to create a list of variables to drop or keep;
data drop;
set varListVars;
if varnum in(1 3 4:12 50:60 80:100 19);
run;
proc print;
run;
[/pre]
MichaelD
Calcite | Level 5
data _null_;: I believe that IS what I am looking for


PeterC: I'm not sure what you mean by a 'simple rule'. Its definitely not as easy as every 3rd variable or variables that have 'c' as their 4th letter (though if you know how to do that it may be useful someday).

The ones I want almost seem to be randomly distributed and similar variables, maybe city and state, are no where near each other. I don't know.

As for a data model of the variables I want... that's what I'm building. I don't want to count on the variables being in the same order every time, but I hope 'they' won't rename variables too often.
Flip
Fluorite | Level 6
"I don't want to count on the variables being in the same order every time"

In this case you should stay away from using the variable number. You might be deleting different variables each time.

What peter means is if there is a logical way to choose names such as:

Select name .........
WHERE (some name pattern);
MichaelD
Calcite | Level 5
Flip, I think you've misunderstood me.

Say I have a dataset of word counts from today's New York Times. Maybe it was built sequentially so that variables 1-x are all in the front page article, observation one.

Now there's no problem for today since all variables will be present for all articles and in the same order. But tomorrow the sequence of the variables/words will change.

So if I want to do text mining I will export the list of 'keep' variables from today and import that all other days, making revisions as necessary, rather then rebuild my lexicon each day.

Now, this is NOT what I'm doing but I think its a clear example. I would, however, find it useful to know how to keep, for example, every variable/word ending in 'ed' or every variable with 'c' as the third letter. I think those fall under "some name pattern".

Thanks for the help.
Flip
Fluorite | Level 6
This sort of approach should do that.

proc sql;
select name into :keeplst separated by ' ' from dictionary.columns
where libname = 'XXXX' and memname = 'YYYY'
and (upcase(name) LIKE '%ED') or index(upcase(name), 'C') = 3);
end;

data new;
set old (keep = &keeplst);
...
run;
Peter_C
Rhodochrosite | Level 12
name patterns can be tested from a list of the column names. That can be obtained from the OUT= data set from proc contents, from a transpose of all columns of the dataset(but with obs=0), or from an sql query on sashelp.vcolumn (remembering to reduce the rows considered with a where clause on libname and memname). The first of these 3 would be my general recommendation
proc contents data=your.dataset noprint out= your_dataset_columns; run;
That can be filtered on "front" or "back" of the name columns and the selected list stored in a macro variable with fairly clear sql code, like:[pre]%let ed_names = !none! ; * a default designed to look impossible ;
proc sql noprint ;
select name into :ed_names separated by ' '
from your_dataset_columns
where lowcase(substr( name, length(name)-1, 2)) ='ed'
;
%put &sqlobs found, named &ed_names ;
quit ;[/pre]As column names might be mixed case, lowcase(). The length() function indicates the last non-blank character of the column name, so to extract the columns which end in the substr "ed" (in either case) I offered that where clause.

If you follow up these features in the online doc, you'll be ready to do most of these types of "name-feature" selections.
More advanced selection is within the domain of Perl regular expression filtering, which you can find by searching the doc for "prxparse".

hope this helps.

peterC
data_null__
Jade | Level 19
From the description of the your data my impression is that you are using the wrong data structure. Plus you want to do operations on the meta data that are more suited, that is more easily done, on data. I would transpose so that I had something like.

[pre]
id page word count
NYT 1 of 68
NYT 1 the 77
[/pre]

It would be helpful to me to see and example of the data you are working with.
Peter_C
Rhodochrosite | Level 12
as often as variable order and name clarifies anything, when in "reduce the columns mode", I find often find stats on variable values more useful. MIN and MAX being integers indicate columns which are probably integer in their origination (and finding integers in the percentile stats generally confirms the point for me). A small-ish number of distinct numeric values is indicative of categorical data. And obviously, the columns which are constant and/or empty offer scope for "column reduction".
It is in these areas I would look for the (sometimes simple) guiding rules for removing (or changing) columns.

peterC

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 15 replies
  • 4188 views
  • 0 likes
  • 6 in conversation