BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
sabataged
Obsidian | Level 7

Hello everyone and thank you in advance for your help!  This is my first post, so please forgive me if this is confusing or redundant. 

 

Each month, I receive a large dataset with around 4000+ variables and 3000+ observations.  The data is from 30 different centers, but I am only interested in the data from 3 centers. Some variables are the same for all centers, but variables are only linked to some of the centers.  I would like to get rid of the variables that have no values for the centers that I am interested in. I currently have a code that works, but I have to split my file into 6 lists of variables. I feel like there must be a way to do this in one step without getting the messae "ERROR: The text expression length (65639) exceeds maximum length (65534). The text expression has been truncated to 65534" because I'm SURE people work with even larger datasets than mine! 

 

 I have this current setup: 

 

/* To split up my dataset, I am using this code 
to find the names of the variables so I 
can create a list. The SAS Macro is able to
 handle only 800 variable names at a time. 
Otherwise I receive an error about macro 
variable size and truncation*/ 

proc contents data= &month noprint out=_contents_ order=varnum;
run;

proc sort data=_contents_;
by varnum; 
run; quit; 

proc print data=_contents_ ;
where varnum in (1, 2, 800, 801, 1600, 1601, 2400, 2401, 3200, 3201, 4000, 4386);
var name varnum;
run;

/*I change the LET statement to reflect the part of the list I am at */ 

%LET listnum0=CallReportNum ReportVersion--Zip_Code_Information___Did_you_a; 
%LET set1= iteration1; %LET set0= &month; 

/*this is the main part of the code that 
gets rid of the variables with no observations */

proc transpose data= &set0(obs=0) out= vname ;
 var &listnum0;
run;

proc contents data= vname order= varnum; 
run; quit; 

proc sql;
 select catx(' ','n(',_name_,') as ',_name_) into : list separated by ',' from vname;
 create table temp as
  select &list from &set0
;quit;

proc transpose data= temp out= drop ;
 var _all_;
run;

proc sql;
 select _name_ into : drop separated by ' ' from drop where col1= 0;
quit;

data &set1; 
 set &set0 (drop= &drop);
run; 

Is there a simple fix or creative workaround to this issue that I am unaware of?

 

Thanks again! 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

The limitation of that approach is the requirement to use macro variables.  So don't use macro variables.  It is probably easier to just generate code a text file instead.

* Get list of variables ;
proc transpose data=&month (obs=0) out=names ;
   var _all_;
run;

* Generate code to count non-missing values ;
filename code temp;
data _null_;
   file code ;
   set names end=eof;
   if _n_=1 then put
 'proc sql noprint;'
/'create table _counts as select' 
/' ' @
  ; else put ',' @ ;
  put 'sum(not missing(' _name_ ')) as ' _name_ ;
  if eof then put
 'from &month'
/';'
/'quit;'
  ;
run;

* Run generated code ;
%include code / source2 ;

* Generate DROP statement ;
filename code temp;
data _null_;
  set _counts ;
  array c _numeric_;
  file code lrecl=80 ;
  length _name_ $32 ;  
  put 'drop ' @;
  do _n_=1 to dim(c);
   if c(_n_)=0 then do ;
     _name_ = vname(c(_n_));
     put _name_ @ ;
  end;
  end;
  put ';' ;
run;

* Make version of data without empty variables ;
data &set1 ;
  set &month ;
  %include code / source2;
run;

 

View solution in original post

19 REPLIES 19
MarkWik
Quartz | Level 8

@sabotaged Welcome to SAS forum. Would be nice to follow if you could state your objective/requirement before your code like

My objective is to ...............

The following code does it............

The problem I am encountering....

Any solutions plz....................

sabataged
Obsidian | Level 7
Hi, thank you for your reply! I'll try to edit my post to make it clearer.
Reeza
Super User

Are the missing variables likely to be character or numeric?

 

This is actually asked pretty frequently, so a search generates several ready made solutions:

 

User group paper:

http://support.sas.com/resources/papers/proceedings10/048-2010.pdf

 

Previous question here:

https://communities.sas.com/t5/SAS-Procedures/Getting-rid-of-all-variables-for-which-there-are-ONLY-...

 

SAS Note:

http://support.sas.com/kb/24/622.html

sabataged
Obsidian | Level 7
Hi, thanks so much for your reply. To answer your first question, there are both character and numeric variables, and there are a fair amount of both types dispersed throughout the document. As for the links you referenced, thank you so much, but I actually already looked through these (and many other) solutions before posting. I tried to use the %dropmiss macro last week, but was unable to successfully use it. After searching a lot of forum Q&A, I found a version of the main code that I posted in my original question. It works, but I have to split my dataset into 6 lists and run the code until I get through all the variables. I'm just wondering if there's any way to get around the message "ERROR: The text expression length (65639) exceeds maximum length (65534). The text expression has been truncated to 65534". I also researched about this error, but wasn't able to find a true workaround. I tried to change the size of the macro variable, but that didn't work either. Thanks again for your help! I apologize if I'm missing something that is otherwise obvious!
Tom
Super User Tom
Super User

The limitation of that approach is the requirement to use macro variables.  So don't use macro variables.  It is probably easier to just generate code a text file instead.

* Get list of variables ;
proc transpose data=&month (obs=0) out=names ;
   var _all_;
run;

* Generate code to count non-missing values ;
filename code temp;
data _null_;
   file code ;
   set names end=eof;
   if _n_=1 then put
 'proc sql noprint;'
/'create table _counts as select' 
/' ' @
  ; else put ',' @ ;
  put 'sum(not missing(' _name_ ')) as ' _name_ ;
  if eof then put
 'from &month'
/';'
/'quit;'
  ;
run;

* Run generated code ;
%include code / source2 ;

* Generate DROP statement ;
filename code temp;
data _null_;
  set _counts ;
  array c _numeric_;
  file code lrecl=80 ;
  length _name_ $32 ;  
  put 'drop ' @;
  do _n_=1 to dim(c);
   if c(_n_)=0 then do ;
     _name_ = vname(c(_n_));
     put _name_ @ ;
  end;
  end;
  put ';' ;
run;

* Make version of data without empty variables ;
data &set1 ;
  set &month ;
  %include code / source2;
run;

 

sabataged
Obsidian | Level 7

Tom, you are a genius! Thank you SO much for your help!  This is exactly what I needed.  Sending you many, many positive vibes. 🙂 

Tom
Super User Tom
Super User

Observation is the word used for one record (or row) in a SAS dataset. So every variable in a dataset has the same number of observations.

 

Do you mean that you want find out which variables have only missing values? 

Do you have any numeric variables? If so does it matter if the variable has some observations with special missing values, like .A or .Z?

 

Also why do you want eliminate them?

 

sabataged
Obsidian | Level 7
Hi Tom, thanks for the reply. To answer your questions: 1. Yes, I would like to delete variables with only missing values. Sorry for the misuse of terminology! I will edit my post to reflect the correct term. 2. I have both numeric and character variables in my dataset. I'm just interested in getting rid of variables with all missing values. I don't have any special missing values, I think. 3. Each month, I receive a dataset that contains variables and records from 30 different centers. There are both character and numeric variables. I am only interested in the variables from 3 centers, so I'd like to delete the rest of the variables that are linked to other centers. After going through the process of cleaning the dataset, I end up with around 300 variables from the original 4000+ variables.
Tom
Super User Tom
Super User

Sounds like missing status has nothing to do with the request.  Perhaps I misunderstood your original code.

So you just want to keep variables related to the selected centers?

How do you know which variables are related to which center?

How do you know which centers you want to keep?

 

sabataged
Obsidian | Level 7
Hi Tom, I wish I could upload the original Excel file, but it has patient information in it. There are 4000+ variables, but many of the variables don't have any data in them for any center. The rest of the variables have information in them, but only for certain centers. I don't have any way of telling which variables are related to which center, so that's why it would really help to get rid of variables with no values. I know which centers I want to keep because we are interested in the data from only 3 specific centers.
Reeza
Super User

It does not need to be real data. Make fake data that has the same structure and issues as the real data. Yes this takes work but you'll get an answer much faster so in the long run you save time. 

Tom
Super User Tom
Super User
Also just make data with a few columns. We don't need 4000 columns to generate test code.
sabataged
Obsidian | Level 7
Hi Reeza, I will try and do that next time for easier visualization. Thank you for the tip!
Tom
Super User Tom
Super User

Do you have a CENTER variable you can use to find the centers you want to keep?

If so you could just add a first step to subset the observations (rows) to just the centers of interest.

data step1 ;
  set &month ;
  where center in (......);
run;

Then use that dataset as the input to the code I posted and it will eliminate the columns that are all empty for just that subset of the rows.

 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 19 replies
  • 7500 views
  • 4 likes
  • 5 in conversation