- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi all:
I need to check some character variables (ID, Reason are not included) to see if they are ALL missing. is there a better way to make the coding shorter and dynamic(because the a_xx , b_xx might change)
thank you,
purple
if (compress(al01_raw)="" and compress(al02_raw)="" and compress(al03_raw)=""
and compress(al04_raw)="" and compress(al05_raw)="" and compress(al06_raw)="" and compress(al07_raw)=""
and compress(al08_raw)="" and
compress(b01_raw)="" and compress(b02_raw)="" and compress(b03_raw)="" and compress(b04_raw)=""
and compress(b05_raw)="" and compress(b06_raw)="" and compress(b07_raw)="" and compress(b08_raw)=""
and compress(b09_raw)="" and compress(b10_raw)="" and compress(b11_raw)="" and compress(b12_raw)=""
and compress(b13_raw)="" and compress(b14_raw)="" and compress(b15_raw)="" and compress(b16_raw)=""
and compress(b17_raw)="" and compress(b18_raw)="" and compress(b19_raw)="" and compress(b20_raw)="")
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Apart from the suggestions to rename your variables, so that it is easier to refer to them as e.g. "a_raw101-a_raw", you may also be so lucky that the variables are contiguous in the data set vector (no other variables in between), in which case you can refer to them as e.g "a101_raw--b20_raw".
Or if there are other variables in between, but these are numeric, you can refer to your text variables as "a101_raw-character-b20_raw".
Apart from that, there is no need to define an array, just do
if compress(cats(of a101_raw--b20_raw))=""
- I left in the COMPRESS call as it may strip away some special characters that CATS will not strip out. Not quite sure if it is necessary.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
First lose the COMPRESS, that will save you a bunch of letters. Since you have not specified any characters to be removed it is basically doing nothing but wasting clock cycles.
The CMISS function will return the number of variables in the parameter list that have missing values.
If you assign all of those variables to an array the DIM function returns the number of items in the array.
So if cmiss = dim then all of the values are missing.
array v(*) al01_raw al02_raw al03_raw ; /* use your list I'm too lazy to type that*/ If cmiss(of v(*)) = dim(v) then <whatever>;
Note: If you had named your variables Raw_al01 Raw_al02 etc you could use a nice list to reference Raw_al01 - Raw_al08 instead of typing all of the names, similar with Raw_b01 - Raw_b20 if you want all of the values in the sequence indicated.
Which would REALLY make this much easier to be "dynamic". Consider if you have
array v(*) Raw_al01 - Raw_al08
Raw_b01 - Raw_b20
; If cmiss(of v(*)) = dim(v) then <whatever>;
You can see where this might be easy to change. Such as you needed al15, just change the 08 to 15. Caution: You really want to use multiple array definitions if different lists are to processed in the same data step.
If you actually wanted all of the "alXX" variable you could then use Raw_al: in the array definition and all variables whose names start with Raw_al would be used.
You will find that many SAS programmers will assign group prefixes instead of suffixes because the lists are so handy and reduce typing. These lists can be used in Drop and Keep statements or data set options and in most functions that take multiple variables though you may have to use the "of " to tell SAS you intend a list.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You can just testing if all of those variables are empty.
So you could just do:
if ' '= cats( of
al01_raw al02_raw al03_raw al04_raw al05_raw al06_raw al07_raw al08_raw
b01_raw b02_raw b03_raw b04_raw b05_raw b06_raw b07_raw b08_raw
b09_raw b10_raw b11_raw b12_raw b13_raw b14_raw b15_raw b16_raw
b17_raw b18_raw b19_raw b20_raw
);
And you changed how the variables are named so that the numeric suffix is an actual suffix then it would be even easier.
if ' '= cats( of al_raw01- al_raw08 b_raw01 - b_raw20 );
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Apart from the suggestions to rename your variables, so that it is easier to refer to them as e.g. "a_raw101-a_raw", you may also be so lucky that the variables are contiguous in the data set vector (no other variables in between), in which case you can refer to them as e.g "a101_raw--b20_raw".
Or if there are other variables in between, but these are numeric, you can refer to your text variables as "a101_raw-character-b20_raw".
Apart from that, there is no need to define an array, just do
if compress(cats(of a101_raw--b20_raw))=""
- I left in the COMPRESS call as it may strip away some special characters that CATS will not strip out. Not quite sure if it is necessary.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi @purpleclothlady,
As Tom has shown, the first step to shorten your code is to use syntax around a list of your variable names alxx_raw and bxx_raw.
In a second step you can virtually always abbreviate such lists and also make them more dynamic. While the unfavorable naming convention with sequential numbers in the middle of the names precludes using numbered range lists, you can most likely use one of the other three types of SAS Variable Lists: Depending on the structure of your input dataset,
- name range lists such as al01_raw--b20_raw or al01_raw-char-al08_raw (as has been suggested by s_lassen already)
- name prefix lists like al0:
- special lists such as _char_
may by applicable.
Note that exclusions (you mentioned ID and Reason) can be achieved by gradual creation of the program data vector (PDV).
Example:
data want;
if 0 then set have(drop=ID Reason);
array c[*] _char_;
set have;
/* ... code using cats(of c[*]) ... */
run;
The first SET statement puts all variables from dataset HAVE into the PDV, except ID and Reason. Thus, the list _char_ of all character variables -- which are contained in the PDV so far -- in the array definition does not include those two variables. This list can still be referred to by the of c[*] syntax in suitable functions (such as CATS) even after the second SET statement has added ID and Reason to the PDV.
Another powerful tool for creating non-standard variable lists is a query to dictionary.columns (cf. DICTIONARY Tables).
Example:
proc sql noprint;
select name into :varlist separated by ' '
from dictionary.columns
where libname='WORK' & memname='HAVE' & upcase(name) like 'AL__\_RAW' escape '\';
quit;
%put &varlist;
The PROC SQL step above puts a list of the variables in dataset WORK.HAVE with names of the form alxx_raw (with two arbitrary characters xx) into macro variable VARLIST, so you can use code like cats(of &varlist) later on. The WHERE condition could be adapted to your needs in various ways (e.g., restricted to variables with type='char' or to a particular range of numbers xx or extended to include bxx_raw variables).
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi @ballardw @Tom @FreelanceReinh @s_lassen :
Thanks so much for your help.
all works and I combined the methods , please see Have test code and result. 👍
Purple
/********************************************************************************************************
Porgram name: DATA2DATASTEP.sas
Goal: Turn a SAS DATA set into a DATA step
Reference:
https://blogs.sas.com/content/sastraining/2016/03/11/jedi-sas-tricks-data-to-data-step-macro/
*******************************************************************************************************/
OPTIONS nomprint nomlogic;
%macro DATA2DATASTEP(dsn,lib,obs);
/*Step1: SQL query that gets a space delimited list of the variable names*/
proc sql noprint;
select Name
into :varlist separated by ' '
from dictionary.columns
where libname="&lib"
and memname="&dsn";
quit;
%put &varlist;
/*Step2: Gets a space delimited list of the variable names concatenated with a ':' and required informat,
which we will use on the DATA step INPUT statement to read the values in from the DATALINES:*/
proc sql noprint;
select case type
when 'num' then
case
when missing(format) then cats(Name,':32.')
else cats(Name,':',format)
end
else cats(Name,':$',length,'.')
end
into :inputlist separated by ' '
from dictionary.columns
where libname="&lib"
and memname="&dsn";
quit;
%put &inputlist;
/*Step3: DATA _NULL_ step to write a DATA step so we can copy and paste data to the Section: RAW DATA */
data _null_;
set &lib..&dsn(obs=&obs) end=last;
put &varlist @;
if last then do;
put;
put ';;;;';
end;
else put;
run;
%mend;
%data2datastep
(HAVE,
WORK,
7);
/*The above code will show input var name and value on LOG, copy and paste and review INPUT and datalines statement*/
DATA HAVE;
infile datalines dlm='' missover /*end missing*/ dsd /*beginning and middle*/;
input id $ a01_raw $ a02_raw $ a03_raw $ a04_raw $ a05_raw $ a06_raw $
a07_raw $ a08_raw $ b01_raw $ b02_raw $b03_raw $ b04_raw $
b05_raw $ b06_raw $ b07_raw $ b08_raw $ b09_raw $ b10_RAW $
b11_raw $ b12_raw $ b13_raw $ b14_raw $ b15_raw $ b16_raw $
b17_raw $ b18_raw $ b19_raw $ b20_raw $;
datalines;
102-001 1
208-001 1
205-001
206-002
220-001 1 1 1 1 1 1 1
220-002
230-001
;
run;
/
/*-----------------------------METHOD1-------------------------*/
/*Step1. Macro &varlist */
proc sql noprint;
select name into :varlist separated by ' '
from dictionary.columns
where libname='WORK' & memname='HAVE' & upcase(name) contains '_RAW';
quit;
%put &varlist;
/*Step2.Check if the list had all missing values*/
data have2;
set have;
if compress(cats(of a01_raw--b20_raw))^="" then count2="non-blank" ;
if compress(cats(of &varlist))="" then count="blank";
run;
/*-----------------------------METHOD2-------------------------*/
/*This will only output any ID has all missing values*/
data have_onlymissing;
set have;
if ' '= cats( of a01_raw--a08_raw b01_raw--b20_raw);
run;
/*-----------------------------METHOD3-------------------------*/
/*Count number of missing values by ID BY ARRAY OR &VARLIST*/
data have_count;
retain _: ;
set have;
array v(*) a01_raw--a08_raw b01_raw--b20_raw;
_count_missing= cmiss(of v(*));
/*OR*/
_count_missing2= cmiss(of &varlist);
_tot=dim(v); _count_nonmiss=dim(v)-_count_missing;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The COMPRESS() function call is still not doing anything. If you do not provide any characters for compress() to remove then it only removes the blanks. But there is no need to remove the blanks to test if the string is blank or not. SAS string comparisons already ignore trailing blanks.