Solved: Include a condition in proc sql to evaluate if the variable exists

ANKH1 · Posted 06-21-2023 12:28 PM

The following datasets are two examples. The proc sql determines whether there is a value or nor under each variable starting with time_. My problem is that we have lots of datasets to evaluate and 20 variables as time_.

data ds1;
input ID	pair	type	time_3	time_6	time_9	time_18;
datalines;
1	1	a	111	134	. 	.		
2	1	b	110	.	123	.	
3	2	a	.	131	.	120
4	2	b	.      .	.	123;
run;


data ds2;
input ID	pair	type	time_3	time_9	time_15;
datalines;
1	1	a	111	134	. 		
2	1	b	110	.	123	
3	2	a	.	131	.
4	2	b	125     .	.;
run;

proc sql;
create table counts as
select ID, pair, type,
not missing (time_3) as time_3,
not missing (time_6) as time_6,
not missing (time_9) as time_9,
not missing (time_18) as time_18
from ds1;
quit;

The way proc sql is now coded, you can only include the variables available in the dataset of interest (in the above proc sql is for ds1). Is there a way to include a condition that will only run the statement "not missing..." if the variable exists? This way we can include all possible variables starting with time_ at once. Like the following (which include time_12 and time_15 not available in ds1):

proc sql;
create table counts as
select ID, pair, type,
not missing (time_3) as time_3,
not missing (time_6) as time_6,
not missing (time_9) as time_9,
not missing (time_12) as time_12,
not missing (time_15) as time_15,
not missing (time_18) as time_18
from ds1;
quit;

Tom · Posted 06-21-2023 04:32 PM

So if you know the data has two observations per pair then you can probably get it done by adding a PROC TRANSPOSE step.

data ds1;
  input ID pair type $ time_3 time_6 time_9 time_18;
datalines;
1 1 a 111 134   .   .  
2 1 b 110   . 123   . 
3 2 a   . 131   . 120
4 2 b   .   .   . 123
;

proc transpose data=ds1 out=ds1T name=time prefix=type_;
  by pair ;
  id type;
  var time_: ;
run;

Now you have data like this:

Obs    pair     time      type_a    type_b

 1       1     time_3       111       110
 2       1     time_6       134         .
 3       1     time_9         .       123
 4       1     time_18        .         .
 5       2     time_3         .         .
 6       2     time_6       131         .
 7       2     time_9         .         .
 8       2     time_18      120       123

Which you can turn into binary 1/0 values (without the need to do an code generation) and the add up.

data for_count;
  set ds1T ;
  type_A = n(type_a);
  type_B = n(type_b);
  type_pair = type_a and type_b;
run;
proc summary data=for_count nway;
  class time ;
  var type_: ;
  output out=want(drop=_type_) sum=;
run;

Result

                                                type_
Obs     time      _FREQ_    type_a    type_b     pair

 1     time_18       2         1         1        1
 2     time_3        2         1         1        1
 3     time_6        2         2         0        0
 4     time_9        2         0         1        0

View solution in original post

Kathryn_SAS · Posted 06-21-2023 12:53 PM

You could use macro logic to get the list of variables in advance.

data ds1;
input ID	pair	type $	time_3	time_6	time_9	time_18;
datalines;
1	1	a	111	134	. 	.		
2	1	b	110	.	123	.	
3	2	a	.	131	.	120
4	2	b	.      .	.	123
;
run;


data ds2;
input ID	pair	type $	time_3	time_9	time_15;
datalines;
1	1	a	111	134	. 		
2	1	b	110	.	123	
3	2	a	.	131	.
4	2	b	125     .	.
;
run;

%macro check(dsn);
proc sql noprint;
select name into :varlist separated by ' '
from dictionary.columns 
where libname="WORK" and memname="%upcase(&dsn)"
 and upcase(name) like "TIME_%";
quit;
%put &varlist;
%let num=%sysfunc(countw(&varlist,%str( )));
%put &num;

proc sql;
create table counts as
select ID, pair, type,
%do i=1 %to &num;
 %if &i ne &num %then %do;
not missing (%scan(&varlist,&i,%str( )) ) as %scan(&varlist,&i,%str( )),
%end;
%else %do;
not missing (%scan(&varlist,&i,%str( )) ) as %scan(&varlist,&i,%str( ))
%end;
%end;
from &dsn;
quit;
%mend;

options mprint mlogic symbolgen;
%check(ds2)
proc print data=counts;
run;

ballardw · Posted 06-21-2023 02:42 PM

I think that you need to show what you expect the output to look like.

Perhaps you want a data step and array processing as you can write code that doesn't know all of the variables.

The : builds a list of variables in a data step of all variables that start with that text. So you can use time_: to indicate that all of those variables with the variable names get used in the array to process. Lists and arrays however are not available in SQL.

data ds1;
input ID	pair	type $	time_3	time_6	time_9	time_18;
datalines;
1	1	a	111	134	. 	.		
2	1	b	110	.	123	.	
3	2	a	.	131	.	120
4	2	b	.      .	.	123
;
run;

data want;
   set ds1;
   array t (*) time_: ;
   do i=1 to dim(t);
      t[i]= not missing(t[i]);
   end;
   drop i;
run;

Note: you should test your data step before posting. The variable Type should be character, otherwise you get all missing values due to type error. Also, the line that has a semicolon ends the datalines block. So your step doesn't have the values from the last line of data.

PaigeMiller · Posted 06-21-2023 03:06 PM

This could be a lot of typing depending on your real situation, but it works. It uses the %EXPANDVARLIST macro at https://support.sas.com/resources/papers/proceedings13/032-2013.pdf

%let varnames=%expandvarlist(data=sashelp.class);
%put &=varnames;

proc sql;
    create table def as select
        name
        ,sex
        ,age
        %if %sysfunc(findw(%upcase(&varnames),HEIGHT))>0 %then %do; ,height %end;
        %if %sysfunc(findw(%upcase(&varnames),CAULIFLOWER))>0 %then %do; ,cauliflower %end;
    from sashelp.class;
quit;

--
Paige Miller

ANKH1 · Posted 06-21-2023 03:17 PM

Sorry, I thought it would be easier to ask my questions in part, but you are right. It is best to show what final output I am looking for.

Having the sample dataset in the form of table:

ID	pair	type	time_3	time_6	time_9	time_12	time_18
1	1	a	111	134	.	.	.
2	1	b	110	.	123	.	.
3	2	a	.	131	.	102	120
4	2	b	.	.	.	121	123

We need a table with columns (Type_a and Type_b) that adds by type per time_ how many cells with not missing values they are. Second, we need a column that shows per pair, how many pairs have values for each time_. This should be the output in a table:

time	Type_a	Type_b	pair
time_3	1	1	1
time_6	2	.	.
time_9	.	1	.
time_12	1	1	1
time_18	1	1	1

Tom · Posted 06-21-2023 03:31 PM

I cannot see how you got the second dataset from the first one. What is PAIR counting in the second dataset? It does not seem to have anything to do with the PAIR in the first dataset.

It really looks like your data should be organized as PAIR, TYPE, TIME, VALUE instead. Then the dataset structure is fixed no matter how many times or types there are.

Or possibly if the TYPE is always A or B and you want the A and B rows to be PAIRED then perhaps you have PAIR, TIME and VALUE_A and VALUE_B instead?

ANKH1 · Posted 06-21-2023 04:00 PM

I apologize. Each ID belongs to a unique pair (this dataset only shows 1 and 2, but the number of pairs is greater in the larger dataset) and a type (either a or b). Per each pair, you have one type=a and one type=b.

First, we want to count how many rows we have per type (a and b) with values. Hence, columns Type_a and Type_b.

Column pair, derives from counting how many pairs (each pair composed of two IDS) BOTH IDs have values per time_.

Tom · Posted 06-21-2023 04:32 PM

So if you know the data has two observations per pair then you can probably get it done by adding a PROC TRANSPOSE step.

data ds1;
  input ID pair type $ time_3 time_6 time_9 time_18;
datalines;
1 1 a 111 134   .   .  
2 1 b 110   . 123   . 
3 2 a   . 131   . 120
4 2 b   .   .   . 123
;

proc transpose data=ds1 out=ds1T name=time prefix=type_;
  by pair ;
  id type;
  var time_: ;
run;

Now you have data like this:

Obs    pair     time      type_a    type_b

 1       1     time_3       111       110
 2       1     time_6       134         .
 3       1     time_9         .       123
 4       1     time_18        .         .
 5       2     time_3         .         .
 6       2     time_6       131         .
 7       2     time_9         .         .
 8       2     time_18      120       123

Which you can turn into binary 1/0 values (without the need to do an code generation) and the add up.

data for_count;
  set ds1T ;
  type_A = n(type_a);
  type_B = n(type_b);
  type_pair = type_a and type_b;
run;
proc summary data=for_count nway;
  class time ;
  var type_: ;
  output out=want(drop=_type_) sum=;
run;

Result

                                                type_
Obs     time      _FREQ_    type_a    type_b     pair

 1     time_18       2         1         1        1
 2     time_3        2         1         1        1
 3     time_6        2         2         0        0
 4     time_9        2         0         1        0

ANKH1 · Posted 06-21-2023 05:33 PM

Thank you! This works perfect!

Tom · Posted 06-21-2023 03:18 PM

Why would you use SQL to do it?

Normal SAS code is usually much easier to write.

data ds1;
  input ID pair type $ time_3 time_6 time_9 time_18;
datalines;
1 1 a 111 134   .   .  
2 1 b 110   . 123   . 
3 2 a   . 131   . 120
4 2 b   .   .   . 123
;

data counts;
  set ds1;
  array t time_:;
  do over t;
     t=not missing(t);
  end;
run;

Result

Obs    ID    pair    type    time_3    time_6    time_9    time_18

 1      1      1      a         1         1         0         0
 2      2      1      b         1         0         1         0
 3      3      2      a         0         1         0         1
 4      4      2      b         0         0         0         1

What is it you are actually trying to do here? That doesn't look much like "COUNTS".

If you just want to bin the values in missing or non-missing why not use a format?

proc format ;
  value miss low-high='Missing' other='Non-Missing';
run;
proc freq data=ds1;
  tables time: / missing;
  format time: miss.;
run;

Ksharp · Posted 06-22-2023 09:36 AM

/*Just for having some fun*/
data ds1;
  input ID pair type $ time_3 time_6 time_9 time_18;
datalines;
1 1 a 111 134   .   .  
2 1 b 110   . 123   . 
3 2 a   . 131   . 120
4 2 b   .   .   . 123
;
proc sql;
create table temp as
select type,
       n(time_3) as time_3,
	   n(time_6) as time_6,
	   n(time_9) as time_9,
	   n(time_18) as time_18
  from ds1
   group by type;

create table temp2 as
select * from temp
union
select 'pair',
       sum(time_3=0)=0,
       sum(time_6=0)=0,
       sum(time_9=0)=0,
       sum(time_18=0)=0
  from temp;
quit;
proc transpose data=temp2 out=want;
id type;
run;

Include a condition in proc sql to evaluate if the variable exists

Re: Include a condition in proc sql to evaluate if the variable exists

Re: Include a condition in proc sql to evaluate if the variable exists

Re: Include a condition in proc sql to evaluate if the variable exists

Re: Include a condition in proc sql to evaluate if the variable exists

Re: Include a condition in proc sql to evaluate if the variable exists

Re: Include a condition in proc sql to evaluate if the variable exists

Re: Include a condition in proc sql to evaluate if the variable exists

Re: Include a condition in proc sql to evaluate if the variable exists

Re: Include a condition in proc sql to evaluate if the variable exists

Re: Include a condition in proc sql to evaluate if the variable exists

Re: Include a condition in proc sql to evaluate if the variable exists

SAS Innovate 2025: Call for Content

Classroom Training Available!