The following datasets are two examples. The proc sql determines whether there is a value or nor under each variable starting with time_. My problem is that we have lots of datasets to evaluate and 20 variables as time_.
data ds1;
input ID pair type time_3 time_6 time_9 time_18;
datalines;
1 1 a 111 134 . .
2 1 b 110 . 123 .
3 2 a . 131 . 120
4 2 b . . . 123;
run;
data ds2;
input ID pair type time_3 time_9 time_15;
datalines;
1 1 a 111 134 .
2 1 b 110 . 123
3 2 a . 131 .
4 2 b 125 . .;
run;
proc sql;
create table counts as
select ID, pair, type,
not missing (time_3) as time_3,
not missing (time_6) as time_6,
not missing (time_9) as time_9,
not missing (time_18) as time_18
from ds1;
quit;
The way proc sql is now coded, you can only include the variables available in the dataset of interest (in the above proc sql is for ds1). Is there a way to include a condition that will only run the statement "not missing..." if the variable exists? This way we can include all possible variables starting with time_ at once. Like the following (which include time_12 and time_15 not available in ds1):
proc sql;
create table counts as
select ID, pair, type,
not missing (time_3) as time_3,
not missing (time_6) as time_6,
not missing (time_9) as time_9,
not missing (time_12) as time_12,
not missing (time_15) as time_15,
not missing (time_18) as time_18
from ds1;
quit;
So if you know the data has two observations per pair then you can probably get it done by adding a PROC TRANSPOSE step.
data ds1;
input ID pair type $ time_3 time_6 time_9 time_18;
datalines;
1 1 a 111 134 . .
2 1 b 110 . 123 .
3 2 a . 131 . 120
4 2 b . . . 123
;
proc transpose data=ds1 out=ds1T name=time prefix=type_;
by pair ;
id type;
var time_: ;
run;
Now you have data like this:
Obs pair time type_a type_b 1 1 time_3 111 110 2 1 time_6 134 . 3 1 time_9 . 123 4 1 time_18 . . 5 2 time_3 . . 6 2 time_6 131 . 7 2 time_9 . . 8 2 time_18 120 123
Which you can turn into binary 1/0 values (without the need to do an code generation) and the add up.
data for_count;
set ds1T ;
type_A = n(type_a);
type_B = n(type_b);
type_pair = type_a and type_b;
run;
proc summary data=for_count nway;
class time ;
var type_: ;
output out=want(drop=_type_) sum=;
run;
Result
type_ Obs time _FREQ_ type_a type_b pair 1 time_18 2 1 1 1 2 time_3 2 1 1 1 3 time_6 2 2 0 0 4 time_9 2 0 1 0
You could use macro logic to get the list of variables in advance.
data ds1;
input ID pair type $ time_3 time_6 time_9 time_18;
datalines;
1 1 a 111 134 . .
2 1 b 110 . 123 .
3 2 a . 131 . 120
4 2 b . . . 123
;
run;
data ds2;
input ID pair type $ time_3 time_9 time_15;
datalines;
1 1 a 111 134 .
2 1 b 110 . 123
3 2 a . 131 .
4 2 b 125 . .
;
run;
%macro check(dsn);
proc sql noprint;
select name into :varlist separated by ' '
from dictionary.columns
where libname="WORK" and memname="%upcase(&dsn)"
and upcase(name) like "TIME_%";
quit;
%put &varlist;
%let num=%sysfunc(countw(&varlist,%str( )));
%put #
proc sql;
create table counts as
select ID, pair, type,
%do i=1 %to #
%if &i ne &num %then %do;
not missing (%scan(&varlist,&i,%str( )) ) as %scan(&varlist,&i,%str( )),
%end;
%else %do;
not missing (%scan(&varlist,&i,%str( )) ) as %scan(&varlist,&i,%str( ))
%end;
%end;
from &dsn;
quit;
%mend;
options mprint mlogic symbolgen;
%check(ds2)
proc print data=counts;
run;
I think that you need to show what you expect the output to look like.
Perhaps you want a data step and array processing as you can write code that doesn't know all of the variables.
The : builds a list of variables in a data step of all variables that start with that text. So you can use time_: to indicate that all of those variables with the variable names get used in the array to process. Lists and arrays however are not available in SQL.
data ds1; input ID pair type $ time_3 time_6 time_9 time_18; datalines; 1 1 a 111 134 . . 2 1 b 110 . 123 . 3 2 a . 131 . 120 4 2 b . . . 123 ; run; data want; set ds1; array t (*) time_: ; do i=1 to dim(t); t[i]= not missing(t[i]); end; drop i; run;
Note: you should test your data step before posting. The variable Type should be character, otherwise you get all missing values due to type error. Also, the line that has a semicolon ends the datalines block. So your step doesn't have the values from the last line of data.
This could be a lot of typing depending on your real situation, but it works. It uses the %EXPANDVARLIST macro at https://support.sas.com/resources/papers/proceedings13/032-2013.pdf
%let varnames=%expandvarlist(data=sashelp.class);
%put &=varnames;
proc sql;
create table def as select
name
,sex
,age
%if %sysfunc(findw(%upcase(&varnames),HEIGHT))>0 %then %do; ,height %end;
%if %sysfunc(findw(%upcase(&varnames),CAULIFLOWER))>0 %then %do; ,cauliflower %end;
from sashelp.class;
quit;
Sorry, I thought it would be easier to ask my questions in part, but you are right. It is best to show what final output I am looking for.
Having the sample dataset in the form of table:
ID | pair | type | time_3 | time_6 | time_9 | time_12 | time_18 |
1 | 1 | a | 111 | 134 | . | . | . |
2 | 1 | b | 110 | . | 123 | . | . |
3 | 2 | a | . | 131 | . | 102 | 120 |
4 | 2 | b | . | . | . | 121 | 123 |
We need a table with columns (Type_a and Type_b) that adds by type per time_ how many cells with not missing values they are. Second, we need a column that shows per pair, how many pairs have values for each time_. This should be the output in a table:
time | Type_a | Type_b | pair |
time_3 | 1 | 1 | 1 |
time_6 | 2 | . | . |
time_9 | . | 1 | . |
time_12 | 1 | 1 | 1 |
time_18 | 1 | 1 | 1 |
I cannot see how you got the second dataset from the first one. What is PAIR counting in the second dataset? It does not seem to have anything to do with the PAIR in the first dataset.
It really looks like your data should be organized as PAIR, TYPE, TIME, VALUE instead. Then the dataset structure is fixed no matter how many times or types there are.
Or possibly if the TYPE is always A or B and you want the A and B rows to be PAIRED then perhaps you have PAIR, TIME and VALUE_A and VALUE_B instead?
I apologize. Each ID belongs to a unique pair (this dataset only shows 1 and 2, but the number of pairs is greater in the larger dataset) and a type (either a or b). Per each pair, you have one type=a and one type=b.
First, we want to count how many rows we have per type (a and b) with values. Hence, columns Type_a and Type_b.
Column pair, derives from counting how many pairs (each pair composed of two IDS) BOTH IDs have values per time_.
So if you know the data has two observations per pair then you can probably get it done by adding a PROC TRANSPOSE step.
data ds1;
input ID pair type $ time_3 time_6 time_9 time_18;
datalines;
1 1 a 111 134 . .
2 1 b 110 . 123 .
3 2 a . 131 . 120
4 2 b . . . 123
;
proc transpose data=ds1 out=ds1T name=time prefix=type_;
by pair ;
id type;
var time_: ;
run;
Now you have data like this:
Obs pair time type_a type_b 1 1 time_3 111 110 2 1 time_6 134 . 3 1 time_9 . 123 4 1 time_18 . . 5 2 time_3 . . 6 2 time_6 131 . 7 2 time_9 . . 8 2 time_18 120 123
Which you can turn into binary 1/0 values (without the need to do an code generation) and the add up.
data for_count;
set ds1T ;
type_A = n(type_a);
type_B = n(type_b);
type_pair = type_a and type_b;
run;
proc summary data=for_count nway;
class time ;
var type_: ;
output out=want(drop=_type_) sum=;
run;
Result
type_ Obs time _FREQ_ type_a type_b pair 1 time_18 2 1 1 1 2 time_3 2 1 1 1 3 time_6 2 2 0 0 4 time_9 2 0 1 0
Why would you use SQL to do it?
Normal SAS code is usually much easier to write.
data ds1;
input ID pair type $ time_3 time_6 time_9 time_18;
datalines;
1 1 a 111 134 . .
2 1 b 110 . 123 .
3 2 a . 131 . 120
4 2 b . . . 123
;
data counts;
set ds1;
array t time_:;
do over t;
t=not missing(t);
end;
run;
Result
Obs ID pair type time_3 time_6 time_9 time_18 1 1 1 a 1 1 0 0 2 2 1 b 1 0 1 0 3 3 2 a 0 1 0 1 4 4 2 b 0 0 0 1
What is it you are actually trying to do here? That doesn't look much like "COUNTS".
If you just want to bin the values in missing or non-missing why not use a format?
proc format ;
value miss low-high='Missing' other='Non-Missing';
run;
proc freq data=ds1;
tables time: / missing;
format time: miss.;
run;
/*Just for having some fun*/
data ds1;
input ID pair type $ time_3 time_6 time_9 time_18;
datalines;
1 1 a 111 134 . .
2 1 b 110 . 123 .
3 2 a . 131 . 120
4 2 b . . . 123
;
proc sql;
create table temp as
select type,
n(time_3) as time_3,
n(time_6) as time_6,
n(time_9) as time_9,
n(time_18) as time_18
from ds1
group by type;
create table temp2 as
select * from temp
union
select 'pair',
sum(time_3=0)=0,
sum(time_6=0)=0,
sum(time_9=0)=0,
sum(time_18=0)=0
from temp;
quit;
proc transpose data=temp2 out=want;
id type;
run;
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.