BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
ANKH1
Pyrite | Level 9

The following datasets are two examples. The proc sql determines whether there is a value or nor under each variable starting with time_. My problem is that we have lots of datasets to evaluate and 20 variables as time_.

data ds1;
input ID	pair	type	time_3	time_6	time_9	time_18;
datalines;
1	1	a	111	134	. 	.		
2	1	b	110	.	123	.	
3	2	a	.	131	.	120
4	2	b	.      .	.	123;
run;


data ds2;
input ID	pair	type	time_3	time_9	time_15;
datalines;
1	1	a	111	134	. 		
2	1	b	110	.	123	
3	2	a	.	131	.
4	2	b	125     .	.;
run;

proc sql;
create table counts as
select ID, pair, type,
not missing (time_3) as time_3,
not missing (time_6) as time_6,
not missing (time_9) as time_9,
not missing (time_18) as time_18
from ds1;
quit;

The way proc sql is now coded, you can only include the variables available in the dataset of interest (in the above proc sql is for ds1). Is there a way to include a condition that will only run the statement "not missing..." if the variable exists? This way we can include all possible variables starting with time_ at once. Like the following (which include time_12 and time_15 not available in ds1):

proc sql;
create table counts as
select ID, pair, type,
not missing (time_3) as time_3,
not missing (time_6) as time_6,
not missing (time_9) as time_9,
not missing (time_12) as time_12,
not missing (time_15) as time_15,
not missing (time_18) as time_18
from ds1;
quit;
1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

So if you know the data has two observations per pair then you can probably get it done by adding a PROC TRANSPOSE step.

data ds1;
  input ID pair type $ time_3 time_6 time_9 time_18;
datalines;
1 1 a 111 134   .   .  
2 1 b 110   . 123   . 
3 2 a   . 131   . 120
4 2 b   .   .   . 123
;

proc transpose data=ds1 out=ds1T name=time prefix=type_;
  by pair ;
  id type;
  var time_: ;
run;

Now you have data like this:

Obs    pair     time      type_a    type_b

 1       1     time_3       111       110
 2       1     time_6       134         .
 3       1     time_9         .       123
 4       1     time_18        .         .
 5       2     time_3         .         .
 6       2     time_6       131         .
 7       2     time_9         .         .
 8       2     time_18      120       123

Which you can turn into binary 1/0 values (without the need to do an code generation) and the add up.

data for_count;
  set ds1T ;
  type_A = n(type_a);
  type_B = n(type_b);
  type_pair = type_a and type_b;
run;
proc summary data=for_count nway;
  class time ;
  var type_: ;
  output out=want(drop=_type_) sum=;
run;

Result

                                                type_
Obs     time      _FREQ_    type_a    type_b     pair

 1     time_18       2         1         1        1
 2     time_3        2         1         1        1
 3     time_6        2         2         0        0
 4     time_9        2         0         1        0

 

View solution in original post

10 REPLIES 10
Kathryn_SAS
SAS Employee

You could use macro logic to get the list of variables in advance.

data ds1;
input ID	pair	type $	time_3	time_6	time_9	time_18;
datalines;
1	1	a	111	134	. 	.		
2	1	b	110	.	123	.	
3	2	a	.	131	.	120
4	2	b	.      .	.	123
;
run;


data ds2;
input ID	pair	type $	time_3	time_9	time_15;
datalines;
1	1	a	111	134	. 		
2	1	b	110	.	123	
3	2	a	.	131	.
4	2	b	125     .	.
;
run;

%macro check(dsn);
proc sql noprint;
select name into :varlist separated by ' '
from dictionary.columns 
where libname="WORK" and memname="%upcase(&dsn)"
 and upcase(name) like "TIME_%";
quit;
%put &varlist;
%let num=%sysfunc(countw(&varlist,%str( )));
%put #

proc sql;
create table counts as
select ID, pair, type,
%do i=1 %to #
 %if &i ne &num %then %do;
not missing (%scan(&varlist,&i,%str( )) ) as %scan(&varlist,&i,%str( )),
%end;
%else %do;
not missing (%scan(&varlist,&i,%str( )) ) as %scan(&varlist,&i,%str( ))
%end;
%end;
from &dsn;
quit;
%mend;

options mprint mlogic symbolgen;
%check(ds2)
proc print data=counts;
run;
ballardw
Super User

I think that you need to show what you expect the output to look like.

Perhaps you want a data step and array processing as you can write code that doesn't know all of the variables.

The : builds a list of variables in a data step of all variables that start with that text. So you can use time_: to indicate that all of those variables with the variable names get used in the array to process. Lists and arrays however are not available in SQL.

data ds1;
input ID	pair	type $	time_3	time_6	time_9	time_18;
datalines;
1	1	a	111	134	. 	.		
2	1	b	110	.	123	.	
3	2	a	.	131	.	120
4	2	b	.      .	.	123
;
run;

data want;
   set ds1;
   array t (*) time_: ;
   do i=1 to dim(t);
      t[i]= not missing(t[i]);
   end;
   drop i;
run;

Note: you should test your data step before posting. The variable Type should be character, otherwise you get all missing values due to type error. Also, the line that has a semicolon ends the datalines block. So your step doesn't have the values from the last line of data.

 

PaigeMiller
Diamond | Level 26

This could be a lot of typing depending on your real situation, but it works. It uses the %EXPANDVARLIST macro at https://support.sas.com/resources/papers/proceedings13/032-2013.pdf

 

%let varnames=%expandvarlist(data=sashelp.class);
%put &=varnames;

proc sql;
    create table def as select
        name
        ,sex
        ,age
        %if %sysfunc(findw(%upcase(&varnames),HEIGHT))>0 %then %do; ,height %end;
        %if %sysfunc(findw(%upcase(&varnames),CAULIFLOWER))>0 %then %do; ,cauliflower %end;
    from sashelp.class;
quit;
--
Paige Miller
ANKH1
Pyrite | Level 9

Sorry, I thought it would be easier to ask my questions in part, but you are right. It is best to show what final output I am looking for.

Having the sample dataset in the form of table:

 

IDpairtypetime_3time_6time_9time_12time_18
11a111134...
21b110.123..
32a.131.102120
42b...121123

 

We need a table with columns (Type_a and Type_b) that adds by type per time_ how many cells with not missing values they are. Second, we need a column that shows per pair, how many pairs have values for each time_. This should be the output in a table:

timeType_aType_bpair
time_3111
time_62..
time_9.1.
time_12111
time_18111
Tom
Super User Tom
Super User

I cannot see how you got the second dataset from the first one.  What is PAIR counting in the second dataset? It does not seem to have anything to do with the PAIR in the first dataset.

 

It really looks like your data should be organized as PAIR, TYPE, TIME, VALUE instead.  Then the dataset structure is fixed no matter how many times or types there are.

 

Or possibly if the TYPE is always A or B and you want the A and B rows to be PAIRED then perhaps you have PAIR, TIME and VALUE_A and VALUE_B  instead?

 

 

ANKH1
Pyrite | Level 9

I apologize. Each ID belongs to a unique pair (this dataset only shows 1 and 2, but the number of pairs is greater in the larger dataset) and a type (either a or b). Per each pair, you have one type=a and one type=b.

First, we want to count how many rows we have per type (a and b) with values. Hence, columns Type_a and Type_b.

Column pair, derives from counting how many pairs (each pair composed of two IDS) BOTH IDs have values per time_.

Tom
Super User Tom
Super User

So if you know the data has two observations per pair then you can probably get it done by adding a PROC TRANSPOSE step.

data ds1;
  input ID pair type $ time_3 time_6 time_9 time_18;
datalines;
1 1 a 111 134   .   .  
2 1 b 110   . 123   . 
3 2 a   . 131   . 120
4 2 b   .   .   . 123
;

proc transpose data=ds1 out=ds1T name=time prefix=type_;
  by pair ;
  id type;
  var time_: ;
run;

Now you have data like this:

Obs    pair     time      type_a    type_b

 1       1     time_3       111       110
 2       1     time_6       134         .
 3       1     time_9         .       123
 4       1     time_18        .         .
 5       2     time_3         .         .
 6       2     time_6       131         .
 7       2     time_9         .         .
 8       2     time_18      120       123

Which you can turn into binary 1/0 values (without the need to do an code generation) and the add up.

data for_count;
  set ds1T ;
  type_A = n(type_a);
  type_B = n(type_b);
  type_pair = type_a and type_b;
run;
proc summary data=for_count nway;
  class time ;
  var type_: ;
  output out=want(drop=_type_) sum=;
run;

Result

                                                type_
Obs     time      _FREQ_    type_a    type_b     pair

 1     time_18       2         1         1        1
 2     time_3        2         1         1        1
 3     time_6        2         2         0        0
 4     time_9        2         0         1        0

 

ANKH1
Pyrite | Level 9
Thank you! This works perfect!
Tom
Super User Tom
Super User

Why would you use SQL to do it?

Normal SAS code is usually much easier to write.

data ds1;
  input ID pair type $ time_3 time_6 time_9 time_18;
datalines;
1 1 a 111 134   .   .  
2 1 b 110   . 123   . 
3 2 a   . 131   . 120
4 2 b   .   .   . 123
;

data counts;
  set ds1;
  array t time_:;
  do over t;
     t=not missing(t);
  end;
run;

Result

Obs    ID    pair    type    time_3    time_6    time_9    time_18

 1      1      1      a         1         1         0         0
 2      2      1      b         1         0         1         0
 3      3      2      a         0         1         0         1
 4      4      2      b         0         0         0         1

What is it you are actually trying to do here?  That doesn't look much like "COUNTS".

 

If you just want to bin the values in missing or non-missing why not use a format?

proc format ;
  value miss low-high='Missing' other='Non-Missing';
run;
proc freq data=ds1;
  tables time: / missing;
  format time: miss.;
run;

Tom_0-1687375100061.png

 

Ksharp
Super User
/*Just for having some fun*/
data ds1;
  input ID pair type $ time_3 time_6 time_9 time_18;
datalines;
1 1 a 111 134   .   .  
2 1 b 110   . 123   . 
3 2 a   . 131   . 120
4 2 b   .   .   . 123
;
proc sql;
create table temp as
select type,
       n(time_3) as time_3,
	   n(time_6) as time_6,
	   n(time_9) as time_9,
	   n(time_18) as time_18
  from ds1
   group by type;

create table temp2 as
select * from temp
union
select 'pair',
       sum(time_3=0)=0,
       sum(time_6=0)=0,
       sum(time_9=0)=0,
       sum(time_18=0)=0
  from temp;
quit;
proc transpose data=temp2 out=want;
id type;
run;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 1753 views
  • 3 likes
  • 6 in conversation