i am having 4 variables f1 f2 f3 f4
data test;
input f1 f2 f3 f4;
cards;
1 2 3 4
4 5 1 2
8 1 9 0
3 4 4 1
9 12 12 14
2 16 17 18
run;
cards;
actually i want to find out the observations that are exesting in f1 f2 f3 and f4 in the above example 1 is repeating in all the four variable
at any time ,and 2 is there in f1,f2 and in f4 i want to find out the variable then are in atleast in 3 variables
output
f1 f2 f3 f4
1 1 1 1
2 2 . 2
Your data structure does not match the way you are referring to the information.
You talk about the values inside individual variables as if they were the observations.
If you restructure the data into two variables, one with the value and one with the column label then the query becomes easier.
data test;
do col='f1','f2','f3','f4';
input value @; output;
end;
cards;
1 2 3 4
4 5 1 2
8 1 9 0
3 4 4 1
9 12 12 14
2 16 17 18
run;
proc sql noprint ;
create table want as
select distinct value
from test
group by value
having count(distinct col) > 2
;
quit;
proc print; run;
i am having 4 variables f1 f2 f3 f4
data test;
input name add f1 f2 f3 f4;
cards;
a b 1 2 3 4
c g 4 5 1 2
y u 8 1 9 0
r p 3 4 4 1
k l 9 12 12 14
m n 2 16 17 18
run;
cards;
actually i want to find out the observations that are exesting in f1 f2 f3 and f4 in the above example 1 is repeating in all the four variable
at any time ,and 2 is there in f1,f2 and in f4 i want to find out the variable then are in atleast in 3 variables.
output
f1 f2 f3 f4
1 1 1 1
2 2 . 2
I still have no idea what you are actually trying to do, but try this method using PROC TRANSPOSE to convert your 4 variables into 4 observations so that we can query over them.
proc transpose data=test out=test2 ;
by name add notsorted ;
var f1-f4 ;
run;
proc sql noprint;
create table test2 as
select distinct col1 as value,_name_,col1
from test2
group by col1
having count(distinct _name_) > 2
order by 1,2
;
quit;
proc transpose data=test2 out=want(drop=_name_) ;
by value;
id _name_;
var col1 ;
run;
proc print;
run;
output:
Obs value f1 f2 f3 f4
1 1 1 1 1 1
2 2 2 2 . 2
3 4 4 4 4 4
But i am having 50 variables beside that f1,f2,f3 and f4 ,i cant use proc transpose and data is in 30 laks
Some questions:
What is the full range of values the variables can have?
Your example mentioned knowing values being shared by 2, 3 and 4 of the variables. Does that now translate to be shared by 2,3,4 ..etc .. through all 50 variables?
Comment: 3 million isn't that big of a data set.
yes
Does that mean that it takes 50 variables to uniquely identify an observation in your source dataset? So that you are worried about the space/time to store 3 million * 4 * (length of 50 variables plus the _name_ and COL1 variables) bytes of data?
From your original request you do not seem to care about those variables so it should be fine not to include them in the transposed data.
You can always use a data step view to construct the transposed data to feed to the PROC SQL code so that you do not need to store that intermediate dataset at all.
data test2 / view=test2 ;
length _name_ $32 col1 8 ;
set test ;
array vars f1-f4 ;
do i=1 to dim(vars);
_name_ = vname(vars(i));
col1 = vars(i);
output;
end;
keep _name_ col1;
run;
Or does it mean that you want to search over 50 variables instead of 4. The difference in space to store 50 variables per observation versus one variable in 50 times as many observations is not much.
I think the biggest problem seems to be generalising.
this seems to provide some flexibility while making only a single pass through the original data
%let your_data = test ;
%let list_values2test = 1 2 ;
%let list_columns2test = f1 f2 f3 f4 ;
%let required = 3 ;
%let num_values = %sysfunc( countw( &list_values2test )) ;
%let num_tests = %sysfunc( countw( &list_columns2test )) ;
data results( keep= &list_columns2test ) ;
***********************************************************************
* array results holds flags indicating required value found in a col *
**********************************************************************;
array results( &num_values, &num_tests ) $1 _temporary_ ;
array inputs ( &num_tests ) &list_columns2test ;
array outputs( &num_tests ) &list_columns2test ;
***********************************************************************
* array values lists the values to be hunted among the columns
**********************************************************************;
array values ( &num_values ) _temporary_ ( &list_values2test ) ;
do row= 1 by 1 while( not end_of_data ) ;
set &your_data end= end_of_data ;
do col = 1 to &num_tests ;
which = whichn( inputs(col), of values(*) ) ;
if which then results( which, col ) = 'x' ;
end ;
end ;
do test = 1 to &num_values;
found =0 ;
value = values( test ) ;
call missing( of outputs(*) );
do col=1 to &num_tests ;
if results( test, col ) ='x' then do ;
found + 1 ;
outputs( col ) = value ;
end ;
end ;
if found >= &required then output ;
end ;
stop ;
run ;
tested and working for this set of data
If performance is an issue, you could stop testing once all flags have been set for a value
How about:
data test; input f1 f2 f3 f4; cards; 1 2 3 4 4 5 1 2 8 1 9 0 3 4 4 1 9 12 12 14 2 16 17 17 ; run; proc sql noprint; select quote(cats('_',name)) into : name separated by ',' from dictionary.columns where libname='WORK' and memname='TEST' and upcase(name) like 'F%' order by varnum; select cats('_',name) into : _name separated by ' ' from dictionary.columns where libname='WORK' and memname='TEST' and upcase(name) like 'F%' order by varnum; quit; %put _user_; data want(keep=_f:); if _n_ eq 1 then do; length k &_name 8; declare hash ha (hashexp: 20); declare hiter hi ('ha'); ha.definekey('k'); ha.definedata(&name); ha.definedone(); end; set test end=last; array ff{*} f: ; array _ff{*} _f: ; do i=1 to dim(ff); k=ff{i]; rc=ha.find(); _ff{i}=ff{i}; ha.replace(); call missing(of _ff{*}); end; if last then do; do while(hi.next()=0); if n(of _:) ge 3 then output; end; end; run;
Ksharp
Message was edited by: xia keshan
Thqs ksharp but how can i do the same for character varaibles
data test; input (f1 f2 f3 f4) ($); cards; 1 2 3 4 4 5 1 2 8 1 9 0 3 4 4 1 9 12 12 14 2 16 17 17 ; run; proc sql noprint; select quote(cats('_',name)) into : name separated by ',' from dictionary.columns where libname='WORK' and memname='TEST' and upcase(name) like 'F%' order by varnum; select cats('_',name) into : _name separated by ' ' from dictionary.columns where libname='WORK' and memname='TEST' and upcase(name) like 'F%' order by varnum; quit; %put _user_; data want(keep=_f:); if _n_ eq 1 then do; length k &_name $ 8; declare hash ha (hashexp: 20); declare hiter hi ('ha'); ha.definekey('k'); ha.definedata(&name); ha.definedone(); end; set test end=last; array ff{*} $ f: ; array _ff{*} $ _f: ; do i=1 to dim(ff); k=ff{i]; rc=ha.find(); _ff{i}=ff{i}; ha.replace(); call missing(of _ff{*}); end; if last then do; do while(hi.next()=0); if cmiss(of _f:) lt 2 then output; end; end; run;
Ksharp
Thqs Ksharp for your answer it worked
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.