Hi There,
I need help to search a character and retrieve the data that match the character.
data have;
input ID $ data1 $ data2 $ data3 $ data4 $;
datalines;
1 ?_B C_D A_C B_D
2 A_A D_? A_A ?_B
3 B_B C_C C_C D_D
4 R_T P_? C_? K_A
;
run;
I want to output data that have "?" in it. I have more than 100 variables to search for.
I want the output like this:
ID | data1 | data2 | data3 | data4 |
1 | ?_B | |||
2 | D_? | ?_B | ||
4 | P_? | C_? |
Thanks in advance!
This is a common problem, and one caused by the "Excel way of thinking" which seems to permeate throughout SAS programming nowadays. You have X numbers of fixed columns, and Y number of non-fixed columns, and are now trying to find a way to process them logically. A simple mindset change to work "In the programmers way of thinking" makes this kind of issue go away. There are two main data structures, transposed - which is what you have and is useful for reporting out for humans to read, and normalised - which is useful for storage, programming etc.
So from a storage point of view you have 12 cells to contain 5 data items, which is a waste. From a programming point of view you have to work out how many elements, scan through each, and of course do this each time you want to use it.
A far simpler storage would be:
data have; input ID $ data1 $ data2 $ data3 $ data4 $; datalines; 1 ?_B C_D A_C B_D 2 A_A D_? A_A ?_B 3 B_B C_C C_C D_D 4 R_T P_? C_? K_A ; run; proc transpose data=have out=want; by id; var data:; run; data want; set want; where index(col1,"?")>0; run;
So normalise the data, then its simply a matter of where clauses and such like to access the data you want. Compare that to all the array codes and such like given before, how much simpler?
And if you need the transposed results at the end for a report, use another proc transpose to go up again.
here you go-
data have;
input ID $ data1 $ data2 $ data3 $ data4 $;
datalines;
1 ?_B C_D A_C B_D
2 A_A D_? A_A ?_B
3 B_B C_C C_C D_D
4 R_T P_? C_? K_A
;
run;
data want;
set have;
array t(*) data1-data4;
do _n_=1 to dim(t);
if index(t(_n_),'?')>0 then continue;
else call missing(t(_n_));
end;
if cmiss(of t(*))=dim(t) then delete;
run;
Notes:
1. 100s of variables with the same pattern with a numeric suffix is easy to list as array elements using variable lists like var1-var100 and so on. I trust you can do that.
2. either using variable lists is the best short cut method to specify and compile with the array statement
3. if all vars after the conditional test happens to be missing , would have to be equal to total number of elements in the array to delete the observation
Hope that helps
thanks for the speedy response!
it's almost there.
as you see that ID 3 does not have any "?" in all variables, so I do not need to retrieve ID 3.
again this is what I want:
ID | data1 | data2 | data3 | data4 |
1 | ?_B | |||
2 | D_? | ?_B | ||
4 | P_? | C_? |
yes i have edited the code to delete the 3rd obs later. sorry. Please notice the edit. Thank you
Please notice this addition in the edit:
if cmiss(of t(*))=dim(t) then delete;
Very good!
I just realize that not all the variables have "?" data, I would like not to retrieve the variables that have no "?".
data have;
input ID $ data1 $ data2 $ data3 $ data4 $;
datalines;
1 ?_B C_D A_C B_D
2 A_A D_? A_A ?_B
3 B_B C_C C_C D_D
4 R_T P_? C_J K_A
;
run;
the output should look like this: -- no data3
ID | data1 | data2 | data4 |
1 | ?_B | ||
2 | D_? | ?_B | |
4 | P_? |
Do you mean, you want to drop the column from the result if all values of a column is blank?
yes, please .
Read through this document https://www.lexjansen.com/nesug/nesug13/90_Final_Paper.pdf
while i try something simpler meanwhile
Thank you very much!
data have;
input ID $ data1 $ data2 $ data3 $ data4 $;
datalines;
1 ?_B C_D A_C B_D
2 A_A D_? A_A ?_B
3 B_B C_C C_C D_D
4 R_T P_? C_J K_A
;
run;
%macro op_ursula;
data want1;
set have nobs=nobs end=last;
array t(*) data1-data4;
array t1(*) _data1-_data4;
do _n_=1 to dim(t);
if index(t(_n_),'?')>0 then continue;
else call missing(t(_n_));
if missing(t(_n_)) then t1(_n_)+1;
if last then do;
if t1(_n_)=nobs then do;
call symputX('POS'||left(_n_),vname(t(_n_)));
_c=_n_;
end;
end;
end;
call symputX('count',_c);
run;
data final_want;
set want1;
drop %do i=1 %to &count;
%if %symexist(pos&i) %then &&pos&count;
%end; _: ;
array t(*) data1-data4;
if cmiss(of t(*))=dim(t) then delete;
run;
%mend;
%op_ursula
thank you so much for your help.
I wonder why it does not work on my real data, still retrieve all columns even though there are no "?" mark in them.
I do not really understand the codes, but it works on the sample data.
I would look into it later.
Thank you again.
Go through the code statement by statement thoroughly until you understand. I am gonna sleep now. If you need any help on this thread, I'll look into your requirement when i wake
This is a common problem, and one caused by the "Excel way of thinking" which seems to permeate throughout SAS programming nowadays. You have X numbers of fixed columns, and Y number of non-fixed columns, and are now trying to find a way to process them logically. A simple mindset change to work "In the programmers way of thinking" makes this kind of issue go away. There are two main data structures, transposed - which is what you have and is useful for reporting out for humans to read, and normalised - which is useful for storage, programming etc.
So from a storage point of view you have 12 cells to contain 5 data items, which is a waste. From a programming point of view you have to work out how many elements, scan through each, and of course do this each time you want to use it.
A far simpler storage would be:
data have; input ID $ data1 $ data2 $ data3 $ data4 $; datalines; 1 ?_B C_D A_C B_D 2 A_A D_? A_A ?_B 3 B_B C_C C_C D_D 4 R_T P_? C_? K_A ; run; proc transpose data=have out=want; by id; var data:; run; data want; set want; where index(col1,"?")>0; run;
So normalise the data, then its simply a matter of where clauses and such like to access the data you want. Compare that to all the array codes and such like given before, how much simpler?
And if you need the transposed results at the end for a report, use another proc transpose to go up again.
Thanks so much for the simpler codes!
yes, it works!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.