BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ursula
Pyrite | Level 9

Hi There,

 

I need help to search a character and retrieve the data that match the character.

 

data have;

input ID $ data1 $ data2 $ data3 $ data4 $;
datalines;
1 ?_B C_D A_C B_D
2 A_A D_? A_A ?_B
3 B_B C_C C_C D_D
4 R_T P_? C_? K_A
;
run;

 

I want to output data that have "?" in it. I have more than 100 variables to search for.

 

I want the output like this:

ID data1 data2 data3 data4
1 ?_B      
2   D_?   ?_B
4   P_? C_?  

 

Thanks in advance!

1 ACCEPTED SOLUTION

Accepted Solutions
RW9
Diamond | Level 26 RW9
Diamond | Level 26

This is a common problem, and one caused by the "Excel way of thinking" which seems to permeate throughout SAS programming nowadays.  You have X numbers of fixed columns, and Y number of non-fixed columns, and are now trying to find a way to process them logically.  A simple mindset change to work "In the programmers way of thinking" makes this kind of issue go away.  There are two main data structures, transposed - which is what you have and is useful for reporting out for humans to read, and normalised - which is useful for storage, programming etc.  

So from a storage point of view you have 12 cells to contain 5 data items, which is a waste.  From a programming point of view you have to work out how many elements, scan through each, and of course do this each time you want to use it.  

A far simpler storage would be:

data have;
  input ID $ data1 $ data2 $ data3 $ data4 $;
datalines;
1 ?_B C_D A_C B_D 
2 A_A D_? A_A ?_B 
3 B_B C_C C_C D_D 
4 R_T P_? C_? K_A 
;
run;

proc transpose data=have out=want;
  by id;
  var data:;
run;

data want;
  set want;
  where index(col1,"?")>0;
run;

So normalise the data, then its simply a matter of where clauses and such like to access the data you want.  Compare that to all the array codes and such like given before, how much simpler?

And if you need the transposed results at the end for a report, use another proc transpose to go up again.

View solution in original post

15 REPLIES 15
novinosrin
Tourmaline | Level 20

here you go-

 

data have;

input ID $ data1 $ data2 $ data3 $ data4 $;
datalines;
1 ?_B C_D A_C B_D
2 A_A D_? A_A ?_B
3 B_B C_C C_C D_D
4 R_T P_? C_? K_A
;
run;

data want;
set have;
array t(*) data1-data4;
do _n_=1 to dim(t);
if index(t(_n_),'?')>0 then continue;
else call missing(t(_n_));
end;
if cmiss(of t(*))=dim(t) then delete;
run;

novinosrin
Tourmaline | Level 20

Notes:

1. 100s of variables with the same pattern with a numeric suffix is easy to list as array elements using variable lists like var1-var100 and so on. I trust you can do that.

 

2. either using variable lists is the best short cut method to specify and compile with the array statement

3. if all vars after the conditional test happens to be missing , would have to be equal to total number of elements in the array to delete the observation

 

Hope that helps

 

ursula
Pyrite | Level 9

thanks for the speedy response!

 

it's almost there.

 

as you see that ID 3 does not have any "?" in all variables, so I do not need to retrieve ID 3.

 

again this is what I want:

ID data1 data2 data3 data4
1 ?_B      
2   D_?   ?_B
4   P_? C_?  
novinosrin
Tourmaline | Level 20

yes i have edited the code to delete the 3rd obs later. sorry. Please notice the edit. Thank you

 

Please notice this addition in the edit:

 

if cmiss(of t(*))=dim(t) then delete;

ursula
Pyrite | Level 9

Very good!

 

I just realize that not all the variables have "?" data, I would like not to retrieve the variables that have no "?".

 

data have;

input ID $ data1 $ data2 $ data3 $ data4 $;
datalines;
1 ?_B C_D A_C B_D
2 A_A D_? A_A ?_B
3 B_B C_C C_C D_D
4 R_T P_? C_J K_A
;

run;

 

the output should look like this:  -- no data3

ID data1 data2 data4
1 ?_B    
2   D_? ?_B
4   P_?  
novinosrin
Tourmaline | Level 20

Do you mean, you want to drop the column from the result if all values of a column is blank?

ursula
Pyrite | Level 9

yes, please .

novinosrin
Tourmaline | Level 20

Read through this document https://www.lexjansen.com/nesug/nesug13/90_Final_Paper.pdf

 

while i try something simpler meanwhile

ursula
Pyrite | Level 9

Thank you very much!

novinosrin
Tourmaline | Level 20

data have;

input ID $ data1 $ data2 $ data3 $ data4 $;

datalines;

1 ?_B C_D A_C B_D

2 A_A D_? A_A ?_B

3 B_B C_C C_C D_D

4 R_T P_? C_J K_A

;

run;

%macro op_ursula;

data want1;

set have nobs=nobs end=last;

array t(*) data1-data4;

array t1(*) _data1-_data4;

do _n_=1 to dim(t);

if index(t(_n_),'?')>0 then continue;

else call missing(t(_n_));

if missing(t(_n_)) then t1(_n_)+1;

if last then do;

if t1(_n_)=nobs then do;

call symputX('POS'||left(_n_),vname(t(_n_)));

_c=_n_;

end;

end;

end;

call symputX('count',_c);

run;

 

data final_want;

set want1;

drop  %do i=1 %to &count;

%if %symexist(pos&i) %then  &&pos&count;

%end; _: ;

array t(*)      data1-data4;

if cmiss(of t(*))=dim(t) then delete;

run;

 

%mend;

%op_ursula

ursula
Pyrite | Level 9

thank you so much for your help.

I wonder why it does not work on my real data, still retrieve all columns even though there are no "?" mark in them.

I do not really understand the codes, but it works on the sample data.

 

I would look into it later.

 

Thank you again.

novinosrin
Tourmaline | Level 20

Go through the code statement by statement thoroughly until you understand. I am gonna sleep now. If you need any help on this thread, I'll look into your requirement when i wake

RW9
Diamond | Level 26 RW9
Diamond | Level 26

This is a common problem, and one caused by the "Excel way of thinking" which seems to permeate throughout SAS programming nowadays.  You have X numbers of fixed columns, and Y number of non-fixed columns, and are now trying to find a way to process them logically.  A simple mindset change to work "In the programmers way of thinking" makes this kind of issue go away.  There are two main data structures, transposed - which is what you have and is useful for reporting out for humans to read, and normalised - which is useful for storage, programming etc.  

So from a storage point of view you have 12 cells to contain 5 data items, which is a waste.  From a programming point of view you have to work out how many elements, scan through each, and of course do this each time you want to use it.  

A far simpler storage would be:

data have;
  input ID $ data1 $ data2 $ data3 $ data4 $;
datalines;
1 ?_B C_D A_C B_D 
2 A_A D_? A_A ?_B 
3 B_B C_C C_C D_D 
4 R_T P_? C_? K_A 
;
run;

proc transpose data=have out=want;
  by id;
  var data:;
run;

data want;
  set want;
  where index(col1,"?")>0;
run;

So normalise the data, then its simply a matter of where clauses and such like to access the data you want.  Compare that to all the array codes and such like given before, how much simpler?

And if you need the transposed results at the end for a report, use another proc transpose to go up again.

ursula
Pyrite | Level 9

Thanks so much for the simpler codes!

 

yes, it works!

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 15 replies
  • 3598 views
  • 4 likes
  • 4 in conversation