Solved: Re: Search and retrieve data

ursula · Posted 01-04-2018 12:14 AM

Hi There,

I need help searching for a character and retrieving the data that contain it.

data have;

input ID $ data1 $ data2 $ data3 $ data4 $;
datalines;
1 ?_B C_D A_C B_D
2 A_A D_? A_A ?_B
3 B_B C_C C_C D_D
4 R_T P_? C_? K_A
;
run;

I want to output data that have "?" in it. I have more than 100 variables to search for.

I want the output like this:

ID	data1	data2	data3	data4
1	?_B
2		D_?		?_B
4		P_?	C_?

Thanks in advance!

novinosrin · Posted 01-04-2018 12:18 AM

here you go-

data have;

input ID $ data1 $ data2 $ data3 $ data4 $;
datalines;
1 ?_B C_D A_C B_D
2 A_A D_? A_A ?_B
3 B_B C_C C_C D_D
4 R_T P_? C_? K_A
;
run;

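data want;
  set have;
  array t(*) data1-data4;
  /* _n_ is reused as the loop index; blank every value that has no "?" */
  do _n_=1 to dim(t);
    if index(t(_n_),'?')>0 then continue;
    else call missing(t(_n_));
  end;
  /* delete the row when every array element is now missing */
  if cmiss(of t(*))=dim(t) then delete;
run;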

novinosrin · Posted 01-04-2018 12:27 AM

Notes:

1. Hundreds of variables that share a name pattern with a numeric suffix are easy to list as array elements using a variable list such as var1-var100. I trust you can do that.

2. Variable lists are the best shortcut for specifying the elements in the ARRAY statement.

3. After the conditional test, cmiss counts the missing values in the array; if that count equals the total number of elements in the array, every variable is missing and the observation is deleted.
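
To illustrate notes 1 and 2, here is a minimal sketch of the variable-list shortcuts. It reuses the sample data from this thread; the loop index i and the commented-out alternatives are only illustrations and may not fit the real data.

```sas
data have;
  input ID $ data1 $ data2 $ data3 $ data4 $;
datalines;
1 ?_B C_D A_C B_D
2 A_A D_? A_A ?_B
;
run;

data want;
  set have;
  /* Numbered range list: data1, data2, data3, data4.
     With 100 variables this would be data1-data100. */
  array t(*) data1-data4;
  /* Possible alternatives, if they fit the real data:
     array t(*) data: ;        every variable whose name starts with "data"
     array t(*) _character_ ;  every character variable, including ID here */
  do i=1 to dim(t);
    /* blank out any value that does not contain "?" */
    if index(t(i),'?')=0 then call missing(t(i));
  end;
  drop i;
run;
```

Whichever list is used, dim(t) picks up the number of elements automatically, so the loop does not need to change when more variables are added.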

Hope that helps

ursula · Posted 01-04-2018 12:27 AM

thanks for the speedy response!

it's almost there.

as you see that ID 3 does not have any "?" in all variables, so I do not need to retrieve ID 3.

again this is what I want:

ID	data1	data2	data3	data4
1	?_B
2		D_?		?_B
4		P_?	C_?

novinosrin · Posted 01-04-2018 12:28 AM

yes i have edited the code to delete the 3rd obs later. sorry. Please notice the edit. Thank you

Please notice this addition in the edit:

if cmiss(of t(*))=dim(t) then delete;

ursula · Posted 01-04-2018 12:36 AM

Very good!

I just realized that not all the variables have "?" data. I would like not to retrieve the variables that have no "?".

data have;

input ID $ data1 $ data2 $ data3 $ data4 $;
datalines;
1 ?_B C_D A_C B_D
2 A_A D_? A_A ?_B
3 B_B C_C C_C D_D
4 R_T P_? C_J K_A
;

run;

the output should look like this: -- no data3

ID	data1	data2	data4
1	?_B
2		D_?	?_B
4		P_?

novinosrin · Posted 01-04-2018 12:40 AM

Do you mean you want to drop a column from the result if all values of that column are blank?

ursula · Posted 01-04-2018 12:41 AM

yes, please .

novinosrin · Posted 01-04-2018 12:49 AM

Read through this document https://www.lexjansen.com/nesug/nesug13/90_Final_Paper.pdf

while i try something simpler meanwhile

ursula · Posted 01-04-2018 12:55 AM

Thank you very much!

novinosrin · Posted 01-04-2018 01:33 AM

data have;
  input ID $ data1 $ data2 $ data3 $ data4 $;
datalines;
1 ?_B C_D A_C B_D
2 A_A D_? A_A ?_B
3 B_B C_C C_C D_D
4 R_T P_? C_J K_A
;
run;

%macro op_ursula;
  %local droplist;

  /* Step 1: blank every value without "?" and count the blanks per column */
  data want1;
    set have nobs=nobs end=last;
    array t(*) data1-data4;
    array t1(*) _data1-_data4;      /* per-column count of blanked values */
    retain _data1-_data4 0;
    length _droplist $ 32767;
    do _n_=1 to dim(t);
      if index(t(_n_),'?')=0 then do;
        call missing(t(_n_));
        t1(_n_)=t1(_n_)+1;
      end;
      /* on the last row, collect the columns that are blank in every row */
      if last and t1(_n_)=nobs then _droplist=catx(' ',_droplist,vname(t(_n_)));
    end;
    if last then call symputx('droplist',_droplist,'L');
  run;

  /* Step 2: drop the all-blank columns and the helper variables,
     then delete the rows that have no "?" left */
  data final_want;
    set want1;
    drop &droplist _: ;
    array t(*) data1-data4;
    if cmiss(of t(*))=dim(t) then delete;
  run;
%mend;

%op_ursula

ursula · Posted 01-04-2018 02:00 AM

thank you so much for your help.

I wonder why it does not work on my real data: it still retrieves all columns, even those with no "?" in them.

I do not really understand the code, but it works on the sample data.

I will look into it later.

Thank you again.

novinosrin · Posted 01-04-2018 02:02 AM

Go through the code statement by statement thoroughly until you understand. I am gonna sleep now. If you need any help on this thread, I'll look into your requirement when i wake

RW9 · Posted 01-04-2018 04:17 AM

This is a common problem, caused by the "Excel way of thinking" that seems to permeate SAS programming nowadays. You have X fixed columns and Y non-fixed columns, and are now trying to find a way to process them logically. A simple change of mindset, to the programmer's way of thinking, makes this kind of issue go away. There are two main data structures: transposed (wide), which is what you have and is useful for reports that humans read, and normalised (long), which is useful for storage, programming, etc.

So from a storage point of view, the output you want uses 12 cells (3 IDs by 4 variables) to hold 5 data items, which is a waste. From a programming point of view, you have to work out how many elements there are and scan through each of them, every time you want to use the data.

A far simpler storage would be:

data have;
  input ID $ data1 $ data2 $ data3 $ data4 $;
datalines;
1 ?_B C_D A_C B_D 
2 A_A D_? A_A ?_B 
3 B_B C_C C_C D_D 
4 R_T P_? C_? K_A 
;
run;

proc transpose data=have out=want;
  by id;
  var data:;
run;

data want;
  set want;
  where index(col1,"?")>0;
run;

So normalise the data, and then it's simply a matter of WHERE clauses and the like to get at the data you want. Compare that with the array code given earlier: how much simpler is it?

And if you need the transposed results at the end for a report, use another proc transpose to go up again.
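
A minimal sketch of that second transpose, assuming the filtered long data set keeps the default PROC TRANSPOSE variable names _NAME_ and COL1. The data set names long and report are made up for illustration, and the sample data is the later version from this thread, where data3 has no "?" at all.

```sas
data have;
  input ID $ data1 $ data2 $ data3 $ data4 $;
datalines;
1 ?_B C_D A_C B_D
2 A_A D_? A_A ?_B
3 B_B C_C C_C D_D
4 R_T P_? C_J K_A
;
run;

/* Wide to long: one row per ID and variable, with the value in COL1 */
proc transpose data=have out=long;
  by id;
  var data:;
run;

/* Keep only the values that contain "?" */
data long;
  set long;
  where index(col1,"?")>0;
run;

/* Long back to wide: the values of _NAME_ become the column names */
proc transpose data=long out=report(drop=_name_);
  by id;
  id _name_;
  var col1;
run;
```

Since only the _NAME_ values that survive the filter become columns, a variable with no "?" anywhere (data3 here) and an ID with no "?" (ID 3) should both drop out of report without any extra code. The columns appear in the order they are first encountered, which may differ from the original order.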

ursula · Posted 01-04-2018 01:02 PM

Thanks so much for the simpler codes!

yes, it works!
