BookmarkSubscribeRSS Feed
Jagadishkatam
Amethyst | Level 16

Dear Experts,

 

I would like to impute the values, and for the same i have attached an excel file with id, visit, time and value. With these i need to derive the expected imputed values varible based on the below rules. This is based on by statement id and visit.

 

Leading character data (xx) has to imputed to 0
Single xx between numeric values has to be imputed to blank
More than one consecutive xx between numeric values has to be imputed to 0
numeric values after more than one consecutive mid-profile XX has ot be imputed to blank
Trailing xx (consecutive xx in the tail) has to be imputed to 0

 

I used the below code, but i want to improve the same code with perl regular expression or any alternate approach,

 

%macro test(in=have2,out=final,by=subj,var=x,newvar=x2,firstdot=subj);
proc sort data= &in.;
by &by.;
run;

data &in.1;
set &in.;
by &by.;
if time=0 and &var.='xx' then res='0';
else if time^=0 and &var.='xx' then res='1';
else if &var. in ('yy') then res='2';
else res='9';
run;


data &in.2;
length new2 $10.;
do until(last.&firstdot.);
set &in.1;
by &by.;
retain new2;
if first.&firstdot. then new2=strip(res);
else new2=cats(new2,res);
end;
i=0;
do until(last.&firstdot.);
set have1;
by &by.;
i+1;
string=substr(new2,i);
output;
end;
run;

data &in.3;
length new $10.;
set &in.2;
by &by.;
retain new;
if first.&firstdot. then new=strip(res);
else new=cats(new,res);
run;

data &out.;
set  &in.3;
by &by.;
retain flag2;
if time=0 and new='0' then flags='0';
if prxmatch('/01/',new)>0 and index(substr(new,3),'9')=0 then flags='0';
if &var. in ('yy') then flags='';
if prxmatch('/91\s/',new)>0 then flags='1';
if first.&firstdot. then flag2='';
if prxmatch('/911/',new)>0 then flag2='1';
if prxmatch('/121\s/',new)>0 then flag2='1';
flag3=coalescec(flags,flag2);
if flag3 not in ('1') and res ne '2' then &newvar.=coalescec(flag3,&var.);
if &newvar. eq '' and prxmatch('/91{2,4}\s/',new) then  &newvar.='0';
if &newvar. eq '' and prxmatch('/^11/',string) then  &newvar.='0';
/*drop res new flags flag2 flag3;*/
run;
%mend;

options mprint;
%test(in=have,out=final,by=id visit,var=x,newvar=x2,firstdot=visit);

Appreciate your help.

Thanks,
Jag
5 REPLIES 5
ballardw
Super User

Many users here don't want to download Excel files because of virus potential, others have such things blocked by security software. Also if you give us Excel we have to create a SAS data set and due to the non-existent constraints on Excel data cells the result we end up with may not have variables of the same type (numeric or character) and even values.

 

https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... has instructions on how to turn SAS datasets into data step code that you may paste into the forum and the we can run to duplicate your example data.

Jagadishkatam
Amethyst | Level 16

Here the data on which i need to derive the expected imputed value

 

to add to the above rules i mentioned in the post above wherever there is yy  it should be imputed to blank

 

id 	Visit 	time	value 	        expected imputed values
1     	1	0    	xx	        0
1     	1	1    	0.843180271	0.843180271
1     	1	2    	0.208977032	0.208977032
1     	1	3    	0.907516743	0.907516743
1     	1	4    	0.802715893	0.802715893
1     	1	5    	0.455216516	0.455216516
1     	1	6    	0.281721672	0.281721672
1     	1	7    	0.522158608	0.522158608
1     	1	8    	0.650817007	0.650817007
4     	1	0    	0.5393922	0.5393922
4     	1	1    	0.764791537	0.764791537
4     	1	2    	xx	        0
4     	1	3    	0.518281777	0.518281777
4     	1	4    	0.770101569	0.770101569
4     	1	5    	0.011486988	0.011486988
4     	1	6    	xx	
4     	1	7    	xx	
4     	1	8    	0.587449508	
5     	1	0    	0.60354048	0.60354048
5     	1	1    	0.390895449	0.390895449
5     	1	2    	xx	        0
5     	1	3    	xx	        0
5     	1	4    	0.278360279	
5     	1	5    	0.103553382	
5     	1	6    	0.519709424	
5     	1	7    	xx	
5     	1	8    	0.064817437	
6     	1	0    	0.102335765	0.102335765
6     	1	1    	0.227020843	0.227020843
6     	1	2    	xx	        0
6     	1	3    	xx	        0
6     	1	4    	xx	        0
6     	1	5    	0.652625406	
6     	1	6    	0.826921215	
6     	1	7    	xx	        0
6     	1	8    	xx	        0
7     	1	0    	0.625855052	
7     	1	1    	xx	        0
7     	1	2    	yy	
7     	1	3    	xx	        0
7     	1	4    	0.037477881	
7     	1	5    	0.636017947	
7     	1	6    	xx	
7     	1	7    	0.153795614	
7     	1	8    	0.22973436	
8     	1	0    	0.244375049	0.244375049
8     	1	1    	0.928145364	0.928145364
8     	1	2    	0.611441405	0.611441405
8     	1	3    	xx	        0
8     	1	4    	xx	        0
8     	1	5    	yy	
8     	1	6    	0.789341609	
8     	1	7    	0.518628572	
8     	1	8    	0.261352194	
9     	1	0    	0.652630206	0.652630206
9     	1	1    	0.297000038	0.297000038
9     	1	2    	0.328285117	0.328285117
9     	1	3    	0.435990373	0.435990373
9     	1	4    	xx	
9     	1	5    	0.182569321	0.182569321
9     	1	6    	0.529125888	0.529125888
9     	1	7    	xx	        0
9     	1	8    	xx	        0
10    	1	0    	xx	        0
10    	1	1    	xx	        0
10    	1	2    	yy	
10    	1	3    	xx	        0
10    	1	4    	xx	        0
10    	1	5    	0.932330165	0.932330165
10    	1	6    	0.827900548	0.827900548
10    	1	7    	0.779063734	0.779063734
10    	1	8    	0.576971588	0.576971588
1     	2	0    	xx	        0
1     	2	1    	0.843180271	0.843180271
1     	2	2    	0.208977032	0.208977032
1     	2	3    	0.907516743	0.907516743
1     	2	4    	0.802715893	0.802715893
1     	2	5    	0.455216516	0.455216516
1     	2	6    	0.281721672	0.281721672
1     	2	7    	0.522158608	0.522158608
1     	2	8    	0.650817007	0.650817007
4     	2	0    	0.5393922	0.5393922
4     	2	1    	0.764791537	0.764791537
4     	2	2    	xx	        0
4     	2	3    	0.518281777	0.518281777
4     	2	4    	0.770101569	0.770101569
4     	2	5    	0.011486988	0.011486988
4     	2	6    	xx	
4     	2	7    	xx	
4     	2	8    	0.587449508	
5     	2	0    	0.60354048	0.60354048
5     	2	1    	0.390895449	0.390895449
5     	2	2    	xx	        0
5     	2	3    	xx	        0
5     	2	4    	0.278360279	
5     	2	5    	0.103553382	
5     	2	6    	0.519709424	
5     	2	7    	xx	
5     	2	8    	0.064817437	
6     	2	0    	0.102335765	0.102335765
6     	2	1    	0.227020843	0.227020843
6     	2	2    	xx	        0
6     	2	3    	xx	        0
6     	2	4    	xx	        0
6     	2	5    	0.652625406	
6     	2	6    	0.826921215	
6     	2	7    	xx	        0
6     	2	8    	xx	        0
7     	2	0    	0.625855052	
7     	2	1    	xx	        0
7     	2	2    	yy	
7     	2	3    	xx	        0 
7     	2	4    	0.037477881	
7     	2	5    	0.636017947	
7     	2	6    	xx	
7     	2	7    	0.153795614	
7     	2	8    	0.22973436	
8     	2	0    	0.244375049	0.244375049
8     	2	1    	0.928145364	0.928145364
8     	2	2    	0.611441405	0.611441405
8     	2	3    	xx	        0
8     	2	4    	xx	        0
8     	2	5    	yy	
8     	2	6    	0.789341609	
8     	2	7    	0.518628572	
8     	2	8    	0.261352194	
9     	2	0    	0.652630206	0.652630206
9     	2	1    	0.297000038	0.297000038
9     	2	2    	0.328285117	0.328285117
9     	2	3    	0.435990373	0.435990373
9     	2	4    	xx	
9     	2	5    	0.182569321	0.182569321
9     	2	6    	0.529125888	0.529125888
9     	2	7    	xx	        0
9     	2	8    	xx	        0
10    	2	0    	xx	        0
10    	2	1    	xx	        0
10    	2	2    	yy	
10    	2	3    	xx	        0
10    	2	4    	xx	        0
10    	2	5    	0.932330165	0.932330165
10    	2	6    	0.827900548	0.827900548
10    	2	7    	0.779063734	0.779063734
10    	2	8    	0.576971588	0.576971588
Thanks,
Jag
Jagadishkatam
Amethyst | Level 16

 

Here is the code with data that could you used to derive the expected imputed values. could you please guide me to a more efficient code based on the rules i mentioned earlier.

 

 

data have ;
infile cards dlm='09'x;
input id$ Visit time_$ x$;
cards; 
1     	1	0    	xx
1     	1	1    	0.843180271
1     	1	2    	0.208977032
1     	1	3    	0.907516743
1     	1	4    	0.802715893
1     	1	5    	0.455216516
1     	1	6    	0.281721672
1     	1	7    	0.522158608
1     	1	8    	0.650817007
4     	1	0    	0.5393922
4     	1	1    	0.764791537
4     	1	2    	xx
4     	1	3    	0.518281777
4     	1	4    	0.770101569
4     	1	5    	0.011486988
4     	1	6    	xx
4     	1	7    	xx
4     	1	8    	0.587449508
5     	1	0    	0.60354048
5     	1	1    	0.390895449
5     	1	2    	xx
5     	1	3    	xx
5     	1	4    	0.278360279
5     	1	5    	0.103553382
5     	1	6    	0.519709424
5     	1	7    	xx
5     	1	8    	0.064817437
6     	1	0    	0.102335765
6     	1	1    	0.227020843
6     	1	2    	xx
6     	1	3    	xx
6     	1	4    	xx
6     	1	5    	0.652625406
6     	1	6    	0.826921215
6     	1	7    	xx
6     	1	8    	xx
7     	1	0    	0.625855052
7     	1	1    	xx
7     	1	2    	yy
7     	1	3    	xx
7     	1	4    	0.037477881
7     	1	5    	0.636017947
7     	1	6    	xx
7     	1	7    	0.153795614
7     	1	8    	0.22973436
8     	1	0    	0.244375049
8     	1	1    	0.928145364
8     	1	2    	0.611441405
8     	1	3    	xx
8     	1	4    	xx
8     	1	5    	yy
8     	1	6    	0.789341609
8     	1	7    	0.518628572
8     	1	8    	0.261352194
9     	1	0    	0.652630206
9     	1	1    	0.297000038
9     	1	2    	0.328285117
9     	1	3    	0.435990373
9     	1	4    	xx
9     	1	5    	0.182569321
9     	1	6    	0.529125888
9     	1	7    	xx
9     	1	8    	xx
10    	1	0    	xx
10    	1	1    	xx
10    	1	2    	yy
10    	1	3    	xx
10    	1	4    	xx
10    	1	5    	0.932330165
10    	1	6    	0.827900548
10    	1	7    	0.779063734
10    	1	8    	0.576971588
1     	2	0    	xx
1     	2	1    	0.843180271
1     	2	2    	0.208977032
1     	2	3    	0.907516743
1     	2	4    	0.802715893
1     	2	5    	0.455216516
1     	2	6    	0.281721672
1     	2	7    	0.522158608
1     	2	8    	0.650817007
4     	2	0    	0.5393922
4     	2	1    	0.764791537
4     	2	2    	xx
4     	2	3    	0.518281777
4     	2	4    	0.770101569
4     	2	5    	0.011486988
4     	2	6    	xx
4     	2	7    	xx
4     	2	8    	0.587449508
5     	2	0    	0.60354048
5     	2	1    	0.390895449
5     	2	2    	xx
5     	2	3    	xx
5     	2	4    	0.278360279
5     	2	5    	0.103553382
5     	2	6    	0.519709424
5     	2	7    	xx
5     	2	8    	0.064817437
6     	2	0    	0.102335765
6     	2	1    	0.227020843
6     	2	2    	xx
6     	2	3    	xx
6     	2	4    	xx
6     	2	5    	0.652625406
6     	2	6    	0.826921215
6     	2	7    	xx
6     	2	8    	xx
7     	2	0    	0.625855052
7     	2	1    	xx
7     	2	2    	yy
7     	2	3    	xx
7     	2	4    	0.037477881
7     	2	5    	0.636017947
7     	2	6    	xx
7     	2	7    	0.153795614
7     	2	8    	0.22973436
8     	2	0    	0.244375049
8     	2	1    	0.928145364
8     	2	2    	0.611441405
8     	2	3    	xx
8     	2	4    	xx
8     	2	5    	yy
8     	2	6    	0.789341609
8     	2	7    	0.518628572
8     	2	8    	0.261352194
9     	2	0    	0.652630206
9     	2	1    	0.297000038
9     	2	2    	0.328285117
9     	2	3    	0.435990373
9     	2	4    	xx
9     	2	5    	0.182569321
9     	2	6    	0.529125888
9     	2	7    	xx
9     	2	8    	xx
10    	2	0    	xx
10    	2	1    	xx
10    	2	2    	yy
10    	2	3    	xx
10    	2	4    	xx
10    	2	5    	0.932330165
10    	2	6    	0.827900548
10    	2	7    	0.779063734
10    	2	8    	0.576971588
;

data have;
set have;
time=input(compress(time_,,'kd'),best.);
run;
Thanks,
Jag
Ksharp
Super User
NO. Your output doesn't look like what you are asking for.
And there are also too many scenarios you need to consider about .

data have ;
infile cards expandtabs;
input id$ Visit $ time_$ x$;
cards; 
1     	1	0    	xx
1     	1	1    	0.843180271
1     	1	2    	0.208977032
1     	1	3    	0.907516743
1     	1	4    	0.802715893
1     	1	5    	0.455216516
1     	1	6    	0.281721672
1     	1	7    	0.522158608
1     	1	8    	0.650817007
4     	1	0    	0.5393922
4     	1	1    	0.764791537
4     	1	2    	xx
4     	1	3    	0.518281777
4     	1	4    	0.770101569
4     	1	5    	0.011486988
4     	1	6    	xx
4     	1	7    	xx
4     	1	8    	0.587449508
5     	1	0    	0.60354048
5     	1	1    	0.390895449
5     	1	2    	xx
5     	1	3    	xx
5     	1	4    	0.278360279
5     	1	5    	0.103553382
5     	1	6    	0.519709424
5     	1	7    	xx
5     	1	8    	0.064817437
6     	1	0    	0.102335765
6     	1	1    	0.227020843
6     	1	2    	xx
6     	1	3    	xx
6     	1	4    	xx
6     	1	5    	0.652625406
6     	1	6    	0.826921215
6     	1	7    	xx
6     	1	8    	xx
7     	1	0    	0.625855052
7     	1	1    	xx
7     	1	2    	yy
7     	1	3    	xx
7     	1	4    	0.037477881
7     	1	5    	0.636017947
7     	1	6    	xx
7     	1	7    	0.153795614
7     	1	8    	0.22973436
8     	1	0    	0.244375049
8     	1	1    	0.928145364
8     	1	2    	0.611441405
8     	1	3    	xx
8     	1	4    	xx
8     	1	5    	yy
8     	1	6    	0.789341609
8     	1	7    	0.518628572
8     	1	8    	0.261352194
9     	1	0    	0.652630206
9     	1	1    	0.297000038
9     	1	2    	0.328285117
9     	1	3    	0.435990373
9     	1	4    	xx
9     	1	5    	0.182569321
9     	1	6    	0.529125888
9     	1	7    	xx
9     	1	8    	xx
10    	1	0    	xx
10    	1	1    	xx
10    	1	2    	yy
10    	1	3    	xx
10    	1	4    	xx
10    	1	5    	0.932330165
10    	1	6    	0.827900548
10    	1	7    	0.779063734
10    	1	8    	0.576971588
1     	2	0    	xx
1     	2	1    	0.843180271
1     	2	2    	0.208977032
1     	2	3    	0.907516743
1     	2	4    	0.802715893
1     	2	5    	0.455216516
1     	2	6    	0.281721672
1     	2	7    	0.522158608
1     	2	8    	0.650817007
4     	2	0    	0.5393922
4     	2	1    	0.764791537
4     	2	2    	xx
4     	2	3    	0.518281777
4     	2	4    	0.770101569
4     	2	5    	0.011486988
4     	2	6    	xx
4     	2	7    	xx
4     	2	8    	0.587449508
5     	2	0    	0.60354048
5     	2	1    	0.390895449
5     	2	2    	xx
5     	2	3    	xx
5     	2	4    	0.278360279
5     	2	5    	0.103553382
5     	2	6    	0.519709424
5     	2	7    	xx
5     	2	8    	0.064817437
6     	2	0    	0.102335765
6     	2	1    	0.227020843
6     	2	2    	xx
6     	2	3    	xx
6     	2	4    	xx
6     	2	5    	0.652625406
6     	2	6    	0.826921215
6     	2	7    	xx
6     	2	8    	xx
7     	2	0    	0.625855052
7     	2	1    	xx
7     	2	2    	yy
7     	2	3    	xx
7     	2	4    	0.037477881
7     	2	5    	0.636017947
7     	2	6    	xx
7     	2	7    	0.153795614
7     	2	8    	0.22973436
8     	2	0    	0.244375049
8     	2	1    	0.928145364
8     	2	2    	0.611441405
8     	2	3    	xx
8     	2	4    	xx
8     	2	5    	yy
8     	2	6    	0.789341609
8     	2	7    	0.518628572
8     	2	8    	0.261352194
9     	2	0    	0.652630206
9     	2	1    	0.297000038
9     	2	2    	0.328285117
9     	2	3    	0.435990373
9     	2	4    	xx
9     	2	5    	0.182569321
9     	2	6    	0.529125888
9     	2	7    	xx
9     	2	8    	xx
10    	2	0    	xx
10    	2	1    	xx
10    	2	2    	yy
10    	2	3    	xx
10    	2	4    	xx
10    	2	5    	0.932330165
10    	2	6    	0.827900548
10    	2	7    	0.779063734
10    	2	8    	0.576971588
;

data have;
set have;
if x in ('xx' 'yy') then flag=1;
 else flag=0;
run;
data want;
length want $ 20;
retain has_multi_x;
xx=0;

 do until(last.flag);
  set have;
  by id flag notsorted;
  if first.id then do;first_id=1; has_multi_x=0;end;
  if last.id then last_id=1;
  if x='xx' then xx+1;
end;

_flag=flag;
if flag and xx gt 1 then has_multi_x=1;

do until(last.flag);
 set have;
 by id flag notsorted;
 if _flag then do;
  if first_id then  do;if x='xx' then want='0';end;
  if last_id and xx gt 1 then  do;if x='xx' then want='0';end;
  if not first_id and not last_id and xx gt 1 then do;if x='xx' then want='0';end;
 end;
 else do; 
  want=x;
  if has_multi_x then want=' ';
 end;

 output;
 want=' ';
end;

drop xx flag _flag has_multi_x  first_id last_id;
run;

Astounding
PROC Star

Originally, I skipped over this problem because it's not what I think of when I see the word "imputed".  Here's a stab at it.  It covers most of what you asked for, but not all.  The trailing xx values are the end of a visit might need to be adjusted.

 

Assuming your data set is already in sorted order, here are some useful pieces:

 

data want;

set have;

length imputed_value $ 20;

by id visit value notsorted;

if first.visit then status='new visit';

retain status;

if input(value, ??10.) > . then do;

   if status='new visit' then status='begin seq';

   if status='begin seq' then imputed_value=value;

end;

if value='yy' then do;

   if status='begin seq' then status='broken';

end;

if value='xx' then do;

   if first.value=0 or last.value=0 then do;

      imputed_value='0';

      if status='begin seq' then status='broken';

   end;

   else if status='begin seq' then imputed_value='0';

end;

run;

 

I agree with KSharp, though.  There appear to be gaps in the logic where the intended result should be spelled out  For example, look at ID=7, visit=2 where there is a "yy" in between two "xx" values.  The two "xx" values look like they are being treated as consecutive "xx" values.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 962 views
  • 0 likes
  • 4 in conversation