BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
LFern
Obsidian | Level 7

Hello, I have the following:

data have;
    input original_string $1-35;
    datalines;
I was 12 yearsold I am 50 yearsold
;
run;

I want to achieve the following:

data want;
    input desired_age1 desired_age2;
    datalines;
12 50
;
run;

 

I figured I need to use "SCAN" like below, except I need to figure out how to loop it so that I can capture multiple ages. 

data want;
set have;
do i=1 to countw(original_string,"");
	if scan(original_string,i,"")="yearsold" then do;
	  desired_age1=scan(original_string,i-1,"");
	end;
end;
run;


Any thoughts? I appreciate any help.

1 ACCEPTED SOLUTION

Accepted Solutions
Amir
PROC Star

Hi @LFern,

 

Does the following do what you require?

 

/* set up input data */
data have;
   infile datalines truncover;
   input original_string $1-100;
   datalines;
I was 12 yearsold I am 50 yearsold
I was 22 yearsold
I am 33 yearsold
I was 44 yearsold I am 55 yearsold tomorrow I'm 56 yearsold
;


/* output each age to a new observation */
data want1;
   set have;

   do i = 1 to countw(original_string,"");
   	if scan(original_string,i,"") eq "yearsold" then
      do;
         desired_age = scan(original_string,i-1,"");
         output;
   	end;
   end;
run;


/* transpose desired_age by original_string */
proc transpose data   = want1
               out    = want2
               name   = original_string
               prefix = desired_age
               ;
   by original_string notsorted;
   var desired_age;
run;

 

Kind regards,

Amir.

View solution in original post

8 REPLIES 8
Shmuel
Garnet | Level 18

For the sample you have posted your code should work. Have you tried it?

To get a better solution we need to see more test data and possible combinations.

LFern
Obsidian | Level 7

Hi Shmuel,

My posted code only yields "Desired_age1=50".

But I also need a variable called "Desired_age2=12" and I dont know how to get that second age.

Shmuel
Garnet | Level 18

You are right. I have fixed your code:

data want;
set have;
    array age {*} desired_age1 - desired_age2;
	do i=1 to countw(original_string,"");
		if scan(original_string,i,"")="yearsold" then do;
		  if i  le dim(age) then
		     age(i)=scan(original_string,i-1,"");
		end;
	end;
run;
LFern
Obsidian | Level 7

Hello again,

Sorry to bother. The code you provided displays missing values.

SAS_want.PNG

I'm not familiar with arrays but I did some quick research and I think the reason your code isn't working is because "Desired_Age1" & "Desired_Age2" are not variables in the "HAVE" dataset.

Shmuel
Garnet | Level 18

@LFern , you are right. I fixed the code and tested it:

data have;
    input original_string $1-35;
    datalines;
I was 12 yearsold I am 50 yearsold
;
run;
data want;
set have;
    array age {*} desired_age1 - desired_age2;
    j=0;
	do i=1 to countw(original_string);
		if scan(original_string,i)="yearsold" then do;
		  j+1;
		  if j le dim(age) then 
		     age(j)=input(scan(original_string,i-1),best2.); 
		end;
	end;
	drop i j;
run;
/* space is a default delimiter of scan() function */


Amir
PROC Star

Hi @LFern,

 

Does the following do what you require?

 

/* set up input data */
data have;
   infile datalines truncover;
   input original_string $1-100;
   datalines;
I was 12 yearsold I am 50 yearsold
I was 22 yearsold
I am 33 yearsold
I was 44 yearsold I am 55 yearsold tomorrow I'm 56 yearsold
;


/* output each age to a new observation */
data want1;
   set have;

   do i = 1 to countw(original_string,"");
   	if scan(original_string,i,"") eq "yearsold" then
      do;
         desired_age = scan(original_string,i-1,"");
         output;
   	end;
   end;
run;


/* transpose desired_age by original_string */
proc transpose data   = want1
               out    = want2
               name   = original_string
               prefix = desired_age
               ;
   by original_string notsorted;
   var desired_age;
run;

 

Kind regards,

Amir.

LFern
Obsidian | Level 7

This was fantastic, thank you so much!

Ksharp
Super User
data have;
   infile datalines truncover;
   input original_string $1-100;
   datalines;
I was 12 yearsold I am 50 yearsold
I was 22 yearsold
I am 33 yearsold
I was 44 yearsold I am 55 yearsold tomorrow I'm 56 yearsold
;

data temp;
 set have;
 id+1;
 pid=prxparse('/\d+\s*(?=yearsold)/i');
 s=1;e=length(original_string);
 call prxnext(pid,s,e,original_string,p,l);
 do while(p>0);
  want=substr(original_string,p,l);output;
  call prxnext(pid,s,e,original_string,p,l);
 end;
 drop s e p l pid;
run;
proc transpose data=temp out=want(drop=_:);
by id original_string;
var want;
run;