Dear friends at SAS community
I have a dataset that has 4000 observations.
In this dataset, I have a variable var1 which contains text data like this"[123/121/324/345]". I need to extract the numbers as separate numeric variables. For any observation, there can be up to 30 numbers in var1 that I need to extract:
so I am trying to extract new1=123, new2=121, new3=324 and new4=345. I tried the following code:
new1=scan(compress(var1, '[]'),1,'/');
new2=scan(compress(var1, '[]'),2,'/');
new3=scan(compress(var1, '[]'),3,'/');
new4=scan(compress(var1, '[]'),4,'/');
but the results do not seem correct. The last number is missing and never extracted. Also, when an observation only has one number in var1, e.g. var1="[123]", new1 is missing rather than 123 that I want to have.
so for var1="[123/121/324/345]"; I got new1=123, new2=121, new3=324, new4=. when I want new4=345
for var1="[123]", I got new1=. when I want new1=123
any thoughts of a solution?
thanks
Raymond
data w;
k="[123/121/324/345]";
k1=compress(k,'[]');
do _n_=1 to countw(k1);
num=scan(k1,_n_,'/');
output;
end;
drop k1;
run;
data w;
k="[123/121/324/345]";
k1=compress(k,'[]');
do _n_=1 to countw(k1);
num=scan(k1,_n_,'/');
output;
end;
drop k1;
run;
dear novinosrin,
thanks so much for your help. one of such "var1" variables is cardiac_nuclear_tests_year so I ran the code below
data w;
k1=compress(cardiac_nuclear_tests_year,'[]');
do _n_=1 to countw(k1);
num=scan(k1,_n_,'/');
output;
end;
drop k1;
run;
I am still having problems, because this code puts num in a long format, instead of as separate variables year1, year2....year5. Also other than cardiac_nuclear_tests_year, I have 10 other similar variables to extract from. Any other thought? many thanks
data w;
k="[123/121/324/345]";
k1=compress(k,'[]');
grp=0;
do _n_=1 to countw(k1);
num=scan(k1,_n_,'/');
grp+1;
output;
end;
drop k1;
run;
proc transpose data=w out=want prefix=year;
by k;
var num;
id grp;
run;
@rykwong wrote:
dear novinosrin,
thanks so much for your help. one of such "var1" variables is cardiac_nuclear_tests_year so I ran the code below
data w; k1=compress(cardiac_nuclear_tests_year,'[]'); do _n_=1 to countw(k1); num=scan(k1,_n_,'/'); output; end; drop k1; run;
I am still having problems, because this code puts num in a long format, instead of as separate variables year1, year2....year5. Also other than cardiac_nuclear_tests_year, I have 10 other similar variables to extract from. Any other thought? many thanks
Perhaps:
data w; set have; k1=compress(cardiac_nuclear_tests_year,'[]'); array num{30}; do i=1 to countw(k1); num[i]=scan(k1,i,'/'); end; drop k1 i; run;
I consider creating a variable _n_ to be suboptimal as SAS has an automatic variable _n_ that indicates the iteration of the data step and could 1) be confusing in code and 2) mess up the count if needed.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.