I have strings like the ones below, with variable names in square brackets and plain text in double quotes. I need to turn each one into a COALESCEC function call that returns the first non-missing value. Currently I use a DO loop with the SCAN and COUNTW functions to split the string into tokens, tell variable names apart from quoted text, and wrap each variable in STRIP. Is there a simpler way, in one or two lines of code, to get the output shown? HAVE is the column with the string and WANT is the required output. Variables and text can appear in any order.
HAVE | WANT |
(values=[DECOD] [TEXT]) | VAR=coalescec(strip(DECOD),strip(TEXT)) |
(values=[DECOD] [TEXT] "NOT REPORTED") | VAR=coalescec(strip(DECOD),strip(TEXT),"NOT REPORTED") |
(values= "NOT REPORTED" [DECOD] "Zero data"[TEXT] ) | VAR=coalescec("NOT REPORTED",strip(DECOD),"Zero data",strip(TEXT)) |
(values= [TEXT] "Not Reported,NOW" [DECOD] ) | VAR=coalescec(strip(TEXT),"Not Reported,NOW",strip(DECOD)) |
So you want to generate code from strings stored in a dataset?
If you can already parse the string, aren't you just asking how to branch on whether the current token starts with a square bracket or a double quote?
If your source is as messy as your example, you first need to insert delimiters between tokens that touch each other, such as ][ or "[.
So perhaps like this?
data have;
input have $80.;
cards4;
(values=[DECOD] [TEXT])
(values=[DECOD] [TEXT] "NOT REPORTED")
(values= "NOT REPORTED" [DECOD] "Zero data"[TEXT] )
(values= [TEXT] "Not Reported,NOW" [DECOD] )
;;;;
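data want;
set have;
length word want $80 ;
/* make sure every [name] and "text" token is separated by a blank */
have=tranwrd(have,'][','] [');
have=tranwrd(have,'"[','" [');
have=tranwrd(have,']"','] "');
want='VAR=coalescec';
sep='(';
/* word 1 is "(values"; the Q modifier keeps each quoted string in one word */
do index=2 to countw(have,'= )','q');
word=scan(have,index,'= )','q');
/* [name] becomes strip(name); a quoted constant is copied as-is */
if word=:'[' then want=catx(sep,want,cats('strip(',scan(word,1,'[]'),')'));
else want=catx(sep,want,word);
sep=',';
end;
want=cats(want,')');
drop word index sep;
run;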
Results:

Obs have
1 (values=[DECOD] [TEXT])
2 (values=[DECOD] [TEXT] "NOT REPORTED")
3 (values= "NOT REPORTED" [DECOD] "Zero data" [TEXT] )
4 (values= [TEXT] "Not Reported,NOW" [DECOD] )

Obs want
1 VAR=coalescec(strip(DECOD),strip(TEXT))
2 VAR=coalescec(strip(DECOD),strip(TEXT),"NOT REPORTED")
3 VAR=coalescec("NOT REPORTED",strip(DECOD),"Zero data",strip(TEXT))
4 VAR=coalescec(strip(TEXT),"Not Reported,NOW",strip(DECOD))
But notice that some of the resulting calls look a little strange. A quoted constant is never missing, so COALESCEC() can never get past it. Once a fixed text string appears, as in the 3rd and 4th observations, there is no point in having any arguments after it.
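If you want the generated calls to stop at the first constant, one option is to leave the loop as soon as a quoted string has been added. This is a minimal sketch, assuming the HAVE dataset created above; the output dataset name WANT2 is just for illustration.

```sas
/* Sketch: same parsing as the DATA step above, but stop at the first quoted */
/* constant, because COALESCEC() can never return anything that follows it.  */
/* Assumes the HAVE dataset created above; WANT2 is an illustrative name.    */
data want2;
  set have;
  length word want $80;
  /* separate tokens that touch each other */
  have=tranwrd(have,'][','] [');
  have=tranwrd(have,'"[','" [');
  have=tranwrd(have,']"','] "');
  want='VAR=coalescec';
  sep='(';
  do index=2 to countw(have,'= )','q');
    word=scan(have,index,'= )','q');
    if word=:'[' then want=catx(sep,want,cats('strip(',scan(word,1,'[]'),')'));
    else do;
      /* quoted constant: add it, then skip the remaining tokens */
      want=catx(sep,want,word);
      leave;
    end;
    sep=',';
  end;
  want=cats(want,')');
  drop word index sep;
run;
```

The first two observations come out as before, while the 3rd becomes VAR=coalescec("NOT REPORTED") and the 4th becomes VAR=coalescec(strip(TEXT),"Not Reported,NOW").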
data have;
length x $200;
x="(values=[DECOD] [TEXT])"; output;
x='(values=[DECOD] [TEXT] "NOT REPORTED")'; output;
x='(values= "NOT REPORTED" [DECOD] "Zero data" [TEXT] )'; output;
x='(values= [TEXT] "Not Reported,NOW" [DECOD] )'; output;
run;
data want;
set have;
length y $200;
y=prxchange('s/\[/strip(/',-1, x);
y=prxchange('s/\]/),/',-1, y);
y=prxchange('s/\(values=/VAR=COALESCEC(/',-1, y);
y=prxchange('s/" s/",s/',-1, y);
y=prxchange('s/,\)/)/',-1, y);
y=prxchange('s/, \)/)/',-1, y);
run;
Hope this helps.
data want;
set have;
length y $200;
y= prxchange('s/\[/strip(/',-1, x);
y= prxchange('s/\]/),/',-1, y);
y=prxchange('s/\(values=/VAR=COALESCEC(/',-1, y);
y=prxchange('s/" *s/",s/',-1, y);
y=prxchange('s/, *\)/)/',-1, y);
run;
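A small modification of the step above: the patterns '" *s' and ', *\)' match zero or more blanks, so a quoted string no longer needs a blank before the following [name], and a single call removes the trailing comma whether or not a blank precedes the closing parenthesis.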