Hello,
I want to extract only string or characters from the variable.
my data looks like this:
data have1;
input id name$ gen$ ID DATE A B C D$;
datalines;
1 ram male 1 1.1 1 . 1.23456 1/n
2 mohan male 1 1.1 99 1 1.23456 1.1/n
3 shyam male 2 2.1 1 99 1.23456 16/dr
4 radha female 2 2.1 90 1 99 2.00/mn
5 gita female 2 2.1 99 95 1.23456 ?/prp
6 sita female 2 3.1 . 1 1.23456 10mg
7 ranu female 3 2.1 1 . 1.23456 1000mg
8 tina female 3 2.1 . 1 1.23456 n/q
9 shan male 4 8.1 1 1 1.6789 2:3
;
run;
I want data look like this:
data have1;
id name gen ID DATE A B C D E ;
1 ram male 1 1.1 1 . 1.23456 1 n
2 mohan male 1 1.1 99 1 1.23456 1 n
3 shyam male 2 2.1 1 99 1.23456 16 dr
4 radha female 2 2.1 90 1 99 2.00 mn
5 gita female 2 2.1 99 95 1.23456 ? prp
6 sita female 2 3.1 . 1 1.23456 10 mg
7 ranu female 3 2.1 1 . 1.23456 1000 mg
8 tina female 3 2.1 . 1 1.23456 nq
9 shan male 4 8.1 1 1 1.6789 2
I have tried following code
data want; set have1;
_numb= scan (D,1,'/','A');
run;
This removes all the string characters. But I need a separate variable with remaining characters (Also want to remove '?' from the dataset )
Kindly help.
Thanks in advance
data have1;
input id name $ gen $ ID DATE A B C _D $;
D=scan(_D,1,'/:','a');
E=compress(_D,'/:' || D,'d');
drop _D;
cards;
1 ram male 1 1.1 1 . 1.23456 1/n
2 mohan male 1 1.1 99 1 1.23456 1.1/n
3 shyam male 2 2.1 1 99 1.23456 16/dr
4 radha female 2 2.1 90 1 99 2.00/mn
5 gita female 2 2.1 99 95 1.23456 ?/prp
6 sita female 2 3.1 . 1 1.23456 10mg
7 ranu female 3 2.1 1 . 1.23456 1000mg
8 tina female 3 2.1 . 1 1.23456 n/q
9 shan male 4 8.1 1 1 1.6789 2:3
;
run;
Obs | id | name | gen | DATE | A | B | C | D | E |
---|---|---|---|---|---|---|---|---|---|
1 | 1 | ram | male | 1.1 | 1 | . | 1.2346 | 1 | n |
2 | 1 | mohan | male | 1.1 | 99 | 1 | 1.2346 | 1.1 | n |
3 | 2 | shyam | male | 2.1 | 1 | 99 | 1.2346 | 16 | dr |
4 | 2 | radha | female | 2.1 | 90 | 1 | 99.0000 | 2.00 | mn |
5 | 2 | gita | female | 2.1 | 99 | 95 | 1.2346 | ? | prp |
6 | 2 | sita | female | 3.1 | . | 1 | 1.2346 | 10 | mg |
7 | 3 | ranu | female | 2.1 | 1 | . | 1.2346 | 1000 | mg |
8 | 3 | tina | female | 2.1 | . | 1 | 1.2346 | nq | |
9 | 4 | shan | male | 8.1 | 1 | 1 | 1.6789 | 2 |
Should the last record be
9 shan male 4 8.1 1 1 1.6789 2 3
or does the : instead of the / mean disregard what is after?
If you are looking to get rid of ?'s what should this records look like?
5 gita female 2 2.1 99 95 1.23456 ? prp
Should it be
5 gita female 2 2.1 99 95 1.23456 prp
I want to separate numbers and characters as sometimes string characters are present before / also
Use scan():
data want; set have1; e=scan(d,2,"/"); d=scan(d,1,"/"); run;
I don't think that will split 1000mg into 1000 and mg.
In which case you need to two logical operations, one for delimited, one for not:
data want; set have1;
if index(d,"/") then do;
e=scan(d,2,"/"); d=scan(d,1,"/");
end;
else do;
e=compress(d," ","d");
d=tranwrd(d,e,"");
end; run;
So first, if a / appears use that, else compress out all numbers and set that as e, then remove e from d.
Thanks for the reply. I have tried scan() function, but it does not work for those instances where '/ ' is not there such as row number 6,7,and 9. As It removes all the characters instead.
data have1;
input id name$ gen$ ID DATE A B C D$;
datalines;
1 ram male 1 1.1 1 . 1.23456 1/n
2 mohan male 1 1.1 99 1 1.23456 1.1/n
3 shyam male 2 2.1 1 99 1.23456 16/dr
4 radha female 2 2.1 90 1 99 2.00/mn
5 gita female 2 2.1 99 95 1.23456 ?/prp
6 sita female 2 3.1 . 1 1.23456 10mg
7 ranu female 3 2.1 1 . 1.23456 1000mg
8 tina female 3 2.1 . 1 1.23456 n/q
9 shan male 4 8.1 1 1 1.6789 2:3
;
run;
data want;
set have1;
d1=compress(d,' ','kd');
e=compress(d,'?/','d');
run;
data have1;
input id name $ gen $ ID DATE A B C _D $;
D=scan(_D,1,'/:','a');
E=compress(_D,'/:' || D,'d');
drop _D;
cards;
1 ram male 1 1.1 1 . 1.23456 1/n
2 mohan male 1 1.1 99 1 1.23456 1.1/n
3 shyam male 2 2.1 1 99 1.23456 16/dr
4 radha female 2 2.1 90 1 99 2.00/mn
5 gita female 2 2.1 99 95 1.23456 ?/prp
6 sita female 2 3.1 . 1 1.23456 10mg
7 ranu female 3 2.1 1 . 1.23456 1000mg
8 tina female 3 2.1 . 1 1.23456 n/q
9 shan male 4 8.1 1 1 1.6789 2:3
;
run;
Obs | id | name | gen | DATE | A | B | C | D | E |
---|---|---|---|---|---|---|---|---|---|
1 | 1 | ram | male | 1.1 | 1 | . | 1.2346 | 1 | n |
2 | 1 | mohan | male | 1.1 | 99 | 1 | 1.2346 | 1.1 | n |
3 | 2 | shyam | male | 2.1 | 1 | 99 | 1.2346 | 16 | dr |
4 | 2 | radha | female | 2.1 | 90 | 1 | 99.0000 | 2.00 | mn |
5 | 2 | gita | female | 2.1 | 99 | 95 | 1.2346 | ? | prp |
6 | 2 | sita | female | 3.1 | . | 1 | 1.2346 | 10 | mg |
7 | 3 | ranu | female | 2.1 | 1 | . | 1.2346 | 1000 | mg |
8 | 3 | tina | female | 2.1 | . | 1 | 1.2346 | nq | |
9 | 4 | shan | male | 8.1 | 1 | 1 | 1.6789 | 2 |
I think making that
'/:?'
gets OP to the desired result (dropping questions marks).
Although I do think 16/nq becoming 16 and nq while n/q becomes " " and nq is a little strange.
I agree, but it is what the OP showed as the desired output.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.