I was given an excel file where someone stored all of the information in a single column (var1). I need to pull information but it will be in random orders. Good thing is the person gave the information and then put a period after it.
Example of Var1:
Type = 2. Size = 4 in x 12 in. Group = ABC grouping.
Group = A and B Holdings. Type = 1.
Group = Mark H and Company.
The variable I need to pull is group. It always starts with "Group = " and has a period in the end. But will be anywhere within the var1 (so you can't name a specific period. Sometimes it may not exist. This variable can be any length in words. I just need to pull the string between "Group = " and the period.
This can't be done in excel due to the size of the dataset.
I have tried scan, find, splitting at the period, and I am not sure what to do at this point to organize it.
Try this:
data have;
infile cards4 dlm = "0A0D"x;
input Var1 : $ 100.;
cards4;
Type = 2. Size = 4 in x 12 in. Group = ABC grouping.
Group = A and B Holdings. Type = 1.
Group = Mark H and Company.
;;;;
run;
proc print;
run;
data want;
set have;
do _N_ = 1 to countw(Var1,".");
part = strip(scan(Var1,_N_,".")); drop part;
if upcase(part) =: "GROUP" then
do;
group = strip(substr(part, index(part, "=")+1));
end;
end;
run;
proc print;
run;
Bart
You could read the data using NAMED Input.
data have;
infile cards4;
length type size group $48;
input @;
_infile_= transtrn(_infile_,' =','=');
input (_all_)(=);
list;
cards4;
Type = 2. Size = 4 in x 12 in. Group = ABC grouping.
Group = A and B Holdings. Type = 1.
Group = Mark H and Company.
;;;;
run;
proc print;
run;
data have;
infile cards4 truncover;
input Var1 $ 100.;
cards4;
Type = 2. Size = 4 in x 12 in. Group = ABC grouping.
Group = A and B Holdings. Type = 1.
Group = Mark H and Company.
;;;;
run;
data want;
set have;
pid=prxparse('/(?<=group)\s*=[^\.]+(?=\.)/io');
call prxsubstr(pid,var1,p,l);
if p then want=scan(strip(substr(var1,p,l)),1,'=');
drop pid p l;
run;
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.