BookmarkSubscribeRSS Feed
mdavidson
Quartz | Level 8

I have a strange dataset that I need to loop through around 4,000 columns and rename them based on the first word in the column's value. For example my data looks like this:

 

data HAVE;
F1='CUSTOMERID="7895"';
F2='CUSTOMERNAME="JOHN DOE"';
F3='CUSTOMERADDRESS="1234 MAIN ST"';
output;
F1='CUSTOMERID="8597"';
F2='CUSTOMERNAME="DAVE ABC"';
F3='CUSTOMERADDRESS="6789 RIVER ST"';
output;
run;

data WANT;
CUSTOMERID='7895';
CUSTOMERNAME='JOHN DOE';
CUSTOMERADDRESS='1234 MAIN ST';
output;
CUSTOMERID='8597';
CUSTOMERNAME='DAVE ABC';
CUSTOMERADDRESS='6789 RIVER ST';
output;
run;

I'm not exactly sure what is the best way to loop through each column, grab the text to the left of the "=" sign and make that the column name. Any ideas? 

 

Thanks.

11 REPLIES 11
Reeza
Super User

What is the original source data type?

How did it get read in originally? Can that step be corrected?

 

SAS has NAMED input which allows for this type of data to be read easily from text files and you can use it within a data step if required, but it would help to know the entire situation. 

 


@mdavidson wrote:

I have a strange dataset that I need to loop through around 4,000 columns and rename them based on the first word in the column's value. For example my data looks like this:

 

data HAVE;
F1='CUSTOMERID="7895"';
F2='CUSTOMERNAME="JOHN DOE"';
F3='CUSTOMERADDRESS="1234 MAIN ST"';
output;
F1='CUSTOMERID="8597"';
F2='CUSTOMERNAME="DAVE ABC"';
F3='CUSTOMERADDRESS="6789 RIVER ST"';
output;
run;

data WANT;
CUSTOMERID='7895';
CUSTOMERNAME='JOHN DOE';
CUSTOMERADDRESS='1234 MAIN ST';
output;
CUSTOMERID='8597';
CUSTOMERNAME='DAVE ABC';
CUSTOMERADDRESS='6789 RIVER ST';
output;
run;

I'm not exactly sure what is the best way to loop through each column, grab the text to the left of the "=" sign and make that the column name. Any ideas? 

 

Thanks.


 

mdavidson
Quartz | Level 8
Tried to correct the original import, that was the first thing I looked at in this case. Unfortunately it's more complicated than I ever wish to type out, huge XML file, no map, along with about zillion other problems. Long story short, I cannot change the input process the data example that I gave is what I have to work with.
Reeza
Super User

XML mapper didn't work. Personally with XML files, I've resorted to name parsing. SCAN() is quite a powerful function with the = delimiter as long as you have a semi clear structure. 

 

But, we really do need to understand the scope of the problem. I can easily parse the above with SCAN() but it may not generalize to your problem. 

 

Or maybe SCAN() is all you need to move ahead.

 

 

mdavidson
Quartz | Level 8
XML mapper did not work in this case, it wasn't even able to read in the file. Do you have an example of what you're thinking with SCAN? I'm not that familiar with SCAN.
Reeza
Super User

Given what you've posted this will work:

 

data want;
set have;
CustomerID = scan(f1, 2, '=');
customerName = scan(f2, 2, '=');
customerAddress = scan(f3, 2, '=');

run;

However, I don't think it will generalize to your whole file. 

 

When I had to do this my code ended up looking like this:

 


if find (x, "<corporation corporationId=")> 0 then do;
    field="CorporateID";
    value =scan(x, 2, '"');
    corporateID=input(value, 8.);
    output;
end;
else if find (x, "<annualReturn annualMeetingDate=")> 0 then do;
    field='AnnualMeetingDate';
    value = scan(x, 2, '"');
    output;
    field = "YearOfFiling";
    value = scan(x, 4, '"');
    output;
end;
else if find (x, "act code=")> 0 then do;
    field='Act';
    value = scan(x, 2, '"');
    output;
end;
else if find (x, "status code=")> 0 then do;
    field='StatusCode';
    value = scan(x, 2, '"');
    output;
end;
else if find (x, "<activity code=")> 0 then do;
    field='ActivityCode';
    value = scan(x, 2, '"');
    output;
    field='ActivityDate';
    value = scan(x, 4, '"');
    output;
end;
else if find (x, "<name code=")> 0 and find(x, "current")=0 then do;
    field='NameCode';
    value = scan(x, 2, '"');
    output;
    field='EffectiveDate';
    value = scan(x, 4, '"');
    output;
    field='ExpiryDate';
    value = scan(x, 6, '"');
    output;
    field='CompanyName';
    value = scan(x, 3, '<>');
    output;
end;

I created a long file instead so I didn't have to care about variable names. 

 

data want2;
set have;

array f(*) f1-f3;

do i=1 to dim(f);

Variable = scan(f(i), 1, "=");
Value = scan(f(i), 2, "=");
output;
end;

run;

Then once that was imported, I transposed and combined files based on my understanding of the data.

 


mdavidson
Quartz | Level 8

Good examples, thank you but I'm not sure they will work. I actually have over 4,000 columns, so I'd like to dynamically rename the column names if possible.

novinosrin
Tourmaline | Level 20
data HAVE;
F1='CUSTOMERID="7895"';
F2='CUSTOMERNAME="JOHN DOE"';
F3='CUSTOMERADDRESS="1234 MAIN ST"';
output;
F1='CUSTOMERID="8597"';
F2='CUSTOMERNAME="DAVE ABC"';
F3='CUSTOMERADDRESS="6789 RIVER ST"';
output;
run;

data temp;
set have;
array ff f1-f3 ;
grp=_n_;
do over ff ;
vname=scan(ff(_i_),1,'=');
value=dequote(scan(ff(_i_),-1,'='));
output;
end;
run;

proc transpose data=temp out=want(drop=_: grp);
by grp;
var value;
id vname;
run;

Please specify a length for vname and value accordingly coz it doesn't matter what length I assign on my college comp as opposed to your production environment

novinosrin
Tourmaline | Level 20

Cleaner with assigned length

 

data HAVE;
length f1-f3 $50;
F1='CUSTOMERID="7895"';
F2='CUSTOMERNAME="JOHN DOE"';
F3='CUSTOMERADDRESS="1234 MAIN ST"';
output;
F1='CUSTOMERID="8597"';
F2='CUSTOMERNAME="DAVE ABC"';
F3='CUSTOMERADDRESS="6789 RIVER ST"';
output;
run;

data temp;
set have;
length vname value $50;
array ff f1-f3 ;
grp=_n_;
do over ff ;
vname=scan(ff(_i_),1,'=');
value=dequote(scan(ff(_i_),-1,'='));
output;
end;
run;

proc transpose data=temp out=want(drop=_: grp);
by grp;
var value;
id vname;
run;
Reeza
Super User

@mdavidson wrote:

Good examples, thank you but I'm not sure they will work. I actually have over 4,000 columns, so I'd like to dynamically rename the column names if possible.


That's what the last option supports.

ballardw
Super User

@mdavidson wrote:
Tried to correct the original import, that was the first thing I looked at in this case. Unfortunately it's more complicated than I ever wish to type out, huge XML file, no map, along with about zillion other problems. Long story short, I cannot change the input process the data example that I gave is what I have to work with.

Have you tried opening the XML it something such as Excel. Sometimes the xml can be resolved to a reasonable result and then use the Excel file>save as> csv to create something nicer for import.

Astounding
PROC Star

I would suggest a macro language approach.  For example:

 

data _null_;

set have;

array allvars {4000} f1-f4000;

do k=1 to 4000;

   call symputx(vname(allvars{k}), scan(allvars{k}, 1, '='));

end;

stop;   /* just getting the proper variable names here, no need to process each observation in the data */

run;

 

This gets you a set of 4000 macro variables, as if you had coded:

 

%let F1 = CUSTOMERID;

%let F2 = CUSTOMERNAME;

%let F3 = CUSTOMERADDRESS;

 

Then write a macro to apply those macro variables:

 

%macro allvars;

data want;

set have;

%do k=1 %to 4000;

   &&f&k = scan(f&k, 2, '=');

%end;

drop f1-f4000;

run;

%mend allvars;

 

%allvars

 

It's untested code, so might need a little tweaking.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 2800 views
  • 4 likes
  • 5 in conversation