Solved: name variable in Data Step according to number or value in array

viuf · Posted 01-19-2013 07:00 AM

Hi

I'm new to Data Steps because I have always been using Proc Sql instead. This time I just need to look into it and therefore I hope that someone in here are able to help

I need to derive words and the first value from a text string / variable in a table. The string contains data that looks something like this:

Large text in variable (TXKLIGK):

Housing 300.000 310.000 320.000

Car 50.000 51.000 52.000

etc. (100 other posts)

Every line is 80 characters and the start of the words are always identical and so is the end of every value. In this Case I need "Housing" and "300.000".

I have made some code that through substrings and so on can "cut out" the desired values, so that is ok. My problem is that I need a way to make a list of words (like Housing) to look up within the text string and then loop throuh these words and make a new column for each word, where the header is the "word" I look for and the value is the number:

New table:

Name Housing Car etc...

Thomas 300.000 50.000 etc...

I have tried to loop through the array "Values" så I find all the values, but at the same time I need the variable "Value" to change name according to the value in the array, so I can identify the value, but it doesn't seem to work if I e.g. writes value&i = substr(sub, position_var, length); or value&Values(i) = substr(sub, position_var, length);

data test1

data test1;

set Main;

array Values {4} $30.

(

"Housing"

"Car"

"Power"

"Tv"

) ;

do i=1 to 4;

Values_string =''||"/"||Values(i)||"/"||'';

position=prxmatch(Values_string, TXLIKG);

if position ^= 0 then

do;

sub = substr(TXLIKG, position, 45);

pattern= "([0-9\.\,]+)";

patternID=prxparse(pattern);

call prxsubstr(patternID, sub, position_var, length);

if position_var ^= 0 then

do;

value = substr(sub, position_var, length);

end;

run;

I do hope it makes sense and thank you all in advance for any help.

Regards

Thomas

art297 · Posted 01-19-2013 11:11 AM

I would take a much simpler approach to initially reading the data. How about something like:

data want (keep=name val:);

set dataset;

format by_line $80.;

format name $34.;

do i=0 to int(length(TXLIKG)/80);

by_line=substr(TXLIKG,i*80+1,80);

if strip(by_line) ne "" then do;

name=strip(substr(by_line,1,34));

val1=input(substr(by_line,35,11),12.);

val2=input(substr(by_line,46,11),12.);

val3=input(substr(by_line,57,11),12.);

val4=input(substr(by_line,68,11),12.);

output;

end;

run;

View solution in original post

Jagadishkatam · Posted 01-19-2013 07:31 AM

Hi Thomas,

I assume you have not tried proc transpose. i tried it and believe that it is producing the same output what you desire. please check the below code and let me know if it is creating the desired output.

data have;

input name $ val1 val2 val3;

format val1 val2 val3 8.3;

cards;

Housing 300.000 310.000 320.000

Car 50.000 51.000 52.000

;

proc transpose data=have name=name out=want;

id name;

run;

Thanks,

Jagadish

Thanks,
Jag

viuf · Posted 01-19-2013 07:45 AM

Thanks for your answer...

I think that proc transpose will be the one to use when I have alle the variables. My problem right now is that I don't have the variables yet. They are all "trapped" within the same big text field . I can extract one value (housing) with the related (value) , but I can't loop through the list

array Values {4} $30.

(

"Housing"

"Car"

"Power"

"Tv"

) ;

and extract them all because I can't change the name of the variable to store them in. It only saves the last one, the rest are overwritten.... therefore I would like to know if I within a data step can name variables dynamicly e.g. like value&i = substr(sub, position_var, length);

Thanks

Jagadishkatam · Posted 01-19-2013 07:50 AM

Thank you Thomas for your reply. As per your email i got an idea of what your require but did not get it completely.

Could you please send 10 lines of the raw data on which you want to work. This will help me to understand the query and try to provide a better solution.

Thanks for your understanding.

Regards,

Jagadish

Thanks,
Jag

viuf · Posted 01-19-2013 09:27 AM

Ok I'll try I have attatched a sas data set in a very simplyfied version with only one row. The code below matches the dataset.

What I need from the field TXLIKG in this one row is to look up the values in the array "Values". The values are:

"Indtægter", "Bolig" and "Kabel TV". I need to have the text returned and also the value of the first number after the text. In this case it should be:

Indtægter = 819.772

Bolig = 317.914

Kabel TV = 10.800

The code already returns the right value but only for the last value in the Array because it overwrites the other. I need to be able to rename the variable "Value" on the fly so I can have an unlimited number of variables in the array to look up.

** string + nrows ;

data test;

set Dataset;

array Values {3} $30.

(

"Indtægter"

"Bolig"

"Kabel TV"

) ;

do i=1 to 3;

Values_string =''||"/"||Values(i)||"/"||'';

position=prxmatch(Values_string, TXLIKG);

if position ^= 0 then

do;

sub = substr(TXLIKG, position, 45);

pattern= "([0-9\.\,]+)";

patternID=prxparse(pattern);

call prxsubstr(patternID, sub, position_var, length);

if position_var ^= 0 then

do;

value = substr(sub, position_var, length);

end;

run;

art297 · Posted 01-19-2013 11:11 AM

I would take a much simpler approach to initially reading the data. How about something like:

data want (keep=name val:);

set dataset;

format by_line $80.;

format name $34.;

do i=0 to int(length(TXLIKG)/80);

by_line=substr(TXLIKG,i*80+1,80);

if strip(by_line) ne "" then do;

name=strip(substr(by_line,1,34));

val1=input(substr(by_line,35,11),12.);

val2=input(substr(by_line,46,11),12.);

val3=input(substr(by_line,57,11),12.);

val4=input(substr(by_line,68,11),12.);

output;

end;

run;

Astounding · Posted 01-19-2013 05:45 PM

viuf,

Assuming that you can read in the data and break each line up into separate variables, the proposed PROC TRANSPOSE does the job admirable. You might end up having to sort and run the transpose BY NAME, but the real power here is the ID statement. It will figure out what variable names you need, and create them for you automatically. So the ID statement refers to the first variable, the one that takes on values like Housing, Car, etc. Name (such as Thomas) would be a separate variable that would come from some other portion of your long text string.

Good luck.

viuf · Posted 01-19-2013 06:14 PM

Hi all

It seems like Arthurs solution actually was very simply and excactly what I was looking for - Thank you!

Now I just need to do the transpose, but unfortunately I can see that my next problem will be a duplicate-issue. In the full dataset I will propably experience duplicates which will ruin the transpose. Is there a way to rename duplicates automaticly while (or just before) running the proc transpose?

Preferably the rename should be something like:

Table to be transposed:

Name val1

Housing 20000

Interestrate 10000

xxx yyy

Car 10000
Interestrate 2000

Result (based on group values):

Housing Housing-Interestrate xxx Car Car-Interestrate

20000 10000 yyy 10000 2000

or maybe (based on number):

Housing Interestrate1 xxx Car Interestrate2

20000 10000 yyy 10000 2000

Thanks again

art297 · Posted 01-19-2013 06:31 PM

: I would post the transpose question as a new thread, including an example of your now current dataset, and the results that you hope to achieve.

viuf · Posted 01-19-2013 06:34 PM

Your right, it makes sense. I'll do that...

name variable in Data Step according to number or value in array

Re: name variable in Data Step according to number or value in array

Re: name variable in Data Step according to number or value in array

Re: name variable in Data Step according to number or value in array

Re: name variable in Data Step according to number or value in array

Re: name variable in Data Step according to number or value in array

Re: name variable in Data Step according to number or value in array

Re: name variable in Data Step according to number or value in array

Re: name variable in Data Step according to number or value in array

Re: name variable in Data Step according to number or value in array

Re: name variable in Data Step according to number or value in array

Register Today!

SAS Training: Just a Click Away