Hi
I'm new to Data Steps because I have always been using Proc Sql instead. This time I just need to look into it and therefore I hope that someone in here are able to help
I need to derive words and the first value from a text string / variable in a table. The string contains data that looks something like this:
Large text in variable (TXKLIGK):
Housing 300.000 310.000 320.000
Car 50.000 51.000 52.000
etc. (100 other posts)
Every line is 80 characters and the start of the words are always identical and so is the end of every value. In this Case I need "Housing" and "300.000".
I have made some code that through substrings and so on can "cut out" the desired values, so that is ok. My problem is that I need a way to make a list of words (like Housing) to look up within the text string and then loop throuh these words and make a new column for each word, where the header is the "word" I look for and the value is the number:
New table:
Name Housing Car etc...
Thomas 300.000 50.000 etc...
I have tried to loop through the array "Values" så I find all the values, but at the same time I need the variable "Value" to change name according to the value in the array, so I can identify the value, but it doesn't seem to work if I e.g. writes value&i = substr(sub, position_var, length); or value&Values(i) = substr(sub, position_var, length);
data test1
data test1;
set Main;
array Values {4} $30.
(
"Housing"
"Car"
"Power"
"Tv"
) ;
do i=1 to 4;
Values_string =''||"/"||Values(i)||"/"||'';
position=prxmatch(Values_string, TXLIKG);
if position ^= 0 then
do;
sub = substr(TXLIKG, position, 45);
pattern= "([0-9\.\,]+)";
patternID=prxparse(pattern);
call prxsubstr(patternID, sub, position_var, length);
if position_var ^= 0 then
do;
value = substr(sub, position_var, length);
end;
end;
end;
run;
I do hope it makes sense and thank you all in advance for any help.
Regards
Thomas
I would take a much simpler approach to initially reading the data. How about something like:
data want (keep=name val:);
set dataset;
format by_line $80.;
format name $34.;
do i=0 to int(length(TXLIKG)/80);
by_line=substr(TXLIKG,i*80+1,80);
if strip(by_line) ne "" then do;
name=strip(substr(by_line,1,34));
val1=input(substr(by_line,35,11),12.);
val2=input(substr(by_line,46,11),12.);
val3=input(substr(by_line,57,11),12.);
val4=input(substr(by_line,68,11),12.);
output;
end;
end;
run;
Hi Thomas,
I assume you have not tried proc transpose. i tried it and believe that it is producing the same output what you desire. please check the below code and let me know if it is creating the desired output.
data have;
input name $ val1 val2 val3;
format val1 val2 val3 8.3;
cards;
Housing 300.000 310.000 320.000
Car 50.000 51.000 52.000
;
proc transpose data=have name=name out=want;
id name;
run;
Thanks,
Jagadish
Thanks for your answer...
I think that proc transpose will be the one to use when I have alle the variables. My problem right now is that I don't have the variables yet. They are all "trapped" within the same big text field . I can extract one value (housing) with the related (value) , but I can't loop through the list
array Values {4} $30.
(
"Housing"
"Car"
"Power"
"Tv"
) ;
and extract them all because I can't change the name of the variable to store them in. It only saves the last one, the rest are overwritten.... therefore I would like to know if I within a data step can name variables dynamicly e.g. like value&i = substr(sub, position_var, length);
Thanks
Thank you Thomas for your reply. As per your email i got an idea of what your require but did not get it completely.
Could you please send 10 lines of the raw data on which you want to work. This will help me to understand the query and try to provide a better solution.
Thanks for your understanding.
Regards,
Jagadish
Ok I'll try I have attatched a sas data set in a very simplyfied version with only one row. The code below matches the dataset.
What I need from the field TXLIKG in this one row is to look up the values in the array "Values". The values are:
"Indtægter", "Bolig" and "Kabel TV". I need to have the text returned and also the value of the first number after the text. In this case it should be:
Indtægter = 819.772
Bolig = 317.914
Kabel TV = 10.800
The code already returns the right value but only for the last value in the Array because it overwrites the other. I need to be able to rename the variable "Value" on the fly so I can have an unlimited number of variables in the array to look up.
** string + nrows ;
data test;
set Dataset;
array Values {3} $30.
(
"Indtægter"
"Bolig"
"Kabel TV"
) ;
do i=1 to 3;
Values_string =''||"/"||Values(i)||"/"||'';
position=prxmatch(Values_string, TXLIKG);
if position ^= 0 then
do;
sub = substr(TXLIKG, position, 45);
pattern= "([0-9\.\,]+)";
patternID=prxparse(pattern);
call prxsubstr(patternID, sub, position_var, length);
if position_var ^= 0 then
do;
value = substr(sub, position_var, length);
end;
end;
end;
run;
I would take a much simpler approach to initially reading the data. How about something like:
data want (keep=name val:);
set dataset;
format by_line $80.;
format name $34.;
do i=0 to int(length(TXLIKG)/80);
by_line=substr(TXLIKG,i*80+1,80);
if strip(by_line) ne "" then do;
name=strip(substr(by_line,1,34));
val1=input(substr(by_line,35,11),12.);
val2=input(substr(by_line,46,11),12.);
val3=input(substr(by_line,57,11),12.);
val4=input(substr(by_line,68,11),12.);
output;
end;
end;
run;
viuf,
Assuming that you can read in the data and break each line up into separate variables, the proposed PROC TRANSPOSE does the job admirable. You might end up having to sort and run the transpose BY NAME, but the real power here is the ID statement. It will figure out what variable names you need, and create them for you automatically. So the ID statement refers to the first variable, the one that takes on values like Housing, Car, etc. Name (such as Thomas) would be a separate variable that would come from some other portion of your long text string.
Good luck.
Hi all
It seems like Arthurs solution actually was very simply and excactly what I was looking for - Thank you!
Now I just need to do the transpose, but unfortunately I can see that my next problem will be a duplicate-issue. In the full dataset I will propably experience duplicates which will ruin the transpose. Is there a way to rename duplicates automaticly while (or just before) running the proc transpose?
Preferably the rename should be something like:
Table to be transposed:
Name val1
Housing 20000
Interestrate 10000
xxx yyy
Car 10000
Interestrate 2000
Result (based on group values):
Housing Housing-Interestrate xxx Car Car-Interestrate
20000 10000 yyy 10000 2000
or maybe (based on number):
Housing Interestrate1 xxx Car Interestrate2
20000 10000 yyy 10000 2000
Thanks again
Your right, it makes sense. I'll do that...
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.