Hello,
I have cleanup my log file to get something like that:
1 + %doit(/dwh_actuariat/sasdata/Data_Retention,2011,wu,prop,habi,fev,4V3098387)
the sas dataset wu_prop_prmfev2011 exist
this is an prop premium dataset
0 observations where found for the agreement number: 4V3098387 into the dataset wu_prop_prmfev2011
2 + %doit(/dwh_actuariat/sasdata/Data_Retention,2011,wu,prop,habi,fev,P34V3098387)
the sas dataset wu_prop_prmfev2011 exist
this is an prop premium dataset
1 observations where found for the agreement number: P34V3098387 into the dataset wu_prop_prmfev2011
so If we look at the above-mentioned example, it is the content of the variable text. What I would like to get is a dataset with two lines and four columns where in the first column we have the %doit statement, second column we have the sas dataset ... exits, third column this is a prop premium datasets and last column remarks = 1 observations where found for the agreement number: P34V3098387 into the dataset wu_prop_prmfev2011
How to do those manipulation ?
Use four INPUT statements:
data want;
infile "...." truncover;
length var1 var2 var3 var4 $80;
input var1 $80.;
input var2 $80.;
input var3 $80.;
input var4 $80.;
run;
Increase/decrease the lengths and informats as needed;
What "manipulation"? Just read it that way.
First let's create a file with the lines of text you posted so we have something we can program against.
filename text temp;
options parmcards=text;
parmcards4;
1 + %doit(/dwh_actuariat/sasdata/Data_Retention,2011,wu,prop,habi,fev,4V3098387)
the sas dataset wu_prop_prmfev2011 exist
this is an prop premium dataset
0 observations where found for the agreement number: 4V3098387 into the dataset wu_prop_prmfev2011
2 + %doit(/dwh_actuariat/sasdata/Data_Retention,2011,wu,prop,habi,fev,P34V3098387)
the sas dataset wu_prop_prmfev2011 exist
this is an prop premium dataset
1 observations where found for the agreement number: P34V3098387 into the dataset wu_prop_prmfev2011
;;;;
If you know the file consists of 4 lines per observation you could use the N= option of the INFILE statement and the # cursor motion operator of the INPUT statement. So perhaps something like this:
data want;
infile text n=4 truncover ;
input #1 @'+' doit_call $100.
#2 @'dataset' memname :$32. status :$8.
#3 comment $100.
#4 nobs @'agreement number:' agreement :$20.
;
run;
Result
@alepage wrote:
How do we do that if I have already the variable text into a SAS dataset and if I have done some cleanup to make sure that we have a structure of 4 lines
Depends what the dataset you have looks like. Share what you have in the data for a couple of observations. For example if you have a dataset named HAVE with four variables named LINE1 to LINE4 you could share example of what you have for one observations by creating a program like:
data have;
length line1-line4 $100;
line1=' + %xxx();';
line2='other stuff';
line3='third line stuff';
line4='last line stuff';
output;
run;
If you have those strings in four variables then you can use SCAN() to pull out words from the strings. You can use INPUT() function to convert strings that look like numbers into actual numbers. If you need to search the text for the location to pull out you can probably use FIND() or one of the PX... functions that supports the use for regular expressions.
Hello,
I have imported my log file into a sas dataset with only one variable, the variable text.
Then I did some cleanup into it in order to keep only the four observations sequence as below
1 + %doit(/dwh_actuariat/sasdata/Data_Retention,2011,wu,prop,habi,fev,4V3098387)
the sas dataset wu_prop_prmfev2011 exist
this is an prop premium dataset
0 observations where found for the agreement number: 4V3098387 into the dataset wu_prop_prmfev2011
2 + %doit(/dwh_actuariat/sasdata/Data_Retention,2011,wu,prop,habi,fev,P34V3098387)
the sas dataset wu_prop_prmfev2011 exist
this is an prop premium dataset
1 observations where found for the agreement number: P34V3098387 into the dataset wu_prop_prmfev2011
so for the first observation I have something starting with + %doit
second observation something starting with the sas dataset
third observation this is an
and last observation x observations where found...
this 4 observations sequence is repeted 311 times.
I want to create a dataset having 311 observations / lines with var1=first line var2=second line var3=thrid line and finally var4 = fourth line.
In summary, we start with a sas dataset having 1 variable = text with 1244 observations and I want to transform this information into a new dataset of 311 lines and 4 variable
How do we do that
To convert every four observations into one here are a couple of methods.
You could add a group variable and then transpose by that variable.
This will create a dataset with 5 variables. GROUP and TEXT1 to TEXT4.
data step1;
set have;
group + mod(_n_,4)=1 ;
run;
proc transpose data=step1 out=step2(drop=_name_) prefix=text;
by group;
var text;
run;
You could do the same thing in single data step by defining an array and putting a DO loop around the SET statement. So in this example we also add the GROUP variable. We also need to set the length of the new variables, which we can do when we define the array.
data step2;
group+1;
do i=1 to 4;
array t $200 text1-text4;
set have;
t[i]=text;
end;
drop i text;
run;
Let's read in your example data and try it.
data have;
infile text truncover;
input text $char200.;
run;
Results:
Check out this tutorial series to learn how to build your own steps in SAS Studio.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.