Your SAS programs, embedded in web apps and elsewhere

Reading Raw Hierarchical data files

Reply
N/A
Posts: 0

Reading Raw Hierarchical data files

Ok my problem is I want to read a hierarchical raw data set like this one:

bobby 213
m1 5 7 9 10
colin 442
m1 7 3 10 5
m2 6 3 7 4
tracy 856
m2 7 8 6 7
cindy 365
m1 7 9 4 5
m2 3 9 8 4

and for SAS to output a result like this:

id name mark1 mark2 total
213 bobby 31 0 31
442 colin 25 20 45
856 tracy 0 28 28
365 cindy 25 24 49

mark1 is total of marks in m1
mark2 is total of marks in m2
total is equal to the sum of mark1 and mark2

this is my code so far:

proc fslist fileref='marks.txt';
run;
data marks (drop=Type q1 q2 q3 q4);
/* Drop inputs Type question1 = q1 - question4 = q4 */
retain id name q1 q2 q3 q4 mark1 mark2;
length id $3. name $10. Type $2. q1 3. q2 3. q3 3. q4 3.;
/* Keeping the length of all inputs for neatness */

infile 'marks.txt' end=LastRec;

input Type $@;

if Type='id' then do;
if _N_ > 1 then output;
input id $ name $; end;

else do;

input q1 q2 q3 q4;

if (Type = 'm1') and (Type ^= 'm2') and (Type ^= '.') then
mark1 = sum(q1,q2,q3,q4);
*total = mark1;

else if (Type ^= 'm1') and (Type = 'm2') and (Type ^= '.') then
mark2 = sum(q1,q2,q3,q4);
*total = mark2;

/*
if (Type = 'm1') and (Type ^= 'm2') then
mark1 = sum(q1,q2,q3,q4);
*/


/*
else if (Type = 'm2') and (Type ^= 'm1') then
mark2 = sum(q1,q2,q3,q4);
*/


/*
else if (Type = 'm1') and (Type = 'm2')
mark1 = sum(q1,q2,q3,q4);
mark2 = sum(q1,q2,q3,q4);
*/

end;
if LastRec then output;
run;
ods html file='marks.html';
proc print data=marks noobs;
title 'Marks';
run;
ods html close;


so far my output looks like this:

id name mark1 mark2
213 bobby 31 .
442 colin 25 20
856 tracy 25 28
365 cindy 25 24

as you can see id 856 should not have a mark for m1

Sorry for such a long question but I couldn't find any SAS forum on this topic and in manipulating data files like this. Yeah out of ideas on how I should fix this code to
reflect the output.

Thanks to whoever can help
SAS Super FREQ
Posts: 8,740

Re: Reading Raw Hierarchical data files

Hi:
When I make a copy of your data and run the program, I get this:
[pre]
NOTE: The infile 'c:\temp\marks.txt' is:
File Name=c:\temp\marks.txt,
RECFM=V,LRECL=256

NOTE: Invalid data for q2 in line 2 1-2.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
2 m1 5 7 9 10 11
id= name= q1=213 q2=. q3=5 q4=7 mark1=. mark2=. Type=bo LastRec=0 _ERROR_=1 _N_=1
NOTE: Invalid data for q2 in line 4 1-2.
4 m1 7 3 10 5 11
id= name= q1=442 q2=. q3=7 q4=3 mark1=. mark2=. Type=co LastRec=0 _ERROR_=1 _N_=2
NOTE: Invalid data for q2 in line 7 1-2.
7 m2 7 8 6 7 10
id= name= q1=856 q2=. q3=7 q4=8 mark1=. mark2=20 Type=tr LastRec=0 _ERROR_=1 _N_=4
NOTE: Invalid data for q2 in line 9 1-2.
9 m1 7 9 4 5 10
id= name= q1=365 q2=. q3=7 q4=9 mark1=. mark2=20 Type=ci LastRec=0 _ERROR_=1 _N_=5
NOTE: 10 records were read from the infile 'c:\temp\marks.txt'.
The minimum record length was 9.
The maximum record length was 11.
NOTE: SAS went to a new line when INPUT statement reached past the end of a line.
NOTE: The data set WORK.MARKS has 1 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds


[/pre]

Which is understandable. If your data REALLY does look like this:
[pre]
bobby 213
m1 5 7 9 10
colin 442
m1 7 3 10 5
m2 6 3 7 4
tracy 856
m2 7 8 6 7
cindy 365
m1 7 9 4 5
m2 3 9 8 4
[/pre]

Then this test NEVER is true: [pre] if Type='id' then do; [/pre]
There will NEVER be a value for TYPE = 'id', unless your data really looks like this:
[pre]
id bobby 213
m1 5 7 9 10
id colin 442
m1 7 3 10 5
m2 6 3 7 4
id tracy 856
m2 7 8 6 7
id cindy 365
m1 7 9 4 5
m2 3 9 8 4
[/pre]

Even then, your INPUT statement is wrong because you have:
[pre]
input id $ name $;
[/pre]

BUT in the data you show, NAME is first and ID value is second.

There is a way to read the data, as you show it in your post. However, you have some other issues in your program. It seems that you might also have problems with the DO/END for your IF statement.

The way SAS IF statements work is this:
[pre]
if Type = 'm1' then mark1 = sum(q1,q2,q3,q4);

OR

if LastRec then output;
[/pre]
are both good without a DO/END because you are only executing ONE statement; however, if you want to execute MORE than ONE statement as a result of a true condition, then you have to do this:
[pre]
if Type = 'm1' then do;
mark1 = sum(q1,q2,q3,q4);
total + mark1;
end;
else if Type = 'm2' then do;
mark2 = sum(q1,q2,q3,q4);
total + mark2;
end;
[/pre]

The IF statement needs a DO/END because 2 statements are being executed when Type = 'm1'. The ELSE needs a DO/END because 2 statements are being executed for that condition, as well. I'm one of those people who almost ALWAYS codes a DO/END, anyway... even if I only want to execute one statement.
So, if I wasn't going to calculate TOTAL, I'd still do this:
[pre]
if Type = 'm1' then do;
mark1 = sum(q1,q2,q3,q4);
end;
else if Type = 'm2' then do;
mark2 = sum(q1,q2,q3,q4);
end;

[/pre]
Because it helps me see the structure of my conditions and then if I do need to go back and put another action into one or the other of the conditions, I only have to add my statement, I don't have to worry about where the END goes because I already have the logic structure in the program.

As for your issue of ID 856 being wrong....that's because you are RETAINING the values for
[pre]
retain id name q1 q2 q3 q4 mark1 mark2;
[/pre]
which means that unless you reset them when you do your OUTPUT, what's in the PDV is the information from the PREVIOUS record. Since you retained the information, it is up to your to "reset" or "reinitialize" the PDV:
[pre]
if _N_ > 1 then do;
output;
** when on a name record, after you output the previous record, you;
** need to "reset" or "erase" or "reinitialize" retained values in PDV;
** because these values are left over from previous record (due to retain);
q1 = .;
q2 = .;
q3 = .;
q4 = .;
mark1 = .;
mark2 = .;
total = .;
end;
[/pre]

The reason there is no forum for these kinds of questions is that these are the kinds of questions that are best answered by Tech Support. Your question is very specific to your data -- and in this case, the data you posted doesn't quite match the program you posted. Tech Support has the resources to look at your actual data file and to help you debug how to read the file in the most accurate and efficient manner.

To find out how to open a track with Tech Support, go to: http://support.sas.com and in the navigation area on the left, you will see a link that says "Submit a Problem". If you click on that link, you will be able to submit your problem to Tech Support.

cynthia
N/A
Posts: 0

Re: Reading Raw Hierarchical data files

Thanks Cynthia

apologies my data looks like this

id 263 bobby
a1 5 7 9 10
a2 6 3 7 4
id 432 colin
a1 7 3 10 5
a2 6 3 7 4
id 956 tracy
a1 7 3 10 5
a2 7 8 6 7
id 765 cindy
a1 7 9 5 4
a2 3 9 8 4

so it's no wonder the program went cahoot.

however, I read your help and yes I didn't "reset" or "reinitialize" the PDV
so from this I fixed my code to reflect this and syntax errors of the if else then do cases .What did you know, this baby worked like a charm.

Thank you so much Cynthia for your help, that "reset" or "reinitialize" really helped me
out otherwise I would have wasted time trying to figure it out.

-S
you know, worked like a charm.
Ask a Question
Discussion stats
  • 2 replies
  • 188 views
  • 0 likes
  • 2 in conversation