Hey everyone,
trying to rearrange some data using an array. I can't figure out what I'm doing wrong here, not getting any errors in my log. Only some of the observations are printing.
options nonumber nodate;
data Project2;
input x1 - x6;
if _n_ in (1,2,3,4,5,6) then district = 'OKEECHOBEE';
IF _N_ IN (7,8,9,10,11,12) THEN DISTRICT= 'SEMINOLE';
IF _N_ IN (13,14,15,16,17,18) THEN DISTRICT= 'VOLUSIA';
ELSE DISTRICT= 'WALTON';
if _n_ in (1,2,7,8,13,14,19,20) then race= 'B';
if _n_ in (3,4,9,10,15,16,21,22) then race= 'W';
else race= 'O';
if _n_ in (1,3,5,7,9,11,13,15,17,19,21,23) then gender= 'M';
else gender= 'F';
array pr2(6) x1-x6;
do i= 1 to 6;
if i < 4 then learntype= 'Passive';
else leanrtype= 'Active';
if i in (1,4) then level= 'Elementary';
if i in (2,5) then level= 'Middle';
else level= 'High';
absenses= pr2(6);
end;
drop i;
datalines;
100 300 1600 750 250 700
150 1150 2350 450 550 850
3350 850 1800 750 600 150
1250 350 700 400 1650 700
1650 700 1150 850 250 1100
200 1600 3700 450 600 750
3800 650 1750 700 550 250
1050 400 650 350 2200 1050
1550 750 1100 900 600 1250
250 1650 2700 450 650 450
2800 1150 1550 650 800 300
600 450 1150 450 2050 1100
350 350 1650 650 200 750
100 950 1350 500 600 900
1700 550 1550 850 750 50
1200 300 850 200 1600 750
1600 3700 10 400 350 2000
335 1750 1550 2100 700 100
480 650 550 650 1600 800
220 1100 950 1600 650 50
1640 2700 500 450 400 1450
1185 1550 1900 2150 750 50
100 300 1600 750 250 700
150 1150 2350 450 550 850
;
proc print data=Project2;
title 'Absenteeism';
VAR district race gender learntype level absenses;
run;
@BWDAWG wrote:
Hey everyone,
trying to rearrange some data using an array. I can't figure out what I'm doing wrong here, not getting any errors in my log. Only some of the observations are printing.
options nonumber nodate;
data Project2;
input x1 - x6;if _n_ in (1,2,3,4,5,6) then district = 'OKEECHOBEE';
IF _N_ IN (7,8,9,10,11,12) THEN DISTRICT= 'SEMINOLE';
IF _N_ IN (13,14,15,16,17,18) THEN DISTRICT= 'VOLUSIA';
ELSE DISTRICT= 'WALTON';
if _n_ in (1,2,7,8,13,14,19,20) then race= 'B';
if _n_ in (3,4,9,10,15,16,21,22) then race= 'W';
else race= 'O';if _n_ in (1,3,5,7,9,11,13,15,17,19,21,23) then gender= 'M';
else gender= 'F';array pr2(6) x1-x6;
do i= 1 to 6;
if i < 4 then learntype= 'Passive';
else leanrtype= 'Active';
if i in (1,4) then level= 'Elementary';
if i in (2,5) then level= 'Middle';
else level= 'High';
absenses= pr2(6);
end;
drop i;
datalines;
100 300 1600 750 250 700
150 1150 2350 450 550 850
3350 850 1800 750 600 150
1250 350 700 400 1650 700
1650 700 1150 850 250 1100
200 1600 3700 450 600 750
3800 650 1750 700 550 250
1050 400 650 350 2200 1050
1550 750 1100 900 600 1250
250 1650 2700 450 650 450
2800 1150 1550 650 800 300
600 450 1150 450 2050 1100
350 350 1650 650 200 750
100 950 1350 500 600 900
1700 550 1550 850 750 50
1200 300 850 200 1600 750
1600 3700 10 400 350 2000
335 1750 1550 2100 700 100
480 650 550 650 1600 800
220 1100 950 1600 650 50
1640 2700 500 450 400 1450
1185 1550 1900 2150 750 50
100 300 1600 750 250 700
150 1150 2350 450 550 850;
proc print data=Project2;
title 'Absenteeism';
VAR district race gender learntype level absenses;
run;
level will always be 'High', as you are missing a crucial else. The same happens with race, which will never be 'B'. And district will only be 'VOLUSIA' or 'WALTON'. Consider using a select() block instead of if/then/else if chains.
Are you getting less than 24 observations in your output?
Thanks for the reply @Kurt_Bremser. So I made a couple of changes by adding else if statements, but I still am only getting 24 observations where I should have 144. problem seems to be the variables level and learntype.
Why would you expect more than 24 observations? With no explicit output statement, the data step will write once at the end of each iteration, and there will only be 24 iterations for the 24 datalines. If you want more in the resulting dataset, you have to add an output statement in the do loop.
Hi @BWDAWG and welcome to the SAS Support Communities!
If I look at your code and guess what you intended to do (your description "rearrange some data" is a bit vague), my first recommendations are:
@FreelanceReinhthanks for the reply, that was a great explanation, very helpful. I fixed everything you mentioned except the length statement bc I am not seeing any truncation. However something still isn't working correctly. Now it is printing out all 144 observations, but the absenses are not correct, it seems to be assigning the last value in the datalines to all the variables? not sure if i explained that correctly. here is what i updated
@BWDAWG wrote:
I fixed everything you mentioned except the length statement bc I am not seeing any truncation.
You were just lucky that no truncation occurred. Truncation would have occurred if the first value assigned to, e.g., DISTRICT, LEARNTYPE and LEVEL had not happened to be the longest value. Imagine that someone told you in a few weeks to alter the value of variable LEVEL where it was 'Middle' to 'Intermediate'. Would you then really remember that without additional changes to the code (such as a LENGTH statement) the new values would be truncated to 'Intermedia'?
If you want the first, second, ..., sixth value, i.e., the values of x1, ..., x6, to occur in the new variable ABSENSES, pr2[i] (in place of pr2(6)) would achieve this.
Looks like the first three variables generate 4*3*2=24 combinations, one per row in the file.
And the last two variables generate 2*3=6 combinations, one per each value in a row.
Why not just let the DO loop take care of the values.
Looks like the values are in this order. But perhaps you will need to reorder the DO statements to match the pattern used to create the source data file.
data Project2;
length district $20 race $1 gender $1 learntype $10 level $10;
do district='OKEECHOBEE','SEMINOLE','VOLUSIA','WALTON';
do race = 'B','W','O';
do gender='M','F' ;
do learntype= 'Passive','Active';
do level= 'Elementary','Middle','High';
input absenses @ ;
output;
end;
end;
end;
end;
end;
datalines;
100 300 1600 750 250 700
150 1150 2350 450 550 850
3350 850 1800 750 600 150
1250 350 700 400 1650 700
1650 700 1150 850 250 1100
200 1600 3700 450 600 750
3800 650 1750 700 550 250
1050 400 650 350 2200 1050
1550 750 1100 900 600 1250
250 1650 2700 450 650 450
2800 1150 1550 650 800 300
600 450 1150 450 2050 1100
350 350 1650 650 200 750
100 950 1350 500 600 900
1700 550 1550 850 750 50
1200 300 850 200 1600 750
1600 3700 10 400 350 2000
335 1750 1550 2100 700 100
480 650 550 650 1600 800
220 1100 950 1600 650 50
1640 2700 500 450 400 1450
1185 1550 1900 2150 750 50
100 300 1600 750 250 700
150 1150 2350 450 550 850
;
Obs district race gender learntype level absenses 1 OKEECHOBEE B M Passive Elementary 100 2 OKEECHOBEE B M Passive Middle 300 3 OKEECHOBEE B M Passive High 1600 4 OKEECHOBEE B M Active Elementary 750 5 OKEECHOBEE B M Active Middle 250 6 OKEECHOBEE B M Active High 700 7 OKEECHOBEE B F Passive Elementary 150
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.