BookmarkSubscribeRSS Feed
BWDAWG
Obsidian | Level 7

Hey everyone,

 

trying to rearrange some data using an array. I can't figure out what I'm doing wrong here, not getting any errors in my log. Only some of the observations are printing. 

 

options nonumber nodate;
data Project2;
input x1 - x6;

if _n_ in (1,2,3,4,5,6) then district = 'OKEECHOBEE';
IF _N_ IN (7,8,9,10,11,12) THEN DISTRICT= 'SEMINOLE';
IF _N_ IN (13,14,15,16,17,18) THEN DISTRICT= 'VOLUSIA';
ELSE DISTRICT= 'WALTON';

if _n_ in (1,2,7,8,13,14,19,20) then race= 'B';
if _n_ in (3,4,9,10,15,16,21,22) then race= 'W';
else race= 'O';

if _n_ in (1,3,5,7,9,11,13,15,17,19,21,23) then gender= 'M';
else gender= 'F';

array pr2(6) x1-x6;
do i= 1 to 6;
if i < 4 then learntype= 'Passive';
else leanrtype= 'Active';
if i in (1,4) then level= 'Elementary';
if i in (2,5) then level= 'Middle';
else level= 'High';
absenses= pr2(6);
end;
drop i;
datalines;
100 300 1600 750 250 700
150 1150 2350 450 550 850
3350 850 1800 750 600 150
1250 350 700 400 1650 700
1650 700 1150 850 250 1100
200 1600 3700 450 600 750
3800 650 1750 700 550 250
1050 400 650 350 2200 1050
1550 750 1100 900 600 1250
250 1650 2700 450 650 450
2800 1150 1550 650 800 300
600 450 1150 450 2050 1100
350 350 1650 650 200 750
100 950 1350 500 600 900
1700 550 1550 850 750 50
1200 300 850 200 1600 750
1600 3700 10 400 350 2000
335 1750 1550 2100 700 100
480 650 550 650 1600 800
220 1100 950 1600 650 50
1640 2700 500 450 400 1450
1185 1550 1900 2150 750 50
100 300 1600 750 250 700
150 1150 2350 450 550 850

;
proc print data=Project2;
title 'Absenteeism';
VAR district race gender learntype level absenses;
run;

8 REPLIES 8
Kurt_Bremser
Super User

@BWDAWG wrote:

Hey everyone,

 

trying to rearrange some data using an array. I can't figure out what I'm doing wrong here, not getting any errors in my log. Only some of the observations are printing. 

 

options nonumber nodate;
data Project2;
input x1 - x6;

if _n_ in (1,2,3,4,5,6) then district = 'OKEECHOBEE';
IF _N_ IN (7,8,9,10,11,12) THEN DISTRICT= 'SEMINOLE';
IF _N_ IN (13,14,15,16,17,18) THEN DISTRICT= 'VOLUSIA';
ELSE DISTRICT= 'WALTON';

if _n_ in (1,2,7,8,13,14,19,20) then race= 'B';
if _n_ in (3,4,9,10,15,16,21,22) then race= 'W';
else race= 'O';

if _n_ in (1,3,5,7,9,11,13,15,17,19,21,23) then gender= 'M';
else gender= 'F';

array pr2(6) x1-x6;
do i= 1 to 6;
if i < 4 then learntype= 'Passive';
else leanrtype= 'Active';
if i in (1,4) then level= 'Elementary';
if i in (2,5) then level= 'Middle';
else level= 'High';
absenses= pr2(6);
end;
drop i;
datalines;
100 300 1600 750 250 700
150 1150 2350 450 550 850
3350 850 1800 750 600 150
1250 350 700 400 1650 700
1650 700 1150 850 250 1100
200 1600 3700 450 600 750
3800 650 1750 700 550 250
1050 400 650 350 2200 1050
1550 750 1100 900 600 1250
250 1650 2700 450 650 450
2800 1150 1550 650 800 300
600 450 1150 450 2050 1100
350 350 1650 650 200 750
100 950 1350 500 600 900
1700 550 1550 850 750 50
1200 300 850 200 1600 750
1600 3700 10 400 350 2000
335 1750 1550 2100 700 100
480 650 550 650 1600 800
220 1100 950 1600 650 50
1640 2700 500 450 400 1450
1185 1550 1900 2150 750 50
100 300 1600 750 250 700
150 1150 2350 450 550 850

;
proc print data=Project2;
title 'Absenteeism';
VAR district race gender learntype level absenses;
run;


level will always be 'High', as you are missing a crucial else. The same happens with race, which will never be 'B'. And district will only be 'VOLUSIA' or 'WALTON'. Consider using a select() block instead of if/then/else if chains.

Are you getting less than 24 observations in your output?

BWDAWG
Obsidian | Level 7

Thanks for the reply @Kurt_Bremser. So I made a couple of changes by adding else if statements, but I still am only getting 24 observations where I should have 144. problem seems to be the variables level and learntype.

options nonumber nodate;
data example33;
 input x1 - x6;
 if _n_ in (1,2,3,4,5,6) then district = 'OKEECHOBEE';
  else IF _N_ IN (7,8,9,10,11,12) THEN DISTRICT= 'SEMINOLE';
  else IF _N_ IN (13,14,15,16,17,18) THEN DISTRICT= 'VOLUSIA';
  ELSE DISTRICT= 'WALTON';
 
 if _n_ in (1,2,7,8,13,14,19,20) then race= 'B';
 else if _n_ in (3,4,9,10,15,16,21,22) then race= 'W';
 else race= 'O';
 if _n_ in (1,3,5,7,9,11,13,15,17,19,21,23) then gender= 'M';
 else gender= 'F';
 array pr2(6) x1-x6;
 do i= 1 to 6;
  if i < 4 then learntype= 'Passive';
  else learntype= 'Active';
  if i in (1,4) then level= 'Elementary';
  else if i in (2,5) then level= 'Middle';
  else level= 'High';
 absenses= pr2(6);
  end;
 drop I;

 

Kurt_Bremser
Super User

Why would you expect more than 24 observations? With no explicit output statement, the data step will write once at the end of each iteration, and there will only be 24 iterations for the 24 datalines. If you want more in the resulting dataset, you have to add an output statement in the do loop.

FreelanceReinh
Jade | Level 19

Hi @BWDAWG and welcome to the SAS Support Communities!

 

If I look at your code and guess what you intended to do (your description "rearrange some data" is a bit vague), my first recommendations are:

  1. Whenever an ELSE statement is preceded by more than one IF statement, put an "ELSE" in front of the IF statements as well, except for the first "IF" in the group. Otherwise, the instruction following "ELSE" (e.g. DISTRICT='WALTON') is applied to all observations not satisfying the most recent IF statement. That's why neither 'OKEECHOBEE' nor 'SEMINOLE' occurred in your output dataset Project2. (Same for the other IF-IF-...-ELSE-constructs.)
  2. Declare character variables such as DISTRICT with a LENGTH statement to avoid the risk of truncation.
  3. Avoid typos in variable names, such as leanrtype. Otherwise SAS will create unwanted additional variables with exactly these misspelled names.
  4. Rethink the purpose of the DO loop. It seems that you want to assign different values of a variable (e.g. LEARNTYPE) to different array elements. However, in fact all this occurs within the same observation of the output dataset, where a variable (such as LEVEL) can only have one value. Maybe you want to include an OUTPUT statement in the DO loop so as to create a new observation in each of the six iterations. In this case one would expect to see an array reference like pr2[i] somewhere in the DO loop. Otherwise there is no connection between variable i and the array.
BWDAWG
Obsidian | Level 7

@FreelanceReinhthanks for the reply, that was a great explanation, very helpful. I fixed everything you mentioned except the length statement bc I am not seeing any truncation. However something still isn't working correctly. Now it is printing out all 144 observations, but the absenses are not correct, it seems to be assigning the last value in the datalines to all the variables? not sure if i explained that correctly. here is what i updated

 

options nonumber nodate;
data example33;
 input x1 - x6;
 if _n_ in (1,2,3,4,5,6) then district = 'OKEECHOBEE';
  else IF _N_ IN (7,8,9,10,11,12) THEN DISTRICT= 'SEMINOLE';
  else IF _N_ IN (13,14,15,16,17,18) THEN DISTRICT= 'VOLUSIA';
  ELSE DISTRICT= 'WALTON';
 
 if _n_ in (1,2,7,8,13,14,19,20) then race= 'B';
 else if _n_ in (3,4,9,10,15,16,21,22) then race= 'W';
 else race= 'O';
 if _n_ in (1,3,5,7,9,11,13,15,17,19,21,23) then gender= 'M';
 else gender= 'F';
 array pr2(6) x1-x6;
 do i= 1 to 6;
  if i < 4 then learntype= 'Passive';
  else learntype= 'Active';
  if i in (1,4) then level= 'Elementary';
  else if i in (2,5) then level= 'Middle';
  else level= 'High';
 absenses= pr2(6);
 output;
  end;
 drop I;
FreelanceReinh
Jade | Level 19

@BWDAWG wrote:

I fixed everything you mentioned except the length statement bc I am not seeing any truncation.

 

You were just lucky that no truncation occurred. Truncation would have occurred if the first value assigned to, e.g., DISTRICT, LEARNTYPE and LEVEL had not happened to be the longest value. Imagine that someone told you in a few weeks to alter the value of variable LEVEL where it was 'Middle' to 'Intermediate'. Would you then really remember that without additional changes to the code (such as a LENGTH statement) the new values would be truncated to 'Intermedia'?

FreelanceReinh
Jade | Level 19

If you want the first, second, ..., sixth value, i.e., the values of x1, ..., x6, to occur in the new variable ABSENSES, pr2[i] (in place of pr2(6)) would achieve this.

Tom
Super User Tom
Super User

Looks like the first three variables generate 4*3*2=24 combinations, one per row in the file.

And the last two variables generate 2*3=6 combinations, one per each value in a row.

 

Why not just let the DO loop take care of the values.  

Looks like the values are in this order.  But perhaps you will need to reorder the DO statements to match the pattern used to create the source data file.

data Project2;
  length district $20 race $1 gender $1 learntype $10 level $10;
  do district='OKEECHOBEE','SEMINOLE','VOLUSIA','WALTON';
  do race = 'B','W','O';
  do gender='M','F' ;
  do learntype= 'Passive','Active';
  do level= 'Elementary','Middle','High';
    input absenses @ ;
    output;
  end;
  end;
  end;
  end;
  end;
datalines;
100 300 1600 750 250 700
150 1150 2350 450 550 850
3350 850 1800 750 600 150
1250 350 700 400 1650 700
1650 700 1150 850 250 1100
200 1600 3700 450 600 750
3800 650 1750 700 550 250
1050 400 650 350 2200 1050
1550 750 1100 900 600 1250
250 1650 2700 450 650 450
2800 1150 1550 650 800 300
600 450 1150 450 2050 1100
350 350 1650 650 200 750
100 950 1350 500 600 900
1700 550 1550 850 750 50
1200 300 850 200 1600 750
1600 3700 10 400 350 2000
335 1750 1550 2100 700 100
480 650 550 650 1600 800
220 1100 950 1600 650 50
1640 2700 500 450 400 1450
1185 1550 1900 2150 750 50
100 300 1600 750 250 700
150 1150 2350 450 550 850
;
Obs     district     race    gender    learntype    level         absenses

  1    OKEECHOBEE     B        M        Passive     Elementary       100
  2    OKEECHOBEE     B        M        Passive     Middle           300
  3    OKEECHOBEE     B        M        Passive     High            1600
  4    OKEECHOBEE     B        M        Active      Elementary       750
  5    OKEECHOBEE     B        M        Active      Middle           250
  6    OKEECHOBEE     B        M        Active      High             700
  7    OKEECHOBEE     B        F        Passive     Elementary       150

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 2031 views
  • 0 likes
  • 4 in conversation