K_Wils15
Obsidian | Level 7

I have been working on this homework assignment all day and still have three questions I just cannot solve:

1) There is a data file called VIRUS_PROLIF with three variables: SAMPLE_NUMBER, CELL_TYPE, and COUNT. COUNT is the initial number of virions in a cell. If we assume that the virions replicate at a rate such that the total number of virions increases by 10% per minute, write the SAS code using DO UNTIL to calculate how long one would predict it would take (in minutes) for the number of virions to exceed 100,000 for each cell.

Then, write the SAS code using a data step and accumulating variables to calculate the minimum, maximum, and average time to exceed 100,000 virions for the cells of each CELL_TYPE.

Here is what I have for that one:

 

data VIRUS_PROLIF2;
set VIRUS_PROLIF;
   do until(COUNT>100000);
      Count+(Count*.01);
   end;
run;

proc sort data=VIRUS_PROLIF2 out=VIRUS_PROLIF3;
by SAMPLE_NUMBER CELL_TYPE COUNT;
run;

data CELL_TYPE_DATA (keep = SAMPLE_NUMBER CELL_TYPE COUNT n_cell_type min_cell_type max_Cell_type
avg_cell_type sum_Cell_type);
      set Virus_Prolif3;
      by SAMPLE_NUMBER CELL_TYPE COUNT;
      retain min_cell_type max_Cell_type sum_Cell_Type n_Cell_type;
      if first.Cell_Type then do;
            min_Cell_type = Cell_Type;
            max_Cell_Type = Cell_Type;
            sum_Cell_Type = 0;
            n_Cell_TYPE = 0;
            end;
      sum_Cell_TYPE = sum(sum_Cell_TYPE, Cell_TYPE);
      n_CELL_TYPE = sum(n_CELL_TYPE, 1);
      min_CELL_TYPE = min(min_CELL_TYPE, CELL_TYPE);
      max_CELL_TYPE = max(max_Cell_TYPE, CELL_TYPE);
      avg_CELL_TYPE = sum_CELL_TYPE / n_CELL_TYPE;
      if last.CELL_TYPE;
run;

 

2. The data set 'HW3_Items' contains the students' answers to a 25-question test (variables = answer_01 to answer_25). On each student's observation, we also have variables for the correct answers (variables = correct_01 to correct_25). Write a data step in which you will use arrays and a DO loop to determine each student's score on the test (1 point for each correct answer).

Here is what I have for that:

data HW3_ITEMS2;
Set HW3_ITEMS;
      array Answer {25} answer_01 - answer_25;
      array Correct_Answer {25} correct_01 - correct_25;
      array Answer_Correct {25} Answer_correct_01 - Answer_correct_25;

      do x = 1 to 25;
      if Answer[X] - Correct_Answer[X] = 0 then Answer_Correct[X] = 1 + x ;
      else Answer_Correct[X] = 0;
      end;
run;

proc print data=HW3_ITEMS2;
run;

 

3. The last one looks at overweight and normal-weight individuals in the data set sashelp.heart. I need to write the SAS code to test whether the mean cholesterol level for normal-weight subjects differs significantly from that of overweight subjects. I know that it is a t-test, but every time I try to run this:

proc ttest data=sashelp.heart;
class Weight_Status;
var Chol_STATUS;
run;

I get an error and I am not sure why or how to fix it.

 

Please advise! Any advice would be greatly appreciated. 

1 ACCEPTED SOLUTION

Patrick
Opal | Level 21

@K_Wils15 

You will get more answers faster if you provide sample data, ask only one thing per question, and show the desired output.

Here is the code for your first homework question.

data VIRUS_PROLIF;
  input SAMPLE_NUMBER CELL_TYPE $ COUNT;
  datalines;
1 xy 1000
2 xy 80000
3 xy 100000
4 xy 99999
;

data VIRUS_PROLIF2;
  set VIRUS_PROLIF;
  growth_time=0;
  if count<100000 then
    do;
      do until(COUNT>=100000);
        /* 10% growth per minute, as stated in the question */
        Count+(Count*.10);
        growth_time+1;
      end;
    end;
run;

/**
Then, write the SAS code using a data step and accumulating 
variables to calculate the minimum, maximum, and average time 
to exceed 100,000 virions for the cells of each CELL_TYPE.
**/
proc sort data=VIRUS_PROLIF2;
  by cell_type;
run;

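data VIRUS_PROLIF3(keep=cell_type min_: max_: avg_:);
  set VIRUS_PROLIF2;
  by cell_type;
  retain min_growth_time max_growth_time;
  /* reset every accumulator at the start of each CELL_TYPE group, */
  /* otherwise min/max would carry over from the previous group    */
  if first.cell_type then
    call missing(min_growth_time, max_growth_time, _n, _sum_growthtime);
  min_growth_time=min(growth_time, min_growth_time);
  max_growth_time=max(growth_time, max_growth_time);
  _n+1;
  _sum_growthtime+growth_time;
  if last.cell_type then
    do;
      avg_growth_time=_sum_growthtime/_n;
      output;
    end;
run;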

proc print;
run;
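
As a cross-check only (the assignment asks for a data step, so this would not replace it), PROC MEANS should report the same three statistics per CELL_TYPE. This is a minimal sketch that assumes the VIRUS_PROLIF2 data set and the growth_time variable created above.

```sas
/* Cross-check only: minimum, maximum, and mean growth_time per CELL_TYPE */
proc means data=VIRUS_PROLIF2 min max mean;
  class cell_type;
  var growth_time;
run;
```

If the two outputs disagree, the usual suspect is an accumulator in the data step that was not reset between BY groups.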

 

A DO UNTIL loop always executes at least once, so even if the cell count is already 100,000 or more you would still end up with one iteration, and thus with one minute added. For this reason I had to add a test first (if count<100000 then...)
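
The difference can be seen in isolation with a small sketch. The variable names n_until and n_while are made up for this illustration, and the count starts at the threshold so that neither loop has any real work to do.

```sas
/* Hypothetical demo: count is already at the threshold */
data _null_;
  count=100000;
  n_until=0;
  do until(count>=100000);   /* condition is tested at the bottom of the loop */
    n_until+1;
  end;
  n_while=0;
  do while(count<100000);    /* condition is tested at the top of the loop */
    n_while+1;
  end;
  put n_until= n_while=;     /* log shows: n_until=1 n_while=0 */
run;
```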

 

If it were me, I'd go for a DO WHILE loop and express the time in seconds so that I can create a SAS time value (which is a count of seconds). The code could then look like:

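data VIRUS_PROLIF2;
  set VIRUS_PROLIF;
  format growth_time time10.;
  /* growth_time advances 60 seconds per iteration (one minute of growth) */
  do growth_time=0 by 60 while(COUNT<100000);
    /* 10% growth per minute, as stated in the question */
    Count+(Count*.10);
  end;
run;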

...and hopefully one purpose of this exercise is to demonstrate that a little bit of math can sometimes speed up processing and save computer resources.
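
One way to read that hint, sketched here under assumptions rather than as the intended solution: with 10% growth per minute, the count after n minutes is count*1.1**n, so the smallest n at which it reaches 100,000 is the ceiling of log(100000/count)/log(1.1). The output data set name VIRUS_PROLIF_MATH is made up for this illustration, and COUNT is assumed to be greater than zero.

```sas
/* Sketch: closed-form growth time in minutes, no loop needed.   */
/* Assumes count > 0 and 10% growth per minute.                  */
data VIRUS_PROLIF_MATH;
  set VIRUS_PROLIF;
  if count>=100000 then growth_time=0;
  else growth_time=ceil(log(100000/count)/log(1.10));
run;
```

This matches the loop's stopping rule of reaching 100,000. For a strict "exceed", floor(...)+1 would take the place of ceil(...) for counts up to and including 100,000. Because the loop and the logarithm round differently, results could in principle differ by one minute for counts that land almost exactly on the threshold.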


