BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
samira
Obsidian | Level 7

Thanks a lot for the reply, 

 

I have attached my data source for you. in the code "datasource20012017filtered" is actually same as this dataset.

 

 

 

 

Kurt_Bremser
Super User

THAT is not a data step with datalines. For many of us, an Excel file is completely useless, as we can't download it for security reasons.

Please post your example data as requested.

Tom
Super User Tom
Super User

You seem to have made a lot of progress.

Here a couple of notes on ways you could change this recent code to be simpler.

 

Take this block of code for instance.

disease=tranwrd(disease, ",", " ");
disease=tranwrd(disease, "<", " ");
disease=tranwrd(disease, "(", " ");
disease=tranwrd(disease, ")", " ");
disease=tranwrd(disease, ".", " ");
disease=tranwrd(disease, "-", " ");
disease=compress(disease);
disease=substr(disease,1,min(length(disease),16));

You do a lot of work to convert selected characters to spaces and then use COMPRESS() to remove the spaces.  Did you know that you can tell the COMPRESS() function what characters to remove?  So you could reduce all of that to one function.  Take a look at the manual and read about the optional third paramater for modifiers and you could probably reduce it even more.

disease=compress(disease,' ,<().-');

Also SAS stores character strings as fixed length fields that are padded with spaces so there is no need to check the actual length of DISEASE to truncate it.  If you are worried that the field is defined with fewer than 16 characters as so want to prevent the SUBSTR() function from complaining that the third argument is invalid then you should have used the VLENGTH() function.  But what you really should do is use the SUBSTRN() function instead as that will allow the third parameter value to be longer than the source string.

disease=substrn(disease,1,16);

You can use a variable list as a short hand for typing many variable names.  So instead of using 

retain week year2001 year2002 year2003 year2004 year2005 year2006 year2007 year2008 year2009 year2010 year2011 year2012 year2013 year2014 year2015 year2016 year2017;

You can simply type 

retain week year2001 - year2017;

Also in this block of code you are treating the YEARxxxx variables as both character and numeric.

   if year2001=" " then year2001=0;
   if year2002=" " then year2002=0;
   ...

The variables YEAR2001, YEAR2002, etc are numeric variables. If you compare the value to a space then SAS will need to try to convert the space to a number.  You should instead compare the value to a numeric missing value.

  if year2001=. then year2001=0;

If you really didn't know if the variable was numeric or character you could also just use the MISSING() function in the condition.  Also there are other ways you could convert the missing values to zeros.  I will use one of them in the next section.

 

You could also simplify this by using an ARRAY statement.  So instead of writing the same statement 17 times you can write it once and use the ARRAY reference to allow it to be applied to 17 different actual variables.  

array counts year2001-year2017 ;
do i=1 to dim(counts);
  counts(i)= sum(counts(i),0)
end;

 

samira
Obsidian | Level 7
great!! I learned a lot from your comments.
Reeza
Super User
proc sql;
      create table a  as select week, year, count from table ;
      quit;

      data datastep;
      /*do some stuff*/
/*Depends on what goes on here if you need code*/
      run;

      proc reg data=statprocess;
by region disease;
      model avg=x0;
      output out=trig p=yhat r=e stdp=sd;
      run;
samira
Obsidian | Level 7
thank you for your answer.

based on my code I need too make a matrix for each disease and region before doing regression analysis, that is why I cannot use group by in this case
Tom
Super User Tom
Super User

So to convert your logic from needing to run separate steps for each group to just using BY processing you would convert your posted logic to this:

 

* Get ALL the data ;
data all;
  set TABLE ;
  keep region disease week year count ;
run;

* Manipulate the data ;
data statprocess; 
   set all ;
      /*do some stuff*/
run;

* Make sure the data is order by the groups;
proc sort =statprocess ;
  by region disease ;
run;

* Run the analysis;
proc reg data=statprocess;
   by region disease ;
   model avg=x0;
   output out=trig p=yhat r=e stdp=sd;
run;
samira
Obsidian | Level 7
thank you for your answer.

based on my code I need too make a matrix for each disease and region before doing regression analysis, that is why I cannot use group by in this case
Reeza
Super User

@samira wrote:
thank you for your answer.

based on my code I need too make a matrix for each disease and region before doing regression analysis, that is why I cannot use group by in this case

The results are the same, regardless of using BY group or a macro loop.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 24 replies
  • 8206 views
  • 12 likes
  • 4 in conversation