BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
samira
Obsidian | Level 7

Thanks a lot for the reply, 

 

I have attached my data source for you. in the code "datasource20012017filtered" is actually same as this dataset.

 

 

 

 

Tom
Super User Tom
Super User

You seem to have made a lot of progress.

Here a couple of notes on ways you could change this recent code to be simpler.

 

Take this block of code for instance.

disease=tranwrd(disease, ",", " ");
disease=tranwrd(disease, "<", " ");
disease=tranwrd(disease, "(", " ");
disease=tranwrd(disease, ")", " ");
disease=tranwrd(disease, ".", " ");
disease=tranwrd(disease, "-", " ");
disease=compress(disease);
disease=substr(disease,1,min(length(disease),16));

You do a lot of work to convert selected characters to spaces and then use COMPRESS() to remove the spaces.  Did you know that you can tell the COMPRESS() function what characters to remove?  So you could reduce all of that to one function.  Take a look at the manual and read about the optional third paramater for modifiers and you could probably reduce it even more.

disease=compress(disease,' ,<().-');

Also SAS stores character strings as fixed length fields that are padded with spaces so there is no need to check the actual length of DISEASE to truncate it.  If you are worried that the field is defined with fewer than 16 characters as so want to prevent the SUBSTR() function from complaining that the third argument is invalid then you should have used the VLENGTH() function.  But what you really should do is use the SUBSTRN() function instead as that will allow the third parameter value to be longer than the source string.

disease=substrn(disease,1,16);

You can use a variable list as a short hand for typing many variable names.  So instead of using 

retain week year2001 year2002 year2003 year2004 year2005 year2006 year2007 year2008 year2009 year2010 year2011 year2012 year2013 year2014 year2015 year2016 year2017;

You can simply type 

retain week year2001 - year2017;

Also in this block of code you are treating the YEARxxxx variables as both character and numeric.

   if year2001=" " then year2001=0;
   if year2002=" " then year2002=0;
   ...

The variables YEAR2001, YEAR2002, etc are numeric variables. If you compare the value to a space then SAS will need to try to convert the space to a number.  You should instead compare the value to a numeric missing value.

  if year2001=. then year2001=0;

If you really didn't know if the variable was numeric or character you could also just use the MISSING() function in the condition.  Also there are other ways you could convert the missing values to zeros.  I will use one of them in the next section.

 

You could also simplify this by using an ARRAY statement.  So instead of writing the same statement 17 times you can write it once and use the ARRAY reference to allow it to be applied to 17 different actual variables.  

array counts year2001-year2017 ;
do i=1 to dim(counts);
  counts(i)= sum(counts(i),0)
end;

 

samira
Obsidian | Level 7
great!! I learned a lot from your comments.
Reeza
Super User
proc sql;
      create table a  as select week, year, count from table ;
      quit;

      data datastep;
      /*do some stuff*/
/*Depends on what goes on here if you need code*/
      run;

      proc reg data=statprocess;
by region disease;
      model avg=x0;
      output out=trig p=yhat r=e stdp=sd;
      run;
samira
Obsidian | Level 7
thank you for your answer.

based on my code I need too make a matrix for each disease and region before doing regression analysis, that is why I cannot use group by in this case
Tom
Super User Tom
Super User

So to convert your logic from needing to run separate steps for each group to just using BY processing you would convert your posted logic to this:

 

* Get ALL the data ;
data all;
  set TABLE ;
  keep region disease week year count ;
run;

* Manipulate the data ;
data statprocess; 
   set all ;
      /*do some stuff*/
run;

* Make sure the data is order by the groups;
proc sort =statprocess ;
  by region disease ;
run;

* Run the analysis;
proc reg data=statprocess;
   by region disease ;
   model avg=x0;
   output out=trig p=yhat r=e stdp=sd;
run;
samira
Obsidian | Level 7
thank you for your answer.

based on my code I need too make a matrix for each disease and region before doing regression analysis, that is why I cannot use group by in this case
Reeza
Super User

@samira wrote:
thank you for your answer.

based on my code I need too make a matrix for each disease and region before doing regression analysis, that is why I cannot use group by in this case

The results are the same, regardless of using BY group or a macro loop.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 24 replies
  • 7230 views
  • 12 likes
  • 4 in conversation