pmpradhan
Quartz | Level 8

What would be the equivalent SAS code for this STATA code?

 

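* Quartile groups (1-4) of totalpatients, skipping rows where surveymiss == 1
* (rows with missing surveymiss are kept, because . != 1 is true in Stata)
xtile patient_xtile = totalpatients if surveymiss != 1, nquantiles(4)

* ta = tabulate: number of observations in each quartile group
ta patient_xtile
* sum = summarize: N, mean, SD, min and max of totalpatients within each group
bysort patient_xtile: sum totalpatients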

 

Thank you in advance.

10 Replies
pmpradhan
Quartz | Level 8

I'm getting slightly different results with the following code. Is it equivalent?

 

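proc sort data=dataname;
  by totalpatients;
run;

data test;
  set dataname;
  /* num is assumed to hold the total number of observations */
  n_group=floor(_n_/(num/4));
  /* only the last observation gets 4, so fold it into group 3 */
  if n_group=4 then n_group=3;
run;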

 

proc means data=test;
class n_group;
var totalpatients;
run;

 

Please confirm!

Astounding
PROC Star

Your calculations for N_GROUP look like they compute uneven group sizes:

 

n_group=floor(_n_/(num/4));

if n_group=4 then n_group=3;

 

Since I'm not a STATA user, I can't really tell the intent of the code. But if you want 4 equal-size groups, you could use:

 

data test;
  set dataname nobs=_total_obs_;
  /* assumes dataname is already sorted by totalpatients */
  n_group=ceil(4 * _n_ / _total_obs_);
run;

 

If you were looking for 5 equal size groups, just change "4" to "5".
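To make the difference in group sizes concrete, here is a small Python sketch (an illustration outside SAS) that applies both formulas to 12 sorted observations. It assumes, as a guess, that `num` in the original post holds the total number of observations; `_n_` is mimicked with a 1-based counter, and the function names are hypothetical.

```python
import math


def floor_groups(n_obs):
    """Original post's rule: floor(_n_/(num/4)), with 4 recoded to 3 (groups 0-3)."""
    groups = []
    for n in range(1, n_obs + 1):  # n plays the role of _n_
        g = math.floor(n / (n_obs / 4))
        if g == 4:
            g = 3
        groups.append(g)
    return groups


def ceil_groups(n_obs):
    """Astounding's rule: ceil(4*_n_/_total_obs_) (groups 1-4)."""
    return [math.ceil(4 * n / n_obs) for n in range(1, n_obs + 1)]


def group_sizes(groups):
    """Count how many observations fall into each group."""
    return {g: groups.count(g) for g in sorted(set(groups))}


print(group_sizes(floor_groups(12)))  # {0: 2, 1: 3, 2: 3, 3: 4}
print(group_sizes(ceil_groups(12)))   # {1: 3, 2: 3, 3: 3, 4: 3}
```

Both rules split purely by row position, so tied totalpatients values can end up in different groups. xtile and PROC RANK assign groups by value instead, which keeps ties together.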

Reeza
Super User

nquantiles(4) makes me think these are quartiles, in which case PROC RANK with GROUPS=4 should be used.

EDIT: I was going to link to a similar question from last week, but it turned out to be your question. I'm assuming they're related?

https://communities.sas.com/t5/Base-SAS-Programming/Quartiling-and-finding-the-average-in-each-quart...

 

The answer is the same.

 

proc rank data=sashelp.cars out=ranked groups=4;
  var mpg_city;
  ranks rank_mpg_city;   /* quartile groups numbered 0-3 */
run;

proc means data=ranked noprint nway;
  class rank_mpg_city;
  var mpg_city;
  output out=want mean(mpg_city)=avg_mpg_city_Quartiled;
run;

proc print data=ranked;
run;
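The PROC RANK documentation gives the rule behind GROUPS=: the group is FLOOR(rank*k/(n+1)), where rank is the value's rank, k is the GROUPS= value and n is the number of nonmissing values. The Python sketch below mimics that rule under the default TIES=MEAN so small cases can be checked by hand; it is an illustration, not SAS's implementation, and `rank_groups` is a hypothetical name.

```python
import math
from collections import defaultdict


def rank_groups(values, k=4):
    """Mimic PROC RANK GROUPS=k with TIES=MEAN: group = floor(rank * k / (n + 1)).

    rank is the mean 1-based rank of the value among the n nonmissing values.
    None stands in for a SAS missing value and gets a missing group.
    """
    present = [v for v in values if v is not None]
    n = len(present)
    # Collect the sorted positions of each distinct value, then average them (TIES=MEAN).
    positions = defaultdict(list)
    for pos, v in enumerate(sorted(present), start=1):
        positions[v].append(pos)
    mean_rank = {v: sum(p) / len(p) for v, p in positions.items()}
    return [None if v is None else math.floor(mean_rank[v] * k / (n + 1))
            for v in values]


print(rank_groups([10, 20, 30, 40, 50, 60, 70, 80]))  # [0, 0, 1, 1, 2, 2, 3, 3]
print(rank_groups([1, 2, 2, 2, 3, 4, 5, 6]))          # [0, 1, 1, 1, 2, 2, 3, 3]
```

Tied values share a mean rank, so they always land in the same group, and the groups run from 0 to k-1.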
pmpradhan
Quartz | Level 8

Yes, the post you referenced was from me. This time I wanted to reproduce results that another person produced in STATA. I got close, but not the same result; I was off by a few numbers. I will try reorganizing the data. Thanks again!

Reeza
Super User

Quantile/percentile calculations likely differ, especially if you have ties or not much data. In that case, compare how PROC RANK assigns its groups with how STATA calculates percentiles, and decide which definition you should be using.

 

There is no 'standard' method for calculating percentiles; Excel does it differently as well.
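To show how ties can make the two disagree, here is a Python sketch of xtile as we read Stata's documentation (an illustration, not Stata's code). Cutpoints use Stata's default percentile definition: with P = n*p/100, average x(P) and x(P+1) when P is a whole number, otherwise take x(floor(P)+1). Category c holds values above cutpoint c-1 and at or below cutpoint c. The function names are hypothetical, and `Fraction` keeps P exact so the whole-number test is not thrown off by floating point.

```python
import math
from fractions import Fraction


def stata_cutpoint(sorted_x, k, nq):
    """k-th of nq cutpoints (p = 100*k/nq), Stata's default definition as we read it.

    x(j) is 1-based in the formula, so x(j) == sorted_x[j - 1].
    """
    n = len(sorted_x)
    P = Fraction(n * k, nq)  # n * p / 100, kept exact
    i = math.floor(P)
    if P == i:
        return (sorted_x[i - 1] + sorted_x[i]) / 2  # average of x(i) and x(i+1)
    return sorted_x[i]  # x(floor(P) + 1)


def xtile(values, nq=4):
    """Categories 1..nq: category c holds values in (cut[c-1], cut[c]].

    None stands in for a missing value and stays missing.
    """
    present = sorted(v for v in values if v is not None)
    cuts = [stata_cutpoint(present, k, nq) for k in range(1, nq)]
    # 1 + number of cutpoints strictly below the value
    return [None if v is None else 1 + sum(v > c for c in cuts) for v in values]


print(xtile([1, 2, 2, 2, 3, 4, 5, 6]))  # [1, 1, 1, 1, 3, 3, 4, 4]
```

On these tied data the sketch puts the four smallest values in group 1 and leaves group 2 empty, while the documented PROC RANK rule FLOOR(rank*k/(n+1)) with TIES=MEAN gives 0, 1, 1, 1, 2, 2, 3, 3 (that is, 1, 2, 2, 2, 3, 3, 4, 4 after adding 1). Small, tie-heavy data is exactly where "a few numbers off" can come from.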

 



pmpradhan
Quartz | Level 8

Thank you, Astounding. Since the number of observations in the dataset doesn't divide evenly, I had to do that. But I like your use of CEIL too. Thank you!

Reeza
Super User

Explain the logic and we can answer your question faster. Otherwise you need to wait for someone who understands Stata code.

pmpradhan
Quartz | Level 8

The goal in this particular case is to translate the STATA code so that I get the same results. As I replied above, I'm a few numbers away from an exact match. I will try sorting the data again and update this thread. I appreciate the community support here. Thanks, team!

ballardw
Super User

Quite often in non-trivial cases, the "same results" may not be possible because of things like internal rounding of values, the precision of the hardware/software used, or plain differences in the algorithms used for approximations.

 

If you search this forum, you will find a few questions from people getting different results between SAS versions X.X and Y.Y, where algorithms were tweaked between versions, or after moving from one OS to another, especially between the 32-bit and 64-bit versions of the same OS.

 

You may have to set a target for "close enough".
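As a small illustration of the rounding point, here is a Python sketch using IEEE 754 doubles. A value that is mathematically equal to a cutpoint can land just above it, and a "value <= cutpoint" rule like xtile's then puts it in the next group up. The cutpoint 0.3 and the tolerance of 10 decimal places are arbitrary choices for the example.

```python
# A value that should sit exactly on a cutpoint can land just above it.
cutpoint = 0.3
value = 0.1 * 3  # mathematically 0.3, stored as 0.30000000000000004

print(value == cutpoint)             # False
print(value <= cutpoint)             # False: a "<= cutpoint" rule bumps it up a group
print(round(value, 10) <= cutpoint)  # True once the value is rounded before comparing
```

Rounding to an agreed precision before comparing is one way to set the "close enough" target.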

art297
Opal | Level 21

I'm not familiar with STATA, but I would try a couple of changes to @Reeza's suggested code. If I'm reading your STATA code correctly, it appears that you're excluding cases where surveymiss is equal to 1. Also, it appears that the xtile statement produces ranks from 1 to 4, while SAS produces 0 to 3. As such, I'd try something like:

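/* Simulate a surveymiss flag on sashelp.cars for demonstration */
data cars;
  set sashelp.cars;
  if _n_ in (5,7,15,40) then surveymiss=1;
  else surveymiss=2;
run;

/* Rank only rows where surveymiss is not 1, like STATA's "if surveymiss != 1" */
proc rank data=cars (where=(surveymiss ne 1)) out=ranked groups=4;
  var mpg_city;
  ranks rank_mpg_city;
run;

/* Shift the groups from 0-3 to 1-4 to match xtile's numbering */
data ranked;
  set ranked;
  rank_mpg_city=rank_mpg_city+1;
run;

/* Mean of mpg_city within each quartile group */
proc means data=ranked noprint nway;
  class rank_mpg_city;
  var mpg_city;
  output out=want mean(mpg_city)=avg_mpg_city_Quartiled;
run;

proc print data=ranked;
run;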
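For checking the last two steps outside SAS, here is a Python sketch (an illustration, with a hypothetical function name) that adds 1 to 0-based PROC RANK groups and reports the count and mean per group. That roughly corresponds to STATA's `ta patient_xtile` plus the N and mean columns of `bysort patient_xtile: sum totalpatients`. The 0-based groups are assumed to have been computed already.

```python
from statistics import mean


def summarize_by_group(values, groups0):
    """Shift 0-based groups to 1-based and return {group: (count, mean)}.

    Rows whose group is None (missing) are skipped, as PROC MEANS does with a
    missing CLASS value unless the MISSING option is used.
    """
    summary = {}
    for v, g in zip(values, groups0):
        if g is None:
            continue
        summary.setdefault(g + 1, []).append(v)
    return {g: (len(vs), mean(vs)) for g, vs in sorted(summary.items())}


print(summarize_by_group([10, 20, 30, 40, 50, 60, 70, 80], [0, 0, 1, 1, 2, 2, 3, 3]))
# {1: (2, 15), 2: (2, 35), 3: (2, 55), 4: (2, 75)}
```

One difference worth checking when comparing output: STATA's bysort also shows a block for patient_xtile == . (the rows excluded by the if condition), whereas PROC MEANS drops missing CLASS values by default.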

Art, CEO, AnalystFinder.com

 
