BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
cdubs
Quartz | Level 8

I have the dataset 

 

ID ... year .... 2010_pct .... 2011_pct ... 2012_pct  

1 .... 2010 .... 24 ...................25 ............... 27 

1 .... 2011 .... 24 ...................25 ................27 

1......2012 .... 24 ...................25 ...............27  

2 .... 2010 ...... . .....................1 ...............  . 

2 .... 2011 ...... . .....................1 ...............  . 

2 .... 2012 ...... . .....................1 ...............  .

3 .... 2010 ..... 97 .................54 ................  79 

3 .... 2011 ..... 97 .................54 ................. 79 

3 .... 2012 ..... 97 .................54 ................  79 

 

I want to create a new variable "volume" where if the percentile for that ID for that year is < 25, then volume = 1, 25-75 then volume = 2, >75 then volume = 3, and all others (e.g., if blank), then volume = 4. 

 

So, for example, ID 1 for the 2010 entry would be volume = 1 (because pct_2010 <25), for 2011 entry would be volume = 2 (because pct_2011 is 25-75), for the 2012 entry would be volume =2 (because pct_2012 is 25-75). 

 

I am trying something like the following, but obviously it seems like I can't just invoke "i" like below. Any tips on how I can do this kind of code? 

 

data have; 
	set want;
	volume = .;
	do i=2010 to 2012;
	if (year = i and pct_&i. <=25 then volume = 1);
	if (year = i and pct_&i. >25 and pct_&i. <=75 then volume =2);
	if (year = i and pct_&i. >75 then volume = 3); 
	if (year = i and pct_&i. = . then volume = 4);
	output;
end;
run;

 

Thank you so much!!! 

1 ACCEPTED SOLUTION

Accepted Solutions
andreas_lds
Jade | Level 19

You could use an array, but you don't need to:

data want;
   set have;

   length volume _value 8;
   drop _value;

   _value = input(vvaluex(cats('pct_', year)), best.);

   select;
     when (_value  =  .) volume = 4;
     when (_value <  25) volume = 1;
     when (_value <= 75) volume = 2;
     otherwise volume = 3;
   end;
run;

The functions cats create the name of the variable to be inspected, vvaluex returns the (formatted) value of that variable as text, so we need input to convert it back to a number.

 

The following version uses an array, most likely more efficient, than calling three functions 😉

data want2;
   set have;

   length volume 8;
   array pct[2010:2012] pct_:; /* years need to be changed */

   select;
     when (pct[year]  =  .) volume = 4;
     when (pct[year] <  25) volume = 1;
     when (pct[year] <= 75) volume = 2;
     otherwise volume = 3;
   end;
run;

 

View solution in original post

11 REPLIES 11
andreas_lds
Jade | Level 19

You need to use an array to access the pct-variables. Also note, that "2010_pct" is not a valid name for a variable. I think it would be better to start by transposing the data you have and getting rid of data (year) in variable names at all, making the code independent of the data.

 

If you want code, please provide test-data in usable form - a data-step using datalines.

KachiM
Rhodochrosite | Level 12

@cdubs 

 

As pointed out by @andreas_lds , arrays will be convenient. Further use variable names that SAS accepts.

 

data have;
input ID year pct_2010 pct_2011 pct_2012;  
datalines;
1 2010 24 25 27 
1 2011 24 25 27 
1 2012 24 25 27  
2 2010  .  1  .   
2 2011  .  1  .
2 2012  .  1  .
3 2010 97 54 79 
3 2011 97 54 79 
3 2012 97 54 79
;
run;

The program using arrays:

 

data want;
   set have;
   array p pct_2010 - pct_2012;
   array vol volume1 - volume3;
   do i = 1 to dim(p);
      select;
      when (p[i]<25) vol[i] = 1;
      when (25 <= p[i] <= 75) vol[i] = 2;
      when (p[i] > 75) vol[i] = 3;
      otherwise vol[i] = 4;
      end;
   end;
drop i;
run;


Edit:
when (p[i]<25) vol[i] = 1;
The above statement includes missing value also. Change this as:
    
     when (0 <= p[i] < 25) vol[i] = 1;

No other change is required.Ignore the previous output. The revised output is:


Obs ID year pct_2010 pct_2011 pct_2012 volume1 volume2 volume3
1 1 2010 24 25 27 1 2 2
2 1 2011 24 25 27 1 2 2
3 1 2012 24 25 27 1 2 2
4 2 2010 . 1 . 4 1 4
5 2 2011 . 1 . 4 1 4
6 2 2012 . 1 . 4 1 4
7 3 2010 97 54 79 3 2 3
8 3 2011 97 54 79 3 2 3
9 3 2012 97 54 79 3 2 3

Capture_01.JPG

yabwon
Onyx | Level 15

Hi @KachiM,

 

I was wondering, since the OP requirement was:

I want to create a new variable "volume" where if the percentile for that ID for that year is < 25, then volume = 1, 25-75 then volume = 2, >75 then volume = 3, and all others (e.g., if blank), then volume = 4. 

shouldn't the code be:

data want;
   set have;
   array p pct_2010 - pct_2012;
   array vol volume1 - volume3;
   do i = 1 to dim(p);
      select;
        when (p[i]  =  .) vol[i] = 4;
        when (p[i] <  25) vol[i] = 1;
        when (p[i] <= 75) vol[i] = 2;
        otherwise         vol[i] = 3;
      end;
   end;
drop i;
run;

to have 4 for missing?

 

All the best

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



KachiM
Rhodochrosite | Level 12

@yabwon 

 

You are correct. The use of the statement,

 

when(p[i] < 25) vol[i] = 1;

inludes the missing value also. I missed to read and think on the output. I learned the lesson.

andreas_lds
Jade | Level 19

@KachiM thanks for providing data 😉

 

The solution provided by @KachiM is all you need, if you are sure that you don't have variables like pct_2012, pct_2013 and pct_2014 next time you need this analysis.

 

proc transpose data=have out=transposed(rename=(col1=pct)) name=original;
   by id year;
   var pct:;
run;

data want;
   set transposed;
   where compress(original, '', 'dk') = put(year, 4.);

   length volume 8;

   select;
      when (pct <= 25) volume = 1;
      when (25 < pct <= 75) volume = 2;
      when (pct > 75) volume = 3;
      otherwise volume = 4;
   end;

   /*drop original;*/
run;
cdubs
Quartz | Level 8

Thank you all so much!!!! Thank each and everyone of you for being so patient and thoughtful and helping me through SAS T.T

 

I oversimplified my data so this is completely my fault... but unfortunately, my data is more like the following.  😞 (I may have oversimplified it... heh). Would arrays still work this way? (I do actually have pct_2010 through pct_2017 but I'm hoping below will get the point across).  

 

data have;
input ID year pct_2010 pct_2011 pct_2012 var;  
datalines;
1 2010 24 25 27 5 
1 2011 24 25 27 6 
1 2012 24 25 27 5  
1 2012 24 25 27 10
1 2012 24 25 27 55
2 2010  .  1  . 57   
2 2010  .  1  . 99
2 2011  .  1  . 10
2 2012  .  1  . 0
3 2010 97 54 79 5
3 2011 97 54 79 7
3 2012 97 54 79 8
;
run;

And what I would like is below, just one volume column that knows to "read" pct_2010, if you will, if the year is 2010. and only reads pct_2011 if the year is 2011. Sorry that I didn't provide this additional information in the OP!

 

data want;
input ID year pct_2010 pct_2011 pct_2012 var volume;  
datalines;
1 2010 24 25 27 5 1 
1 2011 24 25 27 6 2 
1 2012 24 25 27 5 2 
1 2012 24 25 27 10 2
1 2012 24 25 27 55 2
2 2010  .  1  . 57 4  
2 2010  .  1  . 99 4
2 2011  .  1  . 10 1
2 2012  .  1  . 0  4
3 2010 97 54 79 3 
3 2011 97 54 79 3 
3 2012 97 54 79 3
;
run;

THANK YOU GUYS!

andreas_lds
Jade | Level 19

You could use an array, but you don't need to:

data want;
   set have;

   length volume _value 8;
   drop _value;

   _value = input(vvaluex(cats('pct_', year)), best.);

   select;
     when (_value  =  .) volume = 4;
     when (_value <  25) volume = 1;
     when (_value <= 75) volume = 2;
     otherwise volume = 3;
   end;
run;

The functions cats create the name of the variable to be inspected, vvaluex returns the (formatted) value of that variable as text, so we need input to convert it back to a number.

 

The following version uses an array, most likely more efficient, than calling three functions 😉

data want2;
   set have;

   length volume 8;
   array pct[2010:2012] pct_:; /* years need to be changed */

   select;
     when (pct[year]  =  .) volume = 4;
     when (pct[year] <  25) volume = 1;
     when (pct[year] <= 75) volume = 2;
     otherwise volume = 3;
   end;
run;

 

cdubs
Quartz | Level 8

Thank you so much!! This works so well!! I ended up using the second set of code.

KachiM
Rhodochrosite | Level 12

@cdubs 

 

The array can handle 2010 to 2017 or even more. What do you want with VAR?

input ID year pct_2010 pct_2011 pct_2012 var; 

What is VOLUME Here?

input ID year pct_2010 pct_2011 pct_2012 var volume;

 Do you want VAR to be used to get VOLUME? If that so, it is not clear how to use VAR to get Volume. Can you explain by an example?

cdubs
Quartz | Level 8

Thank you so much for your patience!! 

 

Volume is the variable I'm trying to generate. If a data entry has the year "2010", then the code I'm looking for in OP would ideally use the variable "pct_2010" to generate a 1, 2, 3 or 4 based on <25, 25-75, >75, or missing. 

 

As for var, I'll probably end up averaging it, or doing other basic descriptive statistics. Var would be things like age, length of stays, etc, things that I want to calculate given an ID's volume. 

 

My overall research question is to find out whether "var" (such as age, length of stays) vary by not only ID, but by volume, as defined by the ID's percentile that year. I had previously calculated each ID's "volume" (e.g., volume of hospital visits) by year and attached that number to each ID, e.g. pct_2010 means the percentile of volume (of hospital visits) that that ID had relative to all other IDs that year, 2010

 

Thank you so much for helping me get to exactly where I need to be!! >< 

KachiM
Rhodochrosite | Level 12

@cdubs 

 

The present thread can be considered as done. Can you open a new thread with your generalized question with an example Data set and the required output for one VAR?

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 1350 views
  • 8 likes
  • 4 in conversation