BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
mayasak
Quartz | Level 8

Hi I have a dataset with 3 variables that have the following freqs for each of the variables by year

 

var1*year    

2000   2001   2002    2003  2004  2005  2006  2007  2008  2009  2010  2011  2012 

 

22       34       43        21      0        0        0        1         0       1        1         0       0

33       44       17       13       1        1        0        0         0       0        0        1        0  

 

var2*year

2000   2001   2002    2003  2004  2005  2006  2007 2008  2009  2010  2011 2012 

 

0         1         1          0         35      24      42      22     0        0        0        0       2

1         0         0          0         54      13      24      16     2        1        0        0       0                    

 

var3*year

2000   2001   2002    2003  2004  2005  2006  2007 2008 2009 2010 2011 2012 

0         0         0          0        1        0        1        1       33     54      23    14     78

1         2         0          0        0        0        0        0       65     44     123    34    89

 

So I need to join these 3 variables into one variable using only years 2000-2003 for var1, 2004 to 2007 for var2 and 2008-2012 for var3.

 

Thank you

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Thanks for clarifying that this is about cancer staging schemes and for the attached cross(?)-tabulations.

 

I find it confusing that both the tables in your initial post and the new ones have titles suggesting cross-tabulations (var1*year etc.), but have no row headers. This makes it difficult to compare different tables appropriately, because it is not clear which categories (rows) in different tables should be compared.

 

Each of the tables in the attachment has a total row at the bottom, which can be guessed only from the numbers.

 

Also, you use variable name VarComb (which I used for the CATX-concatenated variable) for a variable derived using the COALESCE function (an approach which I advised against).

 

The attached tables present a different picture than those in your initial post: Table "var3*year" in your initial post suggested that for the years 2004-2006 VAR3 has only very few non-missing values. The attached table with the same title, however, shows large numbers in all (displayed) categories. At the same time, table "Var2*year" has substantial numbers in only two of the eight categories shown. Of course, this must have to do with the significant change in the assignment of years to variables which has occurred between your initial and current post.

 

I tend to believe that, at the current stage, this is not a programming question. You should rather talk to a medical expert who is familiar with those different cancer staging schemes and who can advise you if and how different schemes can be unified for further analysis. As soon as you can describe the rule clearly, many forum members will be able to help you implementing it in SAS.

 

 

View solution in original post

7 REPLIES 7
Reeza
Super User

Your question is unclear. 

Post what you have, as identical as possible, and what you want. Ideally, what you've tried as well and what isn't working. 

 

ballardw
Super User

Do you have your data in SAS? Since you can't have a variable named "2000" then that structure you demonstrate woule have to have many more variables with different names than implied.

 

And how are you "joining" variables? Concatenate, add, multiply or something else entirely?

FreelanceReinh
Jade | Level 19

Hi @mayasak,

 

Maybe this can be a start:

data want;
set have;
     if year in (2000:2003) then var_new=var1;
else if year in (2004:2007) then var_new=var2;
else if year in (2008:2012) then var_new=var3;
run;

Then you can compare your PROC FREQ output to that of the following step:

proc freq data=want;
tables var_new*year;
run;

 

mayasak
Quartz | Level 8

Thank you

FreelanceReinh
Jade | Level 19

Without considering the years, you could use the COALESCE function to obtain an aggregated ("merged") variable (by taking the first non-missing value, if any, from VAR1, VAR2, VAR3 in each observation):

data want2a;
set have;
var_new=coalesce(of var1-var3);
run;

proc compare data=want2a c=want;
run;

All differences shown in the PROC COMPARE output would be due to "overlaps" of "different variables for the same years." However, there can be overlaps without affecting the PROC COMPARE output, for example if VAR1 and VAR2 had (possibly different) non-missing values in an observation of the year 2000, VAR_NEW in both WANT and WANT2A would have the value from VAR1. Therefore, I wouldn't recommend this approach in general.

 

I would rather look at the combinations of VAR1 - VAR3:

data want2b;
length varcomb $30; /* please adapt the length as appropriate */
set have;
varcomb=catx(of var1-var3);
run;

proc freq data=want2b order=freq;
tables varcomb;
run;

Now, overlaps can be clearly recognized in the PROC FREQ output. For instance, a value VARCOMB='1,.,0' would mean "VAR1=1, VAR2=. (missing), VAR3=0." Typically, the corresponding frequency of such combinations will be low. Due to the ORDER= option these exceptional cases will appear closer to the bottom of the frequency table, whereas the predominant cases will be found in the first few rows of the table.

mayasak
Quartz | Level 8

I'm not sure if I'm doing the right thing. Let me explain what I'm supposed to do.

The data is for cancer and the three variables are for cancer staging. The problem is that the staging schemes for cancer changed throughout the years. So the first variable is for staging used from year 1988 to 2000. The second variable from year 2001-2003 and the third for years 2004 to 2003. However, I found that there were overlap in years using different staging schemes. What I have to do is to compare between the total cases per each year vs the number of records returned with a populated field value for each respective year/ staging scheme. 

 

I've attached Freq tests for the var2*year and var3*year

Also  

CombVar*year (using “if year in” codes that you provided in the beginning)

And

VarComb*year   (using the coalesce statement)

 

Thanks

             

 

         

 

FreelanceReinh
Jade | Level 19

Thanks for clarifying that this is about cancer staging schemes and for the attached cross(?)-tabulations.

 

I find it confusing that both the tables in your initial post and the new ones have titles suggesting cross-tabulations (var1*year etc.), but have no row headers. This makes it difficult to compare different tables appropriately, because it is not clear which categories (rows) in different tables should be compared.

 

Each of the tables in the attachment has a total row at the bottom, which can be guessed only from the numbers.

 

Also, you use variable name VarComb (which I used for the CATX-concatenated variable) for a variable derived using the COALESCE function (an approach which I advised against).

 

The attached tables present a different picture than those in your initial post: Table "var3*year" in your initial post suggested that for the years 2004-2006 VAR3 has only very few non-missing values. The attached table with the same title, however, shows large numbers in all (displayed) categories. At the same time, table "Var2*year" has substantial numbers in only two of the eight categories shown. Of course, this must have to do with the significant change in the assignment of years to variables which has occurred between your initial and current post.

 

I tend to believe that, at the current stage, this is not a programming question. You should rather talk to a medical expert who is familiar with those different cancer staging schemes and who can advise you if and how different schemes can be unified for further analysis. As soon as you can describe the rule clearly, many forum members will be able to help you implementing it in SAS.

 

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 1325 views
  • 2 likes
  • 4 in conversation