About TrueTears

TrueTears · 12-05-2019

I have the following dataset:

data have;
input year firm id;
cards;
2008 28013 1003
2008 28013 1004
2008 28013 1007
2008 28013 1009
2008 28013 1010
2008 28013 1013
2008 28013 1053
2008 28013 1074
2008 28013 1075
2009 28013 1009
2009 28013 1010
2009 28013 1053
2009 28013 1074
2009 28013 1075
2009 28332 1004
2009 28332 1007
2009 28332 1823
;
run;

For each "id", I have a list of adjacent ids, called id_adj, in the following dataset:

data list;
input id id_adj use;
cards;
1003 1009 1
1003 1010 1
1003 1085 1
1004 1007 0
1004 1009 0
1004 1099 0
1004 1100 0
1007 1004 0
1007 1823 1
1009 1003 1
1009 1004 0
1009 1010 0
1010 1003 1
1010 1009 0
1013 1053 1
1013 1074 1
1053 1013 1
1074 1013 1
1075 1743 1
1075 1744 1
1823 1007 1
;
run;

I wish to create the following dataset with an additional variable "treat":

data want;
input year firm id treat;
cards;
2008 28013 1003 0
2008 28013 1004 4
2008 28013 1007 5
2008 28013 1009 0
2008 28013 1010 0
2008 28013 1013 0
2008 28013 1053 0
2008 28013 1074 0
2008 28013 1075 5
2009 28013 1009 5
2009 28013 1010 5
2009 28013 1053 5
2009 28013 1074 5
2009 28013 1075 5
2009 28332 1004 4
2009 28332 1007 0
2009 28332 1823 0
;
run;

where treat is defined as follows, within each year/firm:

1) For each id in "have", if use = 0 for all id_adj that correspond to that id in "list", then treat = 4.
2) Otherwise, consider all of the id_adj that correspond to the id with use = 1 in "list". If none of these id_adj appears as an id in "have" for the same year/firm, then treat = 5. For example, consider year 2008, firm 28013 with id 1007. In "list", id 1007 corresponds to 1823, which has use = 1. But 1823 doesn't appear in "have" as an id for year/firm 2008/28013, so treat = 5.
3) In all other cases, treat = 0.
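A minimal PROC SQL sketch of the treat rule (not the solution from the thread). It assumes each year/firm/id appears once in "have" and reads rule 1 as "the id has no use = 1 rows in list", so an id missing from "list" altogether would also get treat = 4. The name want_sketch is hypothetical; the inputs are re-created compactly with @@, which lets one data line hold several records.

```sas
/* Inputs from the post; @@ reads several records per data line */
data have;
  input year firm id @@;
  cards;
2008 28013 1003  2008 28013 1004  2008 28013 1007  2008 28013 1009
2008 28013 1010  2008 28013 1013  2008 28013 1053  2008 28013 1074
2008 28013 1075  2009 28013 1009  2009 28013 1010  2009 28013 1053
2009 28013 1074  2009 28013 1075  2009 28332 1004  2009 28332 1007
2009 28332 1823
;
run;

data list;
  input id id_adj use @@;
  cards;
1003 1009 1  1003 1010 1  1003 1085 1  1004 1007 0  1004 1009 0
1004 1099 0  1004 1100 0  1007 1004 0  1007 1823 1  1009 1003 1
1009 1004 0  1009 1010 0  1010 1003 1  1010 1009 0  1013 1053 1
1013 1074 1  1053 1013 1  1074 1013 1  1075 1743 1  1075 1744 1
1823 1007 1
;
run;

/* want_sketch is a hypothetical output name */
proc sql;
  create table want_sketch as
  select h.year, h.firm, h.id,
         case
           /* no use = 1 neighbour: every listed id_adj has use = 0 */
           when count(l.id_adj) = 0 then 4
           /* use = 1 neighbours exist, but none is in this year/firm */
           when count(h2.id) = 0 then 5
           else 0
         end as treat
  from have as h
       left join list as l
         on l.id = h.id and l.use = 1
       left join have as h2
         on h2.year = h.year and h2.firm = h.firm and h2.id = l.id_adj
  group by h.year, h.firm, h.id
  order by h.year, h.firm, h.id;
quit;
```

Matching h2 on year and firm as well as id is what keeps the "appears in have" check inside each year/firm.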

TrueTears · 12-03-2019

@Patrick Thank you! I tried your code; it seems to work for one particular year and firm_id pair but not for multiple pairs. Sorry that I did not include additional years and firm_ids in my original post. Consider the following extended "have" dataset:

data have;
input year firm_id city_id operate action;
cards;
2008 28013 1003 1 1
2008 28013 1004 1 1
2008 28013 1007 1 0
2008 28013 1009 1 1
2008 28013 1010 0 1
2008 28013 1013 1 0
2008 28013 1053 1 1
2008 28013 1074 1 0
2008 28013 1075 1 1
2009 28332 1003 1 0
2009 28332 1010 1 0
2009 28332 1013 0 1
2009 28332 1053 1 0
2009 28332 1074 1 1
2009 28332 1075 1 1
;
run;

I added the year 2009 and another firm_id, 28332. The "list" dataset remains the same. Now "want" becomes:

data want;
input year firm_id city_id operate action action_adj;
cards;
2008 28013 1003 1 1 2
2008 28013 1004 1 1 0
2008 28013 1007 1 0 0
2008 28013 1009 1 1 1
2008 28013 1010 0 1 3
2008 28013 1013 1 0 1
2008 28013 1053 1 1 0
2008 28013 1074 1 0 0
2008 28013 1075 1 1 0
2009 28332 1003 1 0 0
2009 28332 1010 1 0 0
2009 28332 1013 0 1 3
2009 28332 1053 1 0 2
2009 28332 1074 1 1 2
2009 28332 1075 1 1 0
;
run;

However, your code gives the following (which is almost correct):

data want;
input year firm_id city_id operate action action_adj;
cards;
2008 28013 1003 1 1 2
2008 28013 1004 1 1 0
2008 28013 1007 1 0 0
2008 28013 1009 1 1 1
2008 28013 1010 0 1 3
2008 28013 1013 1 0 1
2008 28013 1053 1 1 0
2008 28013 1074 1 0 0
2008 28013 1075 1 1 0
2009 28332 1003 1 0 2
2009 28332 1010 1 0 1
2009 28332 1013 0 1 3
2009 28332 1053 1 0 2
2009 28332 1074 1 1 2
2009 28332 1075 1 1 0
;
run;

The first difference is for year 2009, firm_id 28332, and city_id 1003, which should have action_adj = 0 rather than action_adj = 2. In the list, city_id 1003 corresponds to 1009, 1010, and 1085; although 1010 has operate = 0 for the year/firm_id pair 2008/28013, it has operate = 1 for the pair 2009/28332.

The second difference is for year 2009, firm_id 28332, and city_id 1010, which should have action_adj = 0 rather than action_adj = 1. City_id 1010 corresponds to 1003, and although 1003 has action = 1 for year/firm_id 2008/28013, it has action = 0 for 2009/28332.

Could you slightly alter your code so that the conditions I outlined in my original post are applied within every year/firm_id pair rather than across the entire dataset? Thank you so very much.
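One way to keep the rule inside each year/firm_id pair is to join the neighbour rows back to "have" on year and firm_id as well as on city_id, so a neighbour's operate and action always come from the same pair. The sketch below is an illustrative PROC SQL version (not Patrick's code, which is not shown here) of the 12-02-2019 rule, run on the extended data from this post; want_sketch is a hypothetical name.

```sas
data have;
  input year firm_id city_id operate action @@;
  cards;
2008 28013 1003 1 1  2008 28013 1004 1 1  2008 28013 1007 1 0
2008 28013 1009 1 1  2008 28013 1010 0 1  2008 28013 1013 1 0
2008 28013 1053 1 1  2008 28013 1074 1 0  2008 28013 1075 1 1
2009 28332 1003 1 0  2009 28332 1010 1 0  2009 28332 1013 0 1
2009 28332 1053 1 0  2009 28332 1074 1 1  2009 28332 1075 1 1
;
run;

data list;
  input city_id city_id_adj use @@;
  cards;
1003 1009 1  1003 1010 1  1003 1085 1  1004 1007 0  1004 1009 0
1004 1099 0  1004 1100 0  1007 1004 0  1007 1823 1  1009 1003 1
1009 1004 0  1009 1010 0  1010 1003 1  1010 1009 0  1013 1053 1
1013 1074 1  1053 1013 1  1074 1013 1  1075 1743 1  1075 1744 1
;
run;

proc sql;
  create table want_sketch as
  select h.year, h.firm_id, h.city_id, h.operate, h.action,
         case
           when h.operate = 0 then 3
           /* some use = 1 neighbour has operate = 0 in this year/firm_id */
           when sum(a.operate = 0) > 0 then 2
           /* no such neighbour, but at least one has action = 1 */
           when sum(a.action = 1) > 0 then 1
           else 0
         end as action_adj
  from have as h
       left join list as l
         on l.city_id = h.city_id and l.use = 1
       /* the neighbour row must come from the SAME year and firm_id */
       left join have as a
         on a.year = h.year and a.firm_id = h.firm_id
        and a.city_id = l.city_id_adj
  group by h.year, h.firm_id, h.city_id, h.operate, h.action
  order by h.year, h.firm_id, h.city_id;
quit;
```

A neighbour with no row in the same pair contributes a missing operate and action, and a comparison with a missing value evaluates to 0, so it counts toward neither condition.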

TrueTears · 12-02-2019

I have a dataset similar to the following (but with more years and firm_ids):

data have;
input year firm_id city_id operate action;
cards;
2008 28013 1003 1 1
2008 28013 1004 1 1
2008 28013 1007 1 0
2008 28013 1009 1 1
2008 28013 1010 0 1
2008 28013 1013 1 0
2008 28013 1053 1 1
2008 28013 1074 1 0
2008 28013 1075 1 1
;
run;

I also have a list of each city_id and its adjacent city_ids (city_id_adj), with a binary variable called "use":

data list;
input city_id city_id_adj use;
cards;
1003 1009 1
1003 1010 1
1003 1085 1
1004 1007 0
1004 1009 0
1004 1099 0
1004 1100 0
1007 1004 0
1007 1823 1
1009 1003 1
1009 1004 0
1009 1010 0
1010 1003 1
1010 1009 0
1013 1053 1
1013 1074 1
1053 1013 1
1074 1013 1
1075 1743 1
1075 1744 1
;
run;

I wish to produce the following dataset (I have more years and firm_ids, so the code should work for multiple years and firm_ids):

data want;
input year firm_id city_id operate action action_adj;
cards;
2008 28013 1003 1 1 2
2008 28013 1004 1 1 0
2008 28013 1007 1 0 0
2008 28013 1009 1 1 1
2008 28013 1010 0 1 3
2008 28013 1013 1 0 1
2008 28013 1053 1 1 0
2008 28013 1074 1 0 0
2008 28013 1075 1 1 0
;
run;

The rule for producing the variable action_adj is as follows. Fix the year and firm_id:

1) If operate = 0, then action_adj = 3. An example is city_id 1010.
2) If operate = 1, check the city_id_adj in "list" that correspond to the city_id and have use = 1. If any of these city_id_adj has operate = 0 in "have", then action_adj = 2. For example, city_id 1003 corresponds to 1009, 1010, and 1085 (all with use = 1). Here 1010 has operate = 0, so action_adj = 2 for city_id 1003.
3) If none of these city_id_adj has operate = 0, then action_adj = 1 if at least one city_id_adj (with use = 1) has action = 1 in "have". For example, city_id 1009 corresponds only to city_id_adj 1003 in "list" (the only one with use = 1). In "have", 1003 has action = 1, so action_adj = 1 for 1009. Another example is city_id 1013, which corresponds to 1053 and 1074 (both with use = 1). Although 1074 has action = 0, 1053 has action = 1, so action_adj = 1 for 1013.
4) action_adj = 0 in all other cases.

Note that all of the above is for a particular year and a particular firm_id; it needs to be done for every year/firm_id pair.

TrueTears · 11-26-2019

I have the following dataset (there are more years, but I am only showing the first two):

data have;
input year ID shock;
cards;
2000 1001 1
2000 1002 0
2000 1004 0
2000 1006 1
2000 1008 0
2000 1010 0
2000 1011 0
2001 1001 0
2001 1002 0
2001 1004 1
2001 1006 1
2001 1008 1
2001 1010 0
2001 1011 0
;
run;

I have another dataset that lists "adjacent" IDs as follows:

data list;
input ID ID_adj;
cards;
1001 1002
1001 1006
1002 1001
1004 1008
1004 1011
1006 1010
1008 1004
1008 1011
1010 1006
1010 1011
1011 1004
1011 1008
1011 1010
;
run;

What I wish to do is create the following dataset:

data want;
input year ID shock treat $;
cards;
2000 1001 1 T
2000 1002 0 A
2000 1004 0 N
2000 1006 1 T
2000 1008 0 N
2000 1010 0 A
2000 1011 0 N
2001 1001 0 A
2001 1002 0 N
2001 1004 1 T
2001 1006 1 T
2001 1008 1 T
2001 1010 0 A
2001 1011 0 A
;
run;

The variable treat is defined as follows:

1) If shock = 1, then treat = T.
2) If shock = 0, look at all of the ID_adj in "list" that correspond to the ID. If shock = 1 for at least one ID_adj in the same year, then treat = A. It's best to illustrate this with an example. In "have", look at ID 1002 in year 2000. In "list", ID 1002 corresponds to 1001, and in year 2000, ID 1001 has shock = 1. Thus, treat = A. Similarly, look at ID 1010 in year 2000. In "list", ID 1010 corresponds to 1006 and 1011. Although ID 1011 has shock = 0 in year 2000, ID 1006 has shock = 1 in year 2000, so treat = A for ID 1010.
3) If shock = 0 and shock is also 0 for all ID_adj in the same year, then treat = N. Again, an example: in "have", look at ID 1008 in year 2000. In "list", ID 1008 corresponds to 1004 and 1011. Both have shock = 0 in year 2000, so treat = N for ID 1008 in year 2000.

Since the value of shock changes across years, the above needs to be done for every year, producing the dataset "want".
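A possible PROC SQL sketch of the T/A/N rule. Neighbours are matched on ID_adj and on the same year; an ID_adj with no row in "have" for that year is treated as shock = 0 (an assumption). The name want_sketch is hypothetical.

```sas
data have;
  input year ID shock @@;
  cards;
2000 1001 1  2000 1002 0  2000 1004 0  2000 1006 1  2000 1008 0
2000 1010 0  2000 1011 0  2001 1001 0  2001 1002 0  2001 1004 1
2001 1006 1  2001 1008 1  2001 1010 0  2001 1011 0
;
run;

data list;
  input ID ID_adj @@;
  cards;
1001 1002  1001 1006  1002 1001  1004 1008  1004 1011  1006 1010
1008 1004  1008 1011  1010 1006  1010 1011  1011 1004  1011 1008
1011 1010
;
run;

proc sql;
  create table want_sketch as
  select h.year, h.ID, h.shock,
         case
           when h.shock = 1 then 'T'
           /* at least one neighbour has shock = 1 in the same year */
           when max(a.shock) = 1 then 'A'
           /* all neighbours have shock = 0 (or no row that year) */
           else 'N'
         end as treat length=1
  from have as h
       left join list as l on l.ID = h.ID
       left join have as a on a.year = h.year and a.ID = l.ID_adj
  group by h.year, h.ID, h.shock
  order by h.year, h.ID;
quit;
```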

TrueTears · 11-25-2019

Say I have the following excerpt dataset:

data have;
input year ID $ shock;
cards;
2000 1001 1
2001 1001 1
2002 1001 1
2003 1001 1
2004 1001 1
2000 1003 1
2001 1003 1
2002 1003 1
2003 1003 1
2004 1003 1
2000 1007 1
2001 1007 1
2002 1007 1
2003 1007 1
2004 1007 1
2000 1010 1
2001 1010 1
2002 1010 1
2003 1010 1
2004 1010 1
;
run;

The year always runs from 2000 to 2004, and shock always takes a value of 1 in this dataset. Now I have a list of IDs as follows:

data list;
input ID;
cards;
1001
1003
1004
1005
1007
1009
1010
;
run;

As you can see, IDs 1004, 1005, and 1009 are missing from the original dataset. What I wish to do is obtain the following dataset:

data want;
input year ID $ shock;
cards;
2000 1001 1
2001 1001 1
2002 1001 1
2003 1001 1
2004 1001 1
2000 1003 1
2001 1003 1
2002 1003 1
2003 1003 1
2004 1003 1
2000 1004 0
2001 1004 0
2002 1004 0
2003 1004 0
2004 1004 0
2000 1005 0
2001 1005 0
2002 1005 0
2003 1005 0
2004 1005 0
2000 1007 1
2001 1007 1
2002 1007 1
2003 1007 1
2004 1007 1
2000 1009 0
2001 1009 0
2002 1009 0
2003 1009 0
2004 1009 0
2000 1010 1
2001 1010 1
2002 1010 1
2003 1010 1
2004 1010 1
;
run;

That is, the missing IDs are inserted into the original dataset for each year from 2000 to 2004, with shock = 0 for these IDs. Thank you.
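A sketch of one approach: build every year/ID combination from "list", left join "have", and replace a missing shock with 0. Note that the posted code reads ID as character in "have" (ID $) but as numeric in "list"; the sketch converts the list IDs to character to match "have" (an assumption about which type is wanted). The names grid and want_sketch are hypothetical.

```sas
data have;
  input year ID $ shock @@;
  cards;
2000 1001 1  2001 1001 1  2002 1001 1  2003 1001 1  2004 1001 1
2000 1003 1  2001 1003 1  2002 1003 1  2003 1003 1  2004 1003 1
2000 1007 1  2001 1007 1  2002 1007 1  2003 1007 1  2004 1007 1
2000 1010 1  2001 1010 1  2002 1010 1  2003 1010 1  2004 1010 1
;
run;

data list;
  input ID @@;
  cards;
1001 1003 1004 1005 1007 1009 1010
;
run;

/* Every ID in LIST crossed with every year 2000-2004.
   LIST holds ID as numeric, so it is converted to character here. */
data grid;
  set list(rename=(ID=ID_num));
  length ID $8;
  ID = strip(put(ID_num, best12.));
  do year = 2000 to 2004;
    output;
  end;
  drop ID_num;
run;

/* Keep shock from HAVE where it exists, otherwise 0 */
proc sql;
  create table want_sketch as
  select g.year, g.ID, coalesce(h.shock, 0) as shock
  from grid as g
       left join have as h
         on h.ID = g.ID and h.year = g.year
  order by g.ID, g.year;
quit;
```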

TrueTears · 03-26-2017

Ahh, I think I realized my mistake; it was actually in another line:

data somename;
set somename(rename=(t_b_&input25=&input2));
run;

The actual name of the variable to be renamed is t_b_cr_6m5, but SAS is not recognizing &input2 as cr_6m in the rename option. How can I fix this?
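The likely cause is macro-name resolution rather than the rename itself: SAS reads &input25 as a reference to a macro variable named INPUT25. A period ends a macro variable reference, so t_b_&input2.5 resolves to t_b_cr_6m5. A minimal sketch, assuming input2 holds cr_6m as in %bts(inv_per, cr_6m); the one-row test dataset is hypothetical.

```sas
%let input2 = cr_6m;

/* Hypothetical one-row dataset holding the variable to rename */
data somename;
  t_b_cr_6m5 = 1;
run;

/* The period after &input2 ends the macro variable name, so
   t_b_&input2.5 resolves to t_b_cr_6m5. Without the period,
   SAS would look for a macro variable called INPUT25. */
data somename;
  set somename(rename=(t_b_&input2.5=&input2));
run;

%put NOTE: renamed t_b_&input2.5 to &input2;
```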

TrueTears · 03-26-2017

I am a beginner SAS coder and I have the following (excerpt) SAS macro:

%macro bts(input1, input2);
  proc sql noprint;
    select count(*) into :num_record from b.&input1;
  quit;
  %put &num_record;
  proc sql;
    create table c.datafile as
    select sqrt(&num_record)*(m_&input2/s_&input2
           +1/3*sum(x_&input2)/(&num_record*s_&input2**3)*(m_&input2/s_&input2)**2
           +1/(6*&num_record)*sum(x_&input2)/(&num_record*s_&input2**3)) as &input2
    from (select mean(&input2) as m_&input2, std(&input2) as s_&input2,
                 (&input2-calculated m_&input2)**3 as x_&input2
          from b.&input1);
  quit;
%mend;
%bts(inv_per, cr_6m);

The message that I get is:

WARNING: Apparent symbolic reference M_ not resolved.
&m_cr_6m

The variable cr_6m consists only of numbers, and I think I'm getting this because of the underscores. How can I fix this?
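A short illustration of how these references resolve, assuming input2 = cr_6m. Underscores are valid in macro variable names, so the underscores themselves are not the problem. The warning names M_, which suggests the full macro contains a reference such as &m_&input2 (it does not appear in the excerpt above). Writing m_&input2 without a leading & builds the text m_cr_6m, and a period is needed only when name characters follow the reference.

```sas
%let input2 = cr_6m;

/* Literal text before the &: resolves to m_cr_6m */
%put m_&input2;

/* A leading & makes SAS look up a macro variable named M_, which
   does not exist, so this line writes the warning
   "Apparent symbolic reference M_ not resolved." */
%put &m_&input2;

/* Text after a reference needs a period: &input2._x resolves to
   cr_6m_x, while &input2_x would look up a variable INPUT2_X */
%put &input2._x;
```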

TrueTears · 01-10-2016

I included them as pictures in the post itself; maybe they don't show for you for some reason. I have attached both pictures as attachments. Basically, the trouble I am having is how to code the formula and how to output the results. Thanks for your help.

TrueTears · 01-10-2016

I have a datafile called 'original' that contains 4 variables, call them a, b, c, d, with n observations each. I then use proc surveyselect to draw 1000 resamples from 'original' with sample size n_b = n/4 (rounded up). The code is as follows:

proc sql noprint;
select ceil(count(*)/4) into :record_count from original;
quit;
%put &record_count;
%let rep = 1000;
proc surveyselect data=original out=bootsample seed=1234
  method=urs sampsize=&record_count outhits rep=&rep;
run;
ods listing close;

This produces a datafile named 'bootsample' that contains 1000 samples, each of size n_b, of the variables a, b, c, and d from 'original'. Each observation's replication ID is given by the variable "Replicate" (ranging from 1 to 1000).

What I need to do is this. Take Replicate = 1 (i.e., the first replication sample) and the variable a as an example. I want to calculate the following value of t: (if the picture below doesn't show, please see the attached picture titled "formula")

where mean(a) is the sample average of a for replication sample 1, std(a) is the sample standard deviation of a for replication sample 1, and a_i represents each individual observation of a for replication sample 1.

Then I want to repeat the above procedure and calculate t for all 1000 replication samples and all four variables: a, b, c, and d. I want to store the final result in a datafile called "result" that has 4 variables called a_t, b_t, c_t, and d_t (i.e., 4 columns), with one row per replication sample, so 1000 values of t for each variable. So, graphically, a datafile structured like this: (if the picture below doesn't show, please see the attached picture titled "result")

Can anyone show me template code that achieves what I described above? I'm thinking maybe proc sql can do the trick, but I'm quite new to SAS and still don't know the syntax very well. Thanks.
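A template sketch, assuming the formula in the picture is the same expression coded in the %bts macro from the 03-26-2017 post (the picture is not available here, so substitute the real formula if it differs). For each Replicate it computes n, mean, and std in an inline view, joins them back to sum the cubed deviations, and then merges the four one-column tables into "result". The random test data, the macro name t_by_rep, and the tables t_a to t_d are hypothetical.

```sas
/* Hypothetical stand-in for ORIGINAL: 200 rows of random data */
data original;
  call streaminit(2016);
  do i = 1 to 200;
    a = rand('normal'); b = rand('normal');
    c = rand('normal'); d = rand('normal');
    output;
  end;
  drop i;
run;

/* Resampling step from the post (noprint suppresses displayed output) */
proc sql noprint;
  select ceil(count(*)/4) into :record_count from original;
quit;
%let rep = 1000;
proc surveyselect data=original out=bootsample seed=1234 noprint
     method=urs sampsize=&record_count outhits rep=&rep;
run;

/* One t value per Replicate for variable &var; the expression is
   the one from the %bts macro (an assumption, see lead-in) */
%macro t_by_rep(var);
  proc sql;
    create table t_&var as
    select bs.Replicate,
           sqrt(st.n) * ( st.m/st.s
             + 1/3 * sum((bs.&var - st.m)**3) / (st.n*st.s**3) * (st.m/st.s)**2
             + 1/(6*st.n) * sum((bs.&var - st.m)**3) / (st.n*st.s**3) ) as &var._t
    from bootsample as bs
         inner join
         (select Replicate, count(&var) as n, mean(&var) as m, std(&var) as s
            from bootsample
            group by Replicate) as st
         on bs.Replicate = st.Replicate
    group by bs.Replicate, st.n, st.m, st.s
    order by bs.Replicate;
  quit;
%mend t_by_rep;

%t_by_rep(a) %t_by_rep(b) %t_by_rep(c) %t_by_rep(d)

/* One row per Replicate; add "drop Replicate;" to keep only a_t-d_t */
data result;
  merge t_a t_b t_c t_d;
  by Replicate;
run;
```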

TrueTears · 01-05-2016

Thanks, this is what I was thinking of; however, I am quite new to SAS. Any chance you could provide skeleton template code for me to edit? Cheers.

TrueTears · 01-05-2016

But doesn't samprate=1 create a sample size equal to the size of the INPUT datafile? In my case, the input datafile is called "A", but I want the size to be equal to that of a datafile called "C", which is NOT the input data; they have different sizes. How can I achieve this?

TrueTears · 01-05-2016

I am using proc surveyselect for unrestricted random sampling. My code is as follows:

proc surveyselect data=A out=B seed=1234 method=urs
  sampsize=237 /* This is the number of rows of a dataset called C, which has already been created */
  outhits rep=1;
run;
ods listing close;

My input datafile is called A and my output datafile is called B. I have another dataset called C (which is different from the input file A), and it has 237 rows, i.e., 237 observations for each variable. Instead of manually inputting the number 237, I want sampsize to be equal to the number of rows of dataset C (since the number of rows of this dataset will change depending on the data I use). How can I do this? Thanks
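A minimal sketch: store the row count of C in a macro variable with PROC SQL SELECT ... INTO (the same pattern used for record_count in the 01-10-2016 post), then pass it to sampsize=. The stand-in datasets A and C and the macro variable name nrows_c are hypothetical.

```sas
/* Hypothetical stand-ins for A (500 rows) and C (237 rows) */
data A;
  do id = 1 to 500;
    x = id;
    output;
  end;
run;

data C;
  do id = 1 to 237;
    output;
  end;
run;

/* Store the number of rows of C in macro variable NROWS_C */
proc sql noprint;
  select count(*) into :nrows_c from C;
quit;
%put NOTE: C has &nrows_c rows;

/* sampsize now follows C automatically */
proc surveyselect data=A out=B seed=1234 method=urs
     sampsize=&nrows_c outhits rep=1 noprint;
run;
```

With OUTHITS and METHOD=URS, B holds one row per selection, so its row count equals the sample size.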

TrueTears · 02-27-2014

Thanks Vishal, that works well. I just realised there is one remaining problem that I'm not sure how to code. Using my original dataset (in my first post), an excerpt of the output (using the code in my third post) can be found at https://www.dropbox.com/s/9f7fdrq4rt72lwt/output.xlsx

As can be seen, rows 2 to 53 present the correlation matrix for 1 Apr 2008. However, a problem arises for the correlation matrix for 1 Apr 2009: the correlation coefficients between ALPHA and each of the other series are missing. This is because the values of ALPHA from 1 Apr 2008 to 1 Apr 2009 are all zero, which causes a division by zero when SAS tries to calculate the correlation coefficient. The same happens with a few other series; for example, HSBC also has all values equal to 0 from 1 Apr 08 to 1 Apr 09.

To resolve this issue, I was wondering how the code can be modified so that whenever this happens (i.e., all values are 0 between two given dates), the correlation for that pair is calculated using the WHOLE sample period instead. For example, the correlation between ALPHA and AUT is missing on 1 Apr 09, so it should be calculated using the values from 1 JAN 2008 to 31 DEC 2013, rather than the values from 1 Apr 08 to 1 Apr 09. Thank you.
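A sketch of one possible approach. It assumes the rolling correlations are stored in long format (one row per date and pair, with columns date, var1, var2, corr); the real output structure is not shown here, so that layout and the dataset names prices, rolling_corr, full_wide, full_long, and corr_filled are all hypothetical, as are the random data. PROC CORR on the whole sample gives the fallback matrix, which is reshaped and then used via COALESCE wherever the rolling value is missing.

```sas
/* Hypothetical daily series named as in the post */
data prices;
  call streaminit(2014);
  do date = '01JAN2008'd to '31DEC2013'd;
    ALPHA = rand('normal'); AUT = rand('normal'); HSBC = rand('normal');
    output;
  end;
  format date date9.;
run;

/* Hypothetical rolling output: one row per date and pair */
data rolling_corr;
  length var1 var2 $32;
  input date :date9. var1 $ var2 $ corr @@;
  format date date9.;
  cards;
01APR2008 ALPHA AUT 0.12  01APR2009 ALPHA AUT .  01APR2009 AUT HSBC 0.30
;
run;

/* Whole-sample correlation matrix, 1 Jan 2008 to 31 Dec 2013 */
proc corr data=prices outp=full_wide noprint;
  var ALPHA AUT HSBC;
run;

/* Reshape the CORR rows of the OUTP= dataset to long format */
data full_long;
  set full_wide(where=(_TYPE_ = 'CORR'));
  length var1 var2 $32;
  array v{*} ALPHA AUT HSBC;
  var1 = _NAME_;
  do i = 1 to dim(v);
    var2 = vname(v{i});
    full_corr = v{i};
    output;
  end;
  keep var1 var2 full_corr;
run;

/* Keep the rolling value if present, else the whole-sample value */
proc sql;
  create table corr_filled as
  select r.date, r.var1, r.var2,
         coalesce(r.corr, f.full_corr) as corr
  from rolling_corr as r
       left join full_long as f
         on upcase(r.var1) = upcase(f.var1)
        and upcase(r.var2) = upcase(f.var2)
  order by r.date, r.var1, r.var2;
quit;
```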

Online Status: Offline
Date Last Visited: 07-28-2024 03:36 PM

Re: Filling in observations from a different dataset

Filling in observations from a different dataset

Re: Matching observations with closest scores

Matching observations with closest scores

Filling in observations to make a balanced panel

Re: How to calculate moving average with gaps in years

Re: How to calculate moving average with gaps in years

How to calculate moving average with gaps in years

Re: Changing the value of a variable for a particular ID

Changing the value of a variable for a particular ID

Assign group numbers to nearby ID's

Re: Creating a variable based on another dataset

Re: Calculating expression of a formula for each resample

Re: Dynamic sampsize in proc surveyselect

Re: Calculating expression of a formula for each resample

How to create a variable given another dataset

Re: Creating a variable based on another dataset

Re: Creating a variable based on another dataset

Re: Creating a variable based on another dataset

Creating a variable based on another dataset

Creating a new variable based on corresponding IDs from another datase...

Insert missing ID from list

Re: Macro variables with underscores

Macro variables with underscores

Re: Calculating expression of a formula for each resample

Calculating expression of a formula for each resample

Re: Dynamic sampsize in proc surveyselect

Re: Dynamic sampsize in proc surveyselect

Dynamic sampsize in proc surveyselect

Re: Calculating rolling correlations and output each correlation matri...