About elbarto

elbarto · ‎04-03-2022

I have a dataset called low_2002 as follows:

DATA low_2002;
input county_id ammonia nitrogen voc;
DATALINES;
1001 568.76 2329.47 6695.85
1003 0.24 0.00 50.41
1005 10.73 347.76 325.23
1007 0.00 59.06 77.60
1009 4.99 0.00 0.00
;
RUN;

and another dataset called TRI as follows:

DATA TRI;
input county_id voc;
DATALINES;
1001 432.47
1003 0.00
1005 3.22
1007 0.00
1009 1.52
;
RUN;

What I would like to do is replace each county_id's VOC value in low_2002 with the corresponding value in TRI, one county_id at a time, while keeping everything else exactly the same. This will produce 5 datasets, which I want to name low_2002_TRI_1, low_2002_TRI_2, ..., low_2002_TRI_5.

For example, low_2002_TRI_1 will be (county_id 1001 has its VOC value of 6695.85 replaced with 432.47, while everything else stays the same):

DATA low_2002_TRI_1;
input county_id ammonia nitrogen voc;
DATALINES;
1001 568.76 2329.47 432.47
1003 0.24 0.00 50.41
1005 10.73 347.76 325.23
1007 0.00 59.06 77.60
1009 4.99 0.00 0.00
;
RUN;

low_2002_TRI_2 will be:

DATA low_2002_TRI_2;
input county_id ammonia nitrogen voc;
DATALINES;
1001 568.76 2329.47 6695.85
1003 0.24 0.00 0.00
1005 10.73 347.76 325.23
1007 0.00 59.06 77.60
1009 4.99 0.00 0.00
;
RUN;

and so on, all the way to low_2002_TRI_5:

DATA low_2002_TRI_5;
input county_id ammonia nitrogen voc;
DATALINES;
1001 568.76 2329.47 6695.85
1003 0.24 0.00 50.41
1005 10.73 347.76 325.23
1007 0.00 59.06 77.60
1009 4.99 0.00 1.52
;
RUN;

Then, I would like to save each of low_2002_TRI_1, low_2002_TRI_2, ..., low_2002_TRI_5 as a csv file with no variable names and with the county_id variable dropped, so that low_2002_TRI_1.csv looks like:

568.76 2329.47 432.47
0.24 0.00 50.41
10.73 347.76 325.23
0.00 59.06 77.60
4.99 0.00 0.00

and low_2002_TRI_2.csv looks like:

568.76 2329.47 6695.85
0.24 0.00 0.00
10.73 347.76 325.23
0.00 59.06 77.60
4.99 0.00 0.00

The above is just an example with 5 county_id's. In reality, I will be applying this procedure to a fuller dataset with approximately 3000 county_id's.
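The replace-one-row-at-a-time rule above can be sketched outside SAS to make the logic concrete. The following is a minimal Python illustration, not a SAS solution: the function names (variant_rows, to_csv_text, write_all), the comma delimiter and the two-decimal formatting are all assumptions made for this example.

```python
import csv
import io

# Example rows copied from the post: (county_id, ammonia, nitrogen, voc).
low_2002 = [
    (1001, 568.76, 2329.47, 6695.85),
    (1003, 0.24, 0.00, 50.41),
    (1005, 10.73, 347.76, 325.23),
    (1007, 0.00, 59.06, 77.60),
    (1009, 4.99, 0.00, 0.00),
]
# TRI as a lookup from county_id to the replacement voc value.
tri = {1001: 432.47, 1003: 0.00, 1005: 3.22, 1007: 0.00, 1009: 1.52}


def variant_rows(base, replacements, k):
    """Return the k-th variant (1-based): only row k gets its voc swapped."""
    rows = []
    for i, (county_id, ammonia, nitrogen, voc) in enumerate(base, start=1):
        if i == k:
            voc = replacements[county_id]
        rows.append((ammonia, nitrogen, voc))  # county_id is dropped here
    return rows


def to_csv_text(rows):
    """Render rows as CSV text with no header (two decimals is an assumption)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        writer.writerow([f"{value:.2f}" for value in row])
    return buffer.getvalue()


def write_all(base, replacements, prefix="low_2002_TRI_"):
    """Write one file per row of base: low_2002_TRI_1.csv, low_2002_TRI_2.csv, ..."""
    for k in range(1, len(base) + 1):
        with open(f"{prefix}{k}.csv", "w", newline="") as handle:
            handle.write(to_csv_text(variant_rows(base, replacements, k)))


print(to_csv_text(variant_rows(low_2002, tri, 1)))
```

Calling write_all(low_2002, tri) would create the five files in the current directory; with roughly 3000 county_id's the same loop simply runs 3000 times.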

elbarto · ‎10-02-2021

I have the following dataset:

DATA have;
input year place_id firm_id shock;
DATALINES;
2001 34023 28013 0
2002 34023 28013 1
2003 34023 28013 1
2004 34023 28013 1
2005 34023 28013 0
2006 34023 28013 0
2007 34023 28013 0
2008 34023 28013 0
2009 34023 28013 1
2010 34023 28013 1
1992 46085 28013 1
1993 46085 28013 0
1994 46085 28013 1
1995 46085 28013 1
1996 46085 28013 0
1997 46085 28013 1
1998 46085 28013 0
1999 46085 28013 0
2000 46085 28013 0
2001 46085 28013 1
2002 46085 28013 0
;
RUN;

I want to create another variable called flag to arrive at the following dataset:

DATA want;
input year place_id firm_id shock flag;
DATALINES;
2001 34023 28013 0 0
2002 34023 28013 1 1
2003 34023 28013 1 1
2004 34023 28013 1 1
2005 34023 28013 0 1
2006 34023 28013 0 1
2007 34023 28013 0 0
2008 34023 28013 0 0
2009 34023 28013 1 1
2010 34023 28013 1 1
1992 46085 28013 1 1
1993 46085 28013 0 1
1994 46085 28013 1 1
1995 46085 28013 1 1
1996 46085 28013 0 1
1997 46085 28013 1 1
1998 46085 28013 0 1
1999 46085 28013 0 1
2000 46085 28013 0 0
2001 46085 28013 1 1
2002 46085 28013 0 1
;
RUN;

The rule is: for a given place_id and firm_id, if shock=1 in a year, then flag=1 for that year as well as for the following two years. For example, for year=2004, place_id=34023 and firm_id=28013, we have shock=1, so flag=1 for year=2004 and also for years 2005 and 2006. Similarly, for year=2001, place_id=46085 and firm_id=28013, we have shock=1, so flag=1 for year 2001 and for year 2002 as well. Year 2003 does not exist for this place_id-firm_id combination, so there is no need to create an extra row for year 2003.

Furthermore, could the code be general, so that the following N years all take a value of 1 for flag? The example shown here is for N=2.
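As a language-neutral sketch of the rule (in Python rather than SAS, so the names add_flag and n are assumptions of this example): a row is flagged when some shock year s in the same place_id/firm_id group satisfies 0 <= year - s <= N.

```python
from collections import defaultdict

# The example data from the post: (year, place_id, firm_id, shock).
have = (
    [(2001 + i, 34023, 28013, s) for i, s in enumerate([0, 1, 1, 1, 0, 0, 0, 0, 1, 1])]
    + [(1992 + i, 46085, 28013, s) for i, s in enumerate([1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0])]
)


def add_flag(rows, n):
    """Append flag=1 when a shock occurred in the same group 0..n years earlier."""
    shock_years = defaultdict(list)
    for year, place_id, firm_id, shock in rows:
        if shock == 1:
            shock_years[(place_id, firm_id)].append(year)
    flagged = []
    for year, place_id, firm_id, shock in rows:
        hit = any(0 <= year - s <= n for s in shock_years[(place_id, firm_id)])
        flagged.append((year, place_id, firm_id, shock, int(hit)))
    return flagged


for row in add_flag(have, n=2):
    print(*row)
```

Because the comparison uses calendar years rather than row positions, a missing year (such as 2003 for place_id 46085) needs no special handling.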

elbarto · ‎08-17-2021

Oops, I made a typo; I have fixed it now.

elbarto · ‎08-17-2021

I have a dataset similar to the following:

DATA have;
input id day shock status;
DATALINES;
1001 33 . 1
1001 34 1 0
1001 35 . 0
1001 36 . 0
1001 37 . 0
1001 38 . 1
1001 39 . 1
1001 40 . 1
1001 41 . 0
1001 42 . 0
1005 55 . 1
1005 56 . 1
1005 57 . 1
1005 58 . 1
1005 59 1 0
1005 60 . 1
1005 61 . 1
;
RUN;

For each id, I want to count the number of times that status takes on a value of zero, starting from the day where shock=1 until the next time that status=1. Note that in the data, when shock=1, status always takes on a value of 0. The resultant data should be:

DATA want;
input id count;
DATALINES;
1001 4
1005 1
;
RUN;

For example, for id=1001, when shock=1 on day 34, status=0 until day 38, when status becomes 1. There are 4 days on which status=0, so count=4 in the "want" dataset. Note that on day 41, status also equals 0, but there is no value of 1 for shock, so these zeros are not counted.
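The counting rule amounts to a small state machine: start counting at shock=1, stop at the next status=1. Here is a minimal Python sketch of that logic (not SAS; the function name and the choice to sum over several shocks per id are assumptions, since the example has one shock per id).

```python
# Example data from the post: (id, day, shock, status); None marks a missing shock.
have = (
    [(1001, 33 + i, 1 if i == 1 else None, s)
     for i, s in enumerate([1, 0, 0, 0, 0, 1, 1, 1, 0, 0])]
    + [(1005, 55 + i, 1 if i == 4 else None, s)
       for i, s in enumerate([1, 1, 1, 1, 0, 1, 1])]
)


def count_zeros_after_shock(rows):
    """rows must be sorted by id, then day. Returns {id: count}."""
    counts = {}
    current_id, active = None, False
    for id_, day, shock, status in rows:
        if id_ != current_id:
            current_id, active = id_, False  # never carry a window across ids
        if shock == 1:
            active = True                    # a shock opens the counting window
        if active:
            if status == 0:
                counts[id_] = counts.get(id_, 0) + 1
            else:
                active = False               # the next status=1 closes it
    return counts


print(count_zeros_after_shock(have))
```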

elbarto · ‎08-08-2021

Thanks Tom! This is a great solution too.

elbarto · ‎08-08-2021

I have a dataset (excerpt shown below):

DATA have;
input id year Latitude Longitude;
DATALINES;
1049 1995 34.289001 -85.970065
1049 1999 34.289001 -85.970065
1073 1990 33.386389 -86.816667
1073 1995 33.331111 -87.003611
1073 1995 33.386389 -86.816667
1073 1995 33.578333 -86.773889
1073 1996 33.331111 -87.003611
1073 1996 33.386389 -86.816667
1073 1996 33.704722 -86.669167
1073 1996 33.578333 -86.773889
1073 1998 33.485556 -86.915
1073 1998 33.386389 -86.816667
1073 1999 33.331111 -87.003611
1073 1999 33.386389 -86.816667
;
RUN;

I want to transpose the dataset to be as follows:

DATA want;
input id year Latitude1 Longitude1 Latitude2 Longitude2 Latitude3 Longitude3 Latitude4 Longitude4;
DATALINES;
1049 1995 34.289001 -85.970065 . . . . . .
1049 1999 34.289001 -85.970065 . . . . . .
1073 1990 33.386389 -86.816667 . . . . . .
1073 1995 33.331111 -87.003611 33.386389 -86.816667 33.578333 -86.773889 . .
1073 1996 33.331111 -87.003611 33.386389 -86.816667 33.704722 -86.669167 33.578333 -86.773889
1073 1998 33.485556 -86.915 33.386389 -86.816667 . . . .
1073 1999 33.331111 -87.003611 33.386389 -86.816667 . . . .
;
RUN;

What I've tried is the following code:

proc transpose data=have out=want;
by id year;
var latitude longitude;
run;

However, the result is not quite what I wanted. In particular, the full dataset could have many latitude/longitude pairs (many more than the 4 shown in the example), so I would like the columns to be named latitude1, longitude1, latitude2, longitude2, ..., up to the last pair.
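To make the target layout concrete, here is a small Python sketch of the reshaping (an illustration of the desired result, not a PROC TRANSPOSE answer; the function name widen is hypothetical). Pairs are collected per id/year in input order, and shorter groups are padded with missing values up to the widest group.

```python
# Example data from the post: (id, year, Latitude, Longitude).
have = [
    (1049, 1995, 34.289001, -85.970065),
    (1049, 1999, 34.289001, -85.970065),
    (1073, 1990, 33.386389, -86.816667),
    (1073, 1995, 33.331111, -87.003611),
    (1073, 1995, 33.386389, -86.816667),
    (1073, 1995, 33.578333, -86.773889),
    (1073, 1996, 33.331111, -87.003611),
    (1073, 1996, 33.386389, -86.816667),
    (1073, 1996, 33.704722, -86.669167),
    (1073, 1996, 33.578333, -86.773889),
    (1073, 1998, 33.485556, -86.915),
    (1073, 1998, 33.386389, -86.816667),
    (1073, 1999, 33.331111, -87.003611),
    (1073, 1999, 33.386389, -86.816667),
]


def widen(rows):
    """Return (header, wide_rows) with one row per (id, year)."""
    groups = {}
    for id_, year, lat, lon in rows:
        groups.setdefault((id_, year), []).extend([lat, lon])
    width = max(len(values) for values in groups.values())
    header = ["id", "year"]
    for k in range(1, width // 2 + 1):
        header += [f"Latitude{k}", f"Longitude{k}"]
    wide = [
        [id_, year] + values + [None] * (width - len(values))
        for (id_, year), values in groups.items()
    ]
    return header, wide


header, wide = widen(have)
print(header)
for row in wide:
    print(row)
```

The number of column pairs is taken from the data, so nothing has to change when a group has more than four coordinates.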

elbarto · ‎05-25-2021

DATA have;
input id yq;
DATALINES;
1001 23
1001 24
1001 25
1001 26
1001 27
1001 28
1005 102
1005 103
1005 104
1005 105
1005 106
1005 107
1005 108
1005 109
1005 110
1005 111
1005 112
1005 113
1005 114
2001 43
2001 44
2001 45
2001 46
2001 47
2001 48
;
RUN;

I have the above dataset, and I have another dataset that contains a list like this (there is never a repeat in this dataset):

DATA list;
input event_yq;
DATALINES;
57
106
139
;
RUN;

I want to create the following dataset, where the columns event_yq1 event_yq2 event_yq3 (for however many values there are in event_yq) are filled in on every row:

DATA want;
input id yq event_yq1 event_yq2 event_yq3;
DATALINES;
1001 23 57 106 139
1001 24 57 106 139
1001 25 57 106 139
1001 26 57 106 139
1001 27 57 106 139
1001 28 57 106 139
1005 102 57 106 139
1005 103 57 106 139
1005 104 57 106 139
1005 105 57 106 139
1005 106 57 106 139
1005 107 57 106 139
1005 108 57 106 139
1005 109 57 106 139
1005 110 57 106 139
1005 111 57 106 139
1005 112 57 106 139
1005 113 57 106 139
1005 114 57 106 139
2001 43 57 106 139
2001 44 57 106 139
2001 45 57 106 139
2001 46 57 106 139
2001 47 57 106 139
2001 48 57 106 139
;
RUN;

I have tried transposing the list dataset and then using a merge statement, but I can only get one row filled out and the rest are missing. Any help would be appreciated.
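The symptom described (only the first row filled) is probably what a MERGE without a BY statement does: it pairs observations by position, and the transposed list has a single observation. What is wanted is every row of have combined with that one wide row. A minimal Python sketch of that combination, with a hypothetical function name attach_list:

```python
# A few rows of the example data: (id, yq).
have = [(1001, 23), (1001, 24), (1005, 102), (2001, 48)]
event_yq = [57, 106, 139]  # the list dataset


def attach_list(rows, values):
    """Repeat the whole list as extra columns on every row."""
    header = ["id", "yq"] + [f"event_yq{k}" for k in range(1, len(values) + 1)]
    wide = [list(row) + list(values) for row in rows]
    return header, wide


header, want = attach_list(have, event_yq)
print(header)
for row in want:
    print(row)
```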

elbarto · ‎05-20-2021

I have a dataset that looks as follows:

DATA have;
input dummy1 dummy2 event_year firm_id variable;
DATALINES;
0 0 1991 1034 5.7991
0 0 1991 1365 8.5963
0 0 1991 1789 7.9652
0 0 1991 1865 5.6145
0 0 2004 1034 3.6768
0 0 2004 1365 10.1621
0 0 2004 2282 5.4541
0 0 2004 2812 6.6856
0 0 2004 3895 8.2246
0 0 2004 5404 7.6025
0 0 2004 6109 3.2838
0 0 2004 7086 7.5047
0 1 1991 1372 5.2026
0 1 1991 1640 3.692
0 1 1991 3093 5.0352
0 1 1991 3840 4.6172
0 1 1991 5594 2.9139
0 1 1991 5973 3.1315
0 1 2004 1372 6.1267
0 1 2004 1640 4.8229
0 1 2004 1926 6.7382
0 1 2004 2034 7.2528
0 1 2004 2787 7.89
0 1 2004 3607 7.8935
0 1 2004 4145 4.2265
1 0 1991 1004 5.9473
1 0 1991 1078 8.6867
1 0 1991 1598 6.4022
1 0 1991 1609 10.3318
1 0 1991 1613 5.3902
1 0 1991 1651 5.8021
1 0 1991 1686 5.555
1 0 2004 1609 7.439
1 0 2004 1613 10.0747
1 0 2004 1651 8.0287
1 0 2004 1686 12.6915
1 1 1991 1036 3.7112
1 1 1991 1327 4.0193
1 1 1991 1397 4.5393
1 1 1991 1585 3.3894
1 1 1991 1608 7.98
1 1 1991 1632 6.1909
1 1 1991 1659 5.4968
1 1 2004 1439 4.1855
1 1 2004 1478 10.2339
1 1 2004 1659 6.0975
1 1 2004 1689 5.9035
;
RUN;

There are two binary variables, dummy1 and dummy2, that split up the sample. I want to test for a difference in the difference in means as follows. First, I run the following code:

proc surveymeans data=have;
cluster firm_id event_year;
var variable;
domain dummy1*dummy2 / diffmeans;
ods output DomainDiffs=sort_domaindiff Domain=sort_domain;
run;

Using the above code, I can produce the following table:

For example, 7.849927 is the mean of (1, 0), where the notation is (dummy1, dummy2). The value 2.236536 is the difference in means between the (1, 0) and (1, 1) groups, and its significance can be tested by reading off the p-value from the dataset "Sort_domaindiff". However, what I am interested in testing is the significance of the value 0.87189, which is the difference in the difference of means, i.e., the difference between (1, 0)-(1, 1) and (0, 0)-(0, 1).

However, I do not know how to achieve this in proc surveymeans, as there is no output for this. Is there a way to achieve this in proc surveymeans? If not, is there another method which I can use to find the significance of this difference in difference of means? Note that I need to cluster the standard errors by firm_id and event_year, which is why I am using proc surveymeans. If another method is used, I also need to ensure standard errors are clustered by these two variables.
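One way to frame the quantity of interest, offered as a sketch rather than a confirmed answer: write \mu_{ab} for the mean of variable in the domain (dummy1=a, dummy2=b), and d_1, d_2 for the two dummies. The symbols \mu, \beta and \Delta are notation introduced here, not output names from the procedure. The difference in differences is then a linear contrast of the four cell means, and it equals minus the interaction coefficient in a regression on dummy1, dummy2 and their product.

```latex
\begin{aligned}
\Delta &= (\mu_{10} - \mu_{11}) - (\mu_{00} - \mu_{01}) \\
y &= \beta_0 + \beta_1 d_1 + \beta_2 d_2 + \beta_3 d_1 d_2 + \varepsilon \\
\mu_{00} &= \beta_0, \quad
\mu_{10} = \beta_0 + \beta_1, \quad
\mu_{01} = \beta_0 + \beta_2, \quad
\mu_{11} = \beta_0 + \beta_1 + \beta_2 + \beta_3 \\
\mu_{10} - \mu_{11} &= -\beta_2 - \beta_3, \qquad
\mu_{00} - \mu_{01} = -\beta_2 \\
\Delta &= -\beta_3
\end{aligned}
```

Under this framing, testing \Delta = 0 is the same as testing \beta_3 = 0, so a regression procedure that supports clustered standard errors could in principle supply the p-value. Whether its clustering matches the cluster statement used above for firm_id and event_year is an assumption that would need to be checked.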

elbarto · ‎03-14-2021

I have a dataset, an excerpt of which is given below:

DATA have;
input id year output;
DATALINES;
1001 1987 .
1001 1988 .
1001 1989 .
1001 1990 5
1001 1991 .
1001 1992 .
1001 1993 .
1001 1994 34
1001 1995 22
1001 1996 33
1001 1997 15
1001 1998 .
1001 1999 .
1001 2000 .
1001 2001 23
1001 2002 .
1001 2003 45
1001 2004 23
1001 2005 12
1001 2006 .
1001 2007 .
1001 2008 .
1001 2009 .
1001 2010 2
1001 2011 .
1001 2012 .
1001 2013 56
1001 2014 .
1001 2015 .
1001 2016 .
1001 2017 .
1001 2018 23
1001 2019 .
1002 1987 34
1002 1988 .
1002 1989 12
1002 1990 13
1002 1991 55
1002 1992 32
1002 1993 .
1002 1994 .
1002 1995 54
1002 1996 64
1002 1997 .
1002 1998 .
1002 1999 23
1002 2000 .
1002 2001 .
1002 2002 .
1002 2003 .
1002 2004 64
1002 2005 12
1002 2006 .
1002 2007 .
1002 2008 .
1002 2009 .
1002 2010 3
1002 2011 .
1002 2012 .
1002 2013 .
1002 2014 .
1002 2015 .
1002 2016 .
1002 2017 4
1002 2018 .
1002 2019 12
;
RUN;

Each id always has year going from 1987 to 2019 (no gaps). I want to "lag" the output variable to produce the following dataset:

DATA want;
input id year output output_prev;
DATALINES;
1001 1987 . .
1001 1988 . .
1001 1989 . .
1001 1990 5 .
1001 1991 . 5
1001 1992 . 5
1001 1993 . 5
1001 1994 34 5
1001 1995 22 34
1001 1996 33 22
1001 1997 15 33
1001 1998 . 15
1001 1999 . 15
1001 2000 . 15
1001 2001 23 15
1001 2002 . 23
1001 2003 45 23
1001 2004 23 45
1001 2005 12 23
1001 2006 . 12
1001 2007 . 12
1001 2008 . 12
1001 2009 . 12
1001 2010 2 12
1001 2011 . 2
1001 2012 . 2
1001 2013 56 2
1001 2014 . 56
1001 2015 . 56
1001 2016 . 56
1001 2017 . 56
1001 2018 23 56
1001 2019 . 23
1002 1987 34 .
1002 1988 . 34
1002 1989 12 34
1002 1990 13 12
1002 1991 55 13
1002 1992 32 55
1002 1993 . 32
1002 1994 . 32
1002 1995 54 32
1002 1996 64 54
1002 1997 . 64
1002 1998 . 64
1002 1999 23 64
1002 2000 . 23
1002 2001 . 23
1002 2002 . 23
1002 2003 . 23
1002 2004 64 23
1002 2005 12 64
1002 2006 . 12
1002 2007 . 12
1002 2008 . 12
1002 2009 . 12
1002 2010 3 12
1002 2011 . 3
1002 2012 . 3
1002 2013 . 3
1002 2014 . 3
1002 2015 . 3
1002 2016 . 3
1002 2017 4 3
1002 2018 . 4
1002 2019 12 4
;
RUN;

It seems simple, but I can't seem to do it. I want to lag the output variable so that each year for a given id always takes the "latest" value of output from an earlier year. As an example, for id=1001, year=1991, output_prev=5 because that was the latest value of output (last updated in year 1990). This continues until we reach year 1995, where output_prev changes to 34 because the latest value of output is 34 (last updated in year 1994).
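The rule reads as "carry forward the most recent non-missing output from a strictly earlier year, resetting at each new id". A minimal Python sketch of that logic (not SAS; add_output_prev is a hypothetical name, and only the first rows of each id are reproduced):

```python
# First rows of each id from the post: (id, year, output); None marks missing.
have = (
    [(1001, 1987 + i, v) for i, v in enumerate([None, None, None, 5, None, None, None, 34, 22, 33])]
    + [(1002, 1987 + i, v) for i, v in enumerate([34, None, 12, 13])]
)


def add_output_prev(rows):
    """rows must be sorted by id, then year. Returns rows with output_prev appended."""
    result = []
    current_id, latest = None, None
    for id_, year, output in rows:
        if id_ != current_id:
            current_id, latest = id_, None   # do not carry values across ids
        result.append((id_, year, output, latest))
        if output is not None:
            latest = output                  # update only after writing the row
    return result


for row in add_output_prev(have):
    print(row)
```

Updating latest after the row is written is what makes the value a lag: the current year's own output never appears in its output_prev.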

elbarto · ‎03-11-2021

I have the following dataset:

DATA have;
input period key_id treat ps_score;
DATALINES;
59 1004 0 0.0701784726
59 1078 0 0.1496074832
59 1209 0 0.1325333248
59 1300 0 0.1555762808
59 1327 0 0.0469939511
59 1523 1 0.0531098455
59 1854 1 0.0411252176
61 1004 0 0.1085249132
61 1008 0 0.1531924709
61 1078 0 0.0963164678
61 1102 0 0.0962037147
61 1300 0 0.0684734650
61 2402 0 0.1030826279
61 3023 1 0.0242288199
61 3044 1 0.0848487298
61 4033 1 0.0024468050
;
RUN;

First, I wish to do one-to-one matching as follows: match each key_id that has treat = 1 to the key_id with treat = 0 in the same period for which the absolute difference in ps_score is the smallest. This produces the following dataset:

DATA want1;
input period key_id matched_id;
DATALINES;
59 1523 1327
59 1854 1327
61 3023 1300
61 3044 1102
61 4033 1300
;
RUN;

For example, in period=59, for key_id=1523 (which has treat=1), the absolute difference in ps_score with key_id=1327 (which has treat=0 and is in the same period) is 0.006115894, which is the smallest. So in the want1 dataset, the matched_id for key_id=1523 is 1327. Other entries follow the same rule.

What I want to do next is 2:1 matching, so that each treat=1 is matched to the two key_id's with treat=0 in the same period that have the closest ps_score. The resultant dataset is as follows:

DATA want2;
input period key_id matched_id;
DATALINES;
59 1523 1327
59 1523 1004
59 1854 1327
59 1854 1004
61 3023 1300
61 3023 1102
61 3044 1102
61 3044 1078
61 4033 1300
61 4033 1102
;
RUN;

For example, for key_id=1523 in period=59, after it is matched to 1327, we look for the key_id with treat=0 that has the next smallest absolute difference in ps_score. This corresponds to key_id=1004 (the absolute difference in ps_score is 0.017068627). So, in this 2:1 match, 1523 is matched with two observations.

Is it possible to write general code for this so that I can choose an N:1 match (following the same rules as above)? I have illustrated the cases N=1 and N=2 above.
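The examples above imply matching with replacement (1327 is matched to both 1523 and 1854). Under that reading, N:1 matching is "rank the same-period controls by absolute score difference and keep the first N". A minimal Python sketch of that rule, not a SAS implementation; match_n_to_1 is a hypothetical name and ties are broken by input order, which is an assumption.

```python
# Example data from the post: (period, key_id, treat, ps_score).
have = [
    (59, 1004, 0, 0.0701784726), (59, 1078, 0, 0.1496074832),
    (59, 1209, 0, 0.1325333248), (59, 1300, 0, 0.1555762808),
    (59, 1327, 0, 0.0469939511), (59, 1523, 1, 0.0531098455),
    (59, 1854, 1, 0.0411252176), (61, 1004, 0, 0.1085249132),
    (61, 1008, 0, 0.1531924709), (61, 1078, 0, 0.0963164678),
    (61, 1102, 0, 0.0962037147), (61, 1300, 0, 0.0684734650),
    (61, 2402, 0, 0.1030826279), (61, 3023, 1, 0.0242288199),
    (61, 3044, 1, 0.0848487298), (61, 4033, 1, 0.0024468050),
]


def match_n_to_1(rows, n):
    """Return (period, key_id, matched_id) for the n closest controls per treated row."""
    controls = {}
    for period, key_id, treat, score in rows:
        if treat == 0:
            controls.setdefault(period, []).append((key_id, score))
    matches = []
    for period, key_id, treat, score in rows:
        if treat == 1:
            ranked = sorted(controls.get(period, []),
                            key=lambda control: abs(control[1] - score))
            matches.extend((period, key_id, control[0]) for control in ranked[:n])
    return matches


for match in match_n_to_1(have, n=2):
    print(*match)
```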

elbarto · ‎12-14-2020

Hello everyone, I have an excerpt of my data as follows:

DATA have;
input id year action shock;
DATALINES;
1055 1981 0 .
1055 1982 0 .
1055 1983 0 .
1055 1984 0 .
1055 1985 1 1
1055 1986 1 .
1055 1987 1 .
1055 1988 1 .
1055 1989 0 .
1055 1990 0 .
1055 1991 0 .
1055 1992 0 .
1055 1993 0 .
1085 1981 0 .
1085 1982 1 1
1085 1983 0 .
1085 1984 0 .
1085 1985 0 .
1085 1986 0 .
1085 1987 0 .
1085 1988 0 .
1085 1989 0 .
1085 1990 0 .
1085 1991 1 1
1085 1992 1 .
1085 1993 1 .
1212 1981 0 .
1212 1982 0 .
1212 1983 1 1
1212 1984 1 .
1212 1985 1 .
1212 1986 0 .
1212 1987 0 .
1212 1988 1 1
1212 1989 1 .
1212 1990 1 .
1212 1991 0 .
1212 1992 0 .
1212 1993 0 .
1842 1981 0 .
1842 1982 0 .
1842 1983 0 .
1842 1984 0 .
1842 1985 0 .
1842 1986 0 .
1842 1987 0 .
1842 1988 0 .
1842 1989 0 .
1842 1990 0 .
1842 1991 0 .
1842 1992 0 .
1842 1993 0 .
2913 1981 0 .
2913 1982 0 .
2913 1983 0 .
2913 1984 0 .
2913 1985 1 1
2913 1986 1 .
2913 1987 1 .
2913 1988 0 .
2913 1989 0 .
2913 1990 0 .
2913 1991 1 1
2913 1992 1 .
2913 1993 1 .
;
RUN;

I want to create the following dataset:

DATA want;
input id event_year treat;
DATALINES;
1055 1985 1
1085 1991 1
1212 1983 1
1212 1988 1
2913 1985 1
2913 1991 1
1085 1985 0
1842 1985 0
1055 1991 0
1842 1991 0
1842 1983 0
1085 1988 0
1842 1988 0
;
RUN;

The want dataset is created as follows. For each id, consider only the rows with shock = 1. If action equals 1 for that year and the following two years, and equals 0 for the previous two years, then in the want dataset, enter an observation for the id and the corresponding year (named event_year), and set treat = 1. As an example, consider id = 1055, year = 1985 in have. Here shock = 1, and action = 0 for years 1983 and 1984 (the previous two years), and equals 1 for years 1985 (the current year, corresponding to shock = 1), 1986 and 1987 (the following two years). So in the want dataset, there is an entry for id = 1055, event_year = 1985 with treat = 1.

Once all the rows with treat = 1 are created, the rows with treat = 0 are created as follows. For each treat = 1 row, we look at all the id's in the have dataset with year equal to the event_year but with action = 0 in that year. If action also equals 0 for the two years before and the two years after event_year, then enter an observation for the id and the corresponding event_year, and set treat = 0 in want. As an example, consider id = 1055, event_year = 1985, treat = 1 in want. In the have dataset, id = 1085 has action = 0 for year = 1985. Action also takes the value of 0 for the two years before and the two years after 1985, so create the observation id = 1085, event_year = 1985 and treat = 0 in want. Similarly, in the have dataset, id = 1842 also has action = 0 for year = 1985 as well as for the two years before and the two years after. So create the observation id = 1842, event_year = 1985 and treat = 0 in want.

There is no need to output duplicates in want. For example, for id = 1055, event_year = 1985 and treat = 1, we already have rows for id = 1085, event_year = 1985, treat = 0, and id = 1842, event_year = 1985, treat = 0. When we look at id = 2913, event_year = 1985 and treat = 1, this corresponds to exactly the same two rows, so there is no need to output them again in want.

Also, can the code be made flexible so that in the above rule, instead of two years before and two years after, it can be any arbitrary window around the event_year? For example, one year before to two years after, or 3 years before to 3 years after.
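The two-stage rule can be sketched compactly in Python (an illustration of the logic only, not SAS). The names ACTIONS, SHOCKS and build_events are hypothetical, the per-id action strings are a compressed encoding of the example data, and treating a year outside 1981 to 1993 as disqualifying is an assumption of this sketch.

```python
FIRST_YEAR = 1981
# action for 1981..1993, one character per year, taken from the example data.
ACTIONS = {
    1055: "0000111100000",
    1085: "0100000000111",
    1212: "0011100111000",
    1842: "0000000000000",
    2913: "0000111000111",
}
# Years with shock = 1 for each id.
SHOCKS = {1055: [1985], 1085: [1982, 1991], 1212: [1983, 1988], 1842: [], 2913: [1985, 1991]}


def action(id_, year):
    """Return action for (id, year), or None when the year is not in the data."""
    index = year - FIRST_YEAR
    series = ACTIONS[id_]
    return int(series[index]) if 0 <= index < len(series) else None


def window_is(id_, years, value):
    return all(action(id_, year) == value for year in years)


def build_events(before=2, after=2):
    """Return (id, event_year, treat) rows: treated first, then controls."""
    treated = []
    for id_ in sorted(SHOCKS):
        for year in SHOCKS[id_]:
            pre = range(year - before, year)
            post = range(year, year + after + 1)
            if window_is(id_, pre, 0) and window_is(id_, post, 1):
                treated.append((id_, year, 1))
    controls, seen_years = [], set()
    for _, event_year, _ in treated:
        if event_year in seen_years:
            continue                      # avoids duplicate control rows
        seen_years.add(event_year)
        window = range(event_year - before, event_year + after + 1)
        for id_ in sorted(ACTIONS):
            if window_is(id_, window, 0):
                controls.append((id_, event_year, 0))
    return treated + controls


for row in build_events(before=2, after=2):
    print(*row)
```

The before and after arguments give the arbitrary window asked about, for example build_events(before=1, after=2).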

elbarto · ‎12-13-2020

Hi everyone, I am trying to create a new variable based on another column. Here is an example of the HAVE data:

DATA have;
input section $ id year action type $;
DATALINES;
first 8069 2002 0 .
first 8069 2003 0 .
first 8069 2004 0 ann1
first 8069 2005 1 .
first 8069 2006 1 .
first 8069 2007 1 .
first 8234 1988 0 .
first 8234 1989 0 .
first 8234 1990 0 .
first 8234 1991 1 ann1
first 8234 1992 0 .
first 8234 1993 0 .
first 8234 1994 0 ann2
first 8234 1995 1 .
first 8234 1996 1 .
first 8234 1997 1 .
first 8234 1998 0 .
first 8234 1999 0 .
second 1032 2011 1 ann1
second 1032 2012 0 .
second 1032 2013 0 .
second 1032 2014 0 .
second 8069 2005 0 .
second 8069 2006 0 ann1
second 8069 2007 0 .
second 8069 2008 0 ann2
second 8069 2009 1 .
second 8069 2010 1 .
second 8234 1999 0 .
second 8234 2000 0 .
second 8234 2001 0 ann1
;
RUN;

Here is an example of the WANT data:

DATA want;
input section $ id year action type $ event;
DATALINES;
first 8069 2002 0 . .
first 8069 2003 0 . .
first 8069 2004 0 ann1 .
first 8069 2005 1 . 1
first 8069 2006 1 . .
first 8069 2007 1 . .
first 8234 1988 0 . .
first 8234 1989 0 . .
first 8234 1990 0 . .
first 8234 1991 1 ann1 1
first 8234 1992 0 . .
first 8234 1993 0 . .
first 8234 1994 0 ann2 .
first 8234 1995 1 . 1
first 8234 1996 1 . .
first 8234 1997 1 . .
first 8234 1998 0 . .
first 8234 1999 0 . .
second 1032 2011 1 ann1 1
second 1032 2012 0 . .
second 1032 2013 0 . .
second 1032 2014 0 . .
second 8069 2005 0 . .
second 8069 2006 0 ann1 .
second 8069 2007 0 . .
second 8069 2008 0 ann2 .
second 8069 2009 1 . 1
second 8069 2010 1 . .
second 8234 1999 0 . .
second 8234 2000 0 . .
second 8234 2001 0 ann1 .
;
RUN;

What I want to do is as follows. For each section and id, look at the "type" variable. If it is a nonmissing value, then look at the corresponding "action" variable in that year. If this action value is:

i) 0 and the following year's action value is 1, then set event = 1 for the following year;
ii) 0 and the following year's action value is 0, then set event = . for the following year;
iii) 1, then set event = 1 for the same year, regardless of what the value of action is in the following year.

All other event values are set to missing. Also, in cases where there is a nonmissing value for type and action = 0 for that year, but there is no data for the following year, simply set event = . for the same year (see the last row in want).

Note: action only ever takes on the value 0 or 1. Type only ever takes on the values ann1 or ann2, but this does not really matter; as long as it is a nonmissing value, the above rules apply.
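Rules i) to iii) reduce to two cases that set event = 1; everything else stays missing. A minimal Python sketch of that logic (not SAS; add_event is a hypothetical name, "following year" is read as year + 1 within the same section and id, and only the 8234 rows of the example are reproduced):

```python
# The 8234 rows from the post: (section, id, year, action, type); None marks missing type.
TYPES = {("first", 1991): "ann1", ("first", 1994): "ann2", ("second", 2001): "ann1"}
have = (
    [("first", 8234, 1988 + i, a, TYPES.get(("first", 1988 + i)))
     for i, a in enumerate([0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0])]
    + [("second", 8234, 1999 + i, a, TYPES.get(("second", 1999 + i)))
       for i, a in enumerate([0, 0, 0])]
)


def add_event(rows):
    """Append event (1 or None) to every row."""
    action_at = {(section, id_, year): act for section, id_, year, act, _ in rows}
    event_keys = set()
    for section, id_, year, act, type_ in rows:
        if type_ is None:
            continue
        if act == 1:
            event_keys.add((section, id_, year))        # rule iii: same year
        elif action_at.get((section, id_, year + 1)) == 1:
            event_keys.add((section, id_, year + 1))    # rule i: following year
        # rule ii and the "no following year" case leave event missing
    return [
        row + (1 if (row[0], row[1], row[2]) in event_keys else None,)
        for row in rows
    ]


for row in add_event(have):
    print(row)
```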

elbarto · ‎01-07-2020

I am quite a beginner at SAS. I am not sure how to add additional rows based on the complicated conditions as in my post. Any help/hints would be much appreciated.

elbarto · ‎01-07-2020

I have the following dataset:

data have;
input year firm_id location_id action action_amount operate new_entry;
cards;
2013 28013 6085 1 10000 0 0
2015 28013 6085 1 12000 0 0
2015 28013 29189 1 10000 0 0
2016 28013 34019 1 5000 1 1
2017 28013 34019 0 0 1 2
2011 120609 9003 1 7000 0 0
2012 120609 9003 0 0 1 1
2013 120609 9003 1 5000 1 2
2012 247908 23001 1 9000 0 0
2013 247908 23001 1 8000 0 0
2014 247908 23001 1 8500 1 1
2015 247908 23001 0 0 1 2
2003 356123 1001 0 0 0 0
2004 356123 1001 0 0 0 0
2009 356123 1001 1 9800 1 1
;
run;

I want to add additional rows and two new variables called "pre_action" and "pre_action_amount" to obtain the following dataset:

data want;
input year firm_id location_id action action_amount operate new_entry pre_action pre_action_amount;
cards;
2013 28013 6085 1 10000 0 0 . .
2014 28013 6085 0 0 0 0 1 10000
2015 28013 6085 1 12000 0 0 . .
2016 28013 6085 0 0 0 0 1 12000
2015 28013 29189 1 6500 0 0 . .
2016 28013 29189 0 0 0 0 1 6500
2016 28013 34019 1 5000 1 1 0 0
2017 28013 34019 0 0 1 2 . .
2011 120609 9003 1 7000 0 0 . .
2012 120609 9003 0 0 1 1 1 7000
2013 120609 9003 1 5000 1 2 . .
2012 247908 23001 1 9000 0 0 . .
2013 247908 23001 1 8000 0 0 1 9000
2014 247908 23001 1 8500 1 1 1 8000
2015 247908 23001 0 0 1 2 . .
2003 356123 1001 0 0 0 0 . .
2004 356123 1001 0 0 0 0 0 0
2005 356123 1001 0 0 0 0 0 0
2009 356123 1001 1 9800 1 1 0 0
;
run;

The rules are as follows:

1) First, consider only the rows with operate = 0. For each firm_id and location_id pair, if in the following year there is no row with the same firm_id and location_id, then create a new row with the following year and the same firm_id and location_id pair. The variables action, action_amount, operate, and new_entry are all set to 0, while pre_action and pre_action_amount are set to the values of action and action_amount in the previous year. Example: in year 2013, for the firm_id/location_id pair 28013/6085, we have operate = 0. But in 2014, there are no observations for this firm_id/location_id pair. So we set action, action_amount, operate, and new_entry to 0, and pre_action=1 and pre_action_amount=10000, which are the values of action and action_amount in 2013.

For each firm_id and location_id pair, if in the following year there is a row with the same firm_id and location_id, then simply set pre_action and pre_action_amount to the values of action and action_amount in the previous year. Example: in year 2011 for firm_id/location_id 120609/9003, we have operate=0. In the next year, 2012, there is a row with this firm_id/location_id pair. So we set pre_action=1 and pre_action_amount=7000, which are the values of action and action_amount in 2011. Another example is year 2003 for the firm_id/location_id pair 356123/1001.

2) Now consider the rows with new_entry=1 that do not yet have a value of pre_action and pre_action_amount. Set both pre_action and pre_action_amount to 0.

3) All other values of pre_action and pre_action_amount are empty.
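The three rules can be sketched as a single pass over the original rows. The Python below illustrates the logic rather than giving a SAS answer: add_pre_action is a hypothetical name, only two firms from the example are reproduced, and reading rule 1 as applying to original rows only (newly created rows do not trigger further rows) is inferred from the want data.

```python
# Two firms from the post:
# (year, firm_id, location_id, action, action_amount, operate, new_entry)
have = [
    (2011, 120609, 9003, 1, 7000, 0, 0),
    (2012, 120609, 9003, 0, 0, 1, 1),
    (2013, 120609, 9003, 1, 5000, 1, 2),
    (2003, 356123, 1001, 0, 0, 0, 0),
    (2004, 356123, 1001, 0, 0, 0, 0),
    (2009, 356123, 1001, 1, 9800, 1, 1),
]


def add_pre_action(rows):
    """Return rows as lists with pre_action and pre_action_amount appended."""
    # Index original rows by (firm_id, location_id, year), adding two empty slots.
    table = {(r[1], r[2], r[0]): list(r) + [None, None] for r in rows}
    created = []
    for year, firm, loc, action, amount, operate, new_entry in rows:
        if operate != 0:
            continue                                  # rule 1 looks at operate = 0 only
        target = table.get((firm, loc, year + 1))
        if target is None:                            # no row next year: create one
            target = [year + 1, firm, loc, 0, 0, 0, 0, None, None]
            created.append(target)
        target[7], target[8] = action, amount         # carry this year's values forward
    result = list(table.values()) + created
    for row in result:
        if row[6] == 1 and row[7] is None:            # rule 2: unfilled new_entry = 1 rows
            row[7], row[8] = 0, 0
    result.sort(key=lambda row: (row[1], row[2], row[0]))
    return result


for row in add_pre_action(have):
    print(row)
```

Sorting by firm_id, location_id and year is a choice made for this sketch; it happens to reproduce the row order of the example.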

Online Status	Offline
Date Last Visited	‎10-24-2023 05:05 AM

How to choose different random sample sizes using proc surveyselect?

How to transpose this table?

Re: Creating a timeseries based on information in a dataset

Creating a timeseries based on information in a dataset

Re: How to find a given word in a string

Re: How to find a given word in a string

Re: How to find a given word in a string

How to find a given word in a string

Creating a variable that counts consecutive years

Re: Forward and backward sums

Re: How to transpose and rename variables

How to find a given word in a string

Replacing values and outputting data as separate csv file

Assign group numbers to nearby ID's

Replacing values and outputting data as separate csv file

Changing subsequent values based on previous value

Re: Count number of 0's until next 1

Count number of 0's until next 1

Re: How to transpose and rename variables

How to transpose and rename variables

How to create new columns based on another dataset

How to test for significance between difference of difference in means

Choosing a lagged value of a variable each time it "updates"

Matching many to one based on scores

Using information from one dataset to create another dataset

Creating a new variable based on another column

Re: Adding additional rows based on another row

Inserting rows based on conditions from another row

Inserting additional rows based on other rows