Hello All,
I am trying to create a list where a certain string is deleted from a string list.
for example: considering the third row in the table below where year=2005,2007, the value 2005,2007 should be deleted from year_list and with end result being "2006,2008,2009,2010,2011"
The original dataset looks like this:
year | year_list |
2005 | 2005,2006,2007,2008,2009,2010,2011 |
2005,2007 | 2005,2006,2007,2008,2009,2010,2011 |
2008,2011 | 2005,2006,2007,2008,2009,2010,2011 |
proc sql; create table have (year VARCHAR(1000), year_list VARCHAR(1000)); insert into have values('2005','2005,2006,2007,2008,2009,2010,2011'); insert into have values('2005,2007' '2005,2006,2007,2008,2009,2010,2011'); insert into have values('2008,2011' '2005,2006,2007,2008,2009,2010,2011'); run;
The "want" variable is what i require as output - as shown in the following table (with code below):
year | year_list | want |
2005 | 2005,2006,2007,2008,2009,2010,2011 | 2006,2007,2008,2009,2010,2011 |
2005,2007 | 2005,2006,2007,2008,2009,2010,2011 | 2006,2008,2009,2010,2012 |
2008,2011 | 2005,2006,2007,2008,2009,2010,2011 | 2005,2006,2007,2009,2010 |
proc sql; create table want (year VARCHAR(1000), year_list VARCHAR(1000), want_list VARCHAR(1000)); insert into want values('2005','2005,2006,2007,2008,2009,2010,2011','2006,2007,2008,2009,2010,2011'); insert into want values('2005,2007' '2005,2006,2007,2008,2009,2010,2011','2006,2008,2009,2010,2012'); insert into want values('2008,2011' '2005,2006,2007,2008,2009,2010,2011','2005,2006,2007,2009,2010'); run;
Any help would be appreciated.
Thank you.
data want;
set have;
format want_list $1000.;
nyear=countw(year,',');
do i=1 to nyear;
rmyear=scan(year,i,',');
want_list=prxchange(cats("s/",rmyear,",*//"),-1,year_list);
/* Remove last commas */
want_list=prxchange(cats("s/, *$//"),-1,want_list);
end;
drop i rmyear nyear;
run;
data want;
set have;
format want_list $1000.;
nyear=countw(year,',');
do i=1 to nyear;
rmyear=scan(year,i,',');
want_list=prxchange(cats("s/",rmyear,",*//"),-1,year_list);
/* Remove last commas */
want_list=prxchange(cats("s/, *$//"),-1,want_list);
end;
drop i rmyear nyear;
run;
Whilst you have a solution given, I would question the logic of having a series of data items, comma separated, in one variable. You are fighting the basic strcuture of SAS data doing so. SAS strcuture is one column per data item, and procedures and functions are based on this principal. Any furhter processing you do on that string will involve far more processing than is necessary. I would advise you to re-think it, even at worse case scenario, just have a bunch of varx - vary variables:
YEAR VAR1 VAR2 VAR3 ...
2005 2005 2006 2007
Think about how much simpler it would be being able to use array processing on the above using of var:
And if you want a string somewhere in the line then:
catx(",",of var:);
Will do the trick no problem. Make your likfe easier, that is what us lazy programmers do
Hello RW9,
Thank you for your suggestion. I followed what you mentioned and split the data into different variables (coded below)
However, I dont know how to approach this problem with listing as mentioned earlier.
proc sql; create table have (year VARCHAR(100), year_list VARCHAR(100)); insert into have values('2005','2005,2006,2007,2008,2009,2010,2011'); insert into have values('2005,2007','2005,2006,2007,2008,2009,2010,2011'); insert into have values('2007,2011','2005,2006,2007,2008,2009,2010,2011'); run; data tmp1; set have; csv=countw(year,","); codgen_csv=countw(year_list,","); run; proc sql ; select distinct cats("mart",max(csv)), max(csv) into :arraylist,:arraycnt from tmp1; select distinct cats("dlv",max(codgen_csv)), max(codgen_csv) into :dlv_array, :dlvcnt from tmp1; run; data want (drop=list); set tmp1; length mart1-&arraylist. $100.; length dlv1-&dlv_array. $100.; array mart(&arraycnt.) $; array dlv(&dlvcnt.) $; do _i=1 to dim(mart); mart[_i]=scan(year,_i,','); end; do _j=1 to dim(dlv); dlv[_j]=scan(year_list,_j,','); end; run;
Also, i forgot to mention this in the previous message that the lower bound of the dataset changes: i.e. the last row of the following table, "want" column, notes the first observation as 2009, as variable year begins with 2008. In other words, the first variable mentioned in year acts as the starting point. The end point the last year mentioned in year_list.
The final result should like this:
year | year_list | want |
2005 | 2005,2006,2007,2008,2009,2010,2011 | 2006,2007,2008,2009,2010,2011 |
2005,2007 | 2005,2006,2007,2008,2009,2010,2011 | 2006,2008,2009,2010,2012 |
2008,2011 | 2005,2006,2007,2008,2009,2010,2011 | 2009,2010 |
Please do let me know.
Thank you for your time.
Hi,
It is unclearn, at leat to me, what You are trying to achieve.
As RW9 said, it could be useful to take some time to think about
which form should take your data to make your work simpler.
Since we are dealing with years, Yu'd probably prefer using
numeric variables.
For instance, year_list seems identical for all observations and contains
consecutive years. So all you need is the first and the last year.
You can store them in macro-variables to avoid repeating them on each observation.
Does the following code answer your problem ?
proc sql;
create table have (year VARCHAR(100), year_list VARCHAR(100));
insert into have values('2005','2005,2006,2007,2008,2009,2010,2011');
insert into have values('2005,2007','2005,2006,2007,2008,2009,2010,2011');
insert into have values('2007,2011','2005,2006,2007,2008,2009,2010,2011');
run;
%let period_start=2005;
%let period_end=2011;
data do_you_want_this;
set have;
format start end 4.
want $1000.;
start = max(put(scan(year,1,','),4.)+1,&period_start.);
end = &period_end.-1;
exclude = put(scan(year,2,','),4.);
do i=start to end;
if i ne exclude then want=catx(',',want,i);
end;
drop exclude i;
run;
True. However, the complexity here is that the years mentioned here is not in numeric format, but in character form. i.e. the year could have a suffix/ prefix . for example: M2010_SA, M8_2009.
Unfortunately, this is the data i receive, and although i would like to change the data composition to match the year format, this would be a huge hassle as it would require changing many more parameters in many scripts.
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.