DATA Step, Macro, Functions and more

Delete a string (in variable) from a character string(in variable)

Accepted Solution Solved
Reply
Contributor
Posts: 44
Accepted Solution

Delete a string (in variable) from a character string(in variable)

Hello All,

I am trying to create a list where a certain string is deleted from a string list.

for example: considering the third row in the table below where year=2005,2007, the value 2005,2007 should be deleted from year_list and with end result being "2006,2008,2009,2010,2011" 

The original dataset looks like this:

year year_list
2005 2005,2006,2007,2008,2009,2010,2011
2005,2007 2005,2006,2007,2008,2009,2010,2011
2008,2011 2005,2006,2007,2008,2009,2010,2011
proc sql;
create table have (year VARCHAR(1000), year_list VARCHAR(1000));
insert into have values('2005','2005,2006,2007,2008,2009,2010,2011');
insert into have values('2005,2007' '2005,2006,2007,2008,2009,2010,2011');
insert into have values('2008,2011' '2005,2006,2007,2008,2009,2010,2011');
run;

 

The "want" variable is what i require as output - as shown in the following table (with code below):

 

year year_list want
2005 2005,2006,2007,2008,2009,2010,2011 2006,2007,2008,2009,2010,2011
2005,2007 2005,2006,2007,2008,2009,2010,2011 2006,2008,2009,2010,2012
2008,2011 2005,2006,2007,2008,2009,2010,2011 2005,2006,2007,2009,2010
proc sql;
create table want (year VARCHAR(1000), year_list VARCHAR(1000), want_list VARCHAR(1000));
insert into want values('2005','2005,2006,2007,2008,2009,2010,2011','2006,2007,2008,2009,2010,2011');
insert into want values('2005,2007' '2005,2006,2007,2008,2009,2010,2011','2006,2008,2009,2010,2012');
insert into want values('2008,2011' '2005,2006,2007,2008,2009,2010,2011','2005,2006,2007,2009,2010');
run;

Any help would be appreciated.

Thank you.


Accepted Solutions
Solution
‎04-20-2016 08:06 AM
Regular Contributor
Posts: 233

Re: Delete a string (in variable) from a character string(in variable)

Posted in reply to sebster24

data want;
set have;
format want_list $1000.;
nyear=countw(year,',');
do i=1 to nyear;
    rmyear=scan(year,i,',');
    want_list=prxchange(cats("s/",rmyear,",*//"),-1,year_list);
    /* Remove last commas */
    want_list=prxchange(cats("s/, *$//"),-1,want_list);
end;

drop i rmyear nyear;
run;

View solution in original post


All Replies
Solution
‎04-20-2016 08:06 AM
Regular Contributor
Posts: 233

Re: Delete a string (in variable) from a character string(in variable)

Posted in reply to sebster24

data want;
set have;
format want_list $1000.;
nyear=countw(year,',');
do i=1 to nyear;
    rmyear=scan(year,i,',');
    want_list=prxchange(cats("s/",rmyear,",*//"),-1,year_list);
    /* Remove last commas */
    want_list=prxchange(cats("s/, *$//"),-1,want_list);
end;

drop i rmyear nyear;
run;

Super User
Super User
Posts: 7,942

Re: Delete a string (in variable) from a character string(in variable)

Posted in reply to sebster24

Whilst you have a solution given, I would question the logic of having a series of data items, comma separated, in one variable.  You are fighting the basic strcuture of SAS data doing so.  SAS strcuture is one column per data item, and procedures and functions are based on this principal.  Any furhter processing you do on that string will involve far more processing than is necessary.  I would advise you to re-think it, even at worse case scenario, just have a bunch of varx - vary variables:

YEAR     VAR1    VAR2   VAR3 ...

2005       2005     2006   2007

 

Think about how much simpler it would be being able to use array processing on the above using of var:

And if you want a string somewhere in the line then:

catx(",",of varSmiley Happy;

Will do the trick no problem.  Make your likfe easier, that is what us lazy programmers do Smiley Happy

Contributor
Posts: 44

Re: Delete a string (in variable) from a character string(in variable)

Hello RW9,

Thank you for your suggestion. I followed what you mentioned and split the data into different variables (coded below)

However, I dont know how to approach this problem with listing as mentioned earlier.

proc sql;
create table have (year VARCHAR(100), year_list VARCHAR(100));
insert into have values('2005','2005,2006,2007,2008,2009,2010,2011');
insert into have values('2005,2007','2005,2006,2007,2008,2009,2010,2011');
insert into have values('2007,2011','2005,2006,2007,2008,2009,2010,2011');
run;

data tmp1;
set have;
csv=countw(year,",");
codgen_csv=countw(year_list,",");
run;


proc sql ;
select distinct cats("mart",max(csv)), max(csv) into :arraylist,:arraycnt from tmp1;
select distinct cats("dlv",max(codgen_csv)), max(codgen_csv) into :dlv_array, :dlvcnt from tmp1;
run;

data want (drop=list);
set tmp1;
length mart1-&arraylist. $100.;
length dlv1-&dlv_array. $100.;
array mart(&arraycnt.) $;
array dlv(&dlvcnt.) $;
do _i=1 to dim(mart);
mart[_i]=scan(year,_i,',');
end;
do _j=1 to dim(dlv);
dlv[_j]=scan(year_list,_j,',');
end;
run;

 

 Also, i forgot to mention this in the previous message that the lower bound of the dataset changes: i.e. the last row of the following table, "want" column, notes the first observation as 2009, as variable year begins with 2008. In other words, the first variable mentioned in year acts as the starting point. The end point the last year mentioned in year_list.

The final result should like this:

year year_list want
2005 2005,2006,2007,2008,2009,2010,2011 2006,2007,2008,2009,2010,2011
2005,2007 2005,2006,2007,2008,2009,2010,2011 2006,2008,2009,2010,2012
2008,2011 2005,2006,2007,2008,2009,2010,2011 2009,2010

Please do let me know.

Thank you for your time.

Regular Contributor
Posts: 233

Re: Delete a string (in variable) from a character string(in variable)

Posted in reply to sebster24

Hi,

 

It is unclearn, at leat to me, what You are trying to achieve.

As RW9 said, it could be useful to take some time to think about

which form should take your data to make your work simpler.

Since we are dealing with years, Yu'd probably prefer using

numeric variables.

 

For instance, year_list seems identical for all observations and contains

consecutive years. So all you need is the first and the last year.

You can store them in macro-variables to avoid repeating them on each observation.

 

Does the following code answer your problem ?

 

proc sql;
    create table have (year VARCHAR(100), year_list VARCHAR(100));
    insert into have values('2005','2005,2006,2007,2008,2009,2010,2011');
    insert into have values('2005,2007','2005,2006,2007,2008,2009,2010,2011');
    insert into have values('2007,2011','2005,2006,2007,2008,2009,2010,2011');
run;

%let period_start=2005;
%let period_end=2011;

data do_you_want_this;
    set have;
    format start end 4.
           want $1000.;
    start = max(put(scan(year,1,','),4.)+1,&period_start.);
    end = &period_end.-1;
    exclude = put(scan(year,2,','),4.);

    do i=start to end;
        if i ne exclude then want=catx(',',want,i);
    end;

    drop exclude i;
run;

Contributor
Posts: 44

Re: Delete a string (in variable) from a character string(in variable)

True. However, the complexity here is that the years mentioned here is not in numeric format, but in character form. i.e. the year could have a suffix/ prefix . for example: M2010_SA, M8_2009.

Unfortunately, this is the data i receive, and although i would like to change the data composition to match the year format, this would be a huge hassle as it would require changing many more parameters in many scripts. 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 307 views
  • 5 likes
  • 3 in conversation