BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
sebster24
Quartz | Level 8

Hello All,

I am trying to create a list where a certain string is deleted from a string list.

for example: considering the third row in the table below where year=2005,2007, the value 2005,2007 should be deleted from year_list and with end result being "2006,2008,2009,2010,2011" 

The original dataset looks like this:

year year_list
2005 2005,2006,2007,2008,2009,2010,2011
2005,2007 2005,2006,2007,2008,2009,2010,2011
2008,2011 2005,2006,2007,2008,2009,2010,2011
proc sql;
create table have (year VARCHAR(1000), year_list VARCHAR(1000));
insert into have values('2005','2005,2006,2007,2008,2009,2010,2011');
insert into have values('2005,2007' '2005,2006,2007,2008,2009,2010,2011');
insert into have values('2008,2011' '2005,2006,2007,2008,2009,2010,2011');
run;

 

The "want" variable is what i require as output - as shown in the following table (with code below):

 

year year_list want
2005 2005,2006,2007,2008,2009,2010,2011 2006,2007,2008,2009,2010,2011
2005,2007 2005,2006,2007,2008,2009,2010,2011 2006,2008,2009,2010,2012
2008,2011 2005,2006,2007,2008,2009,2010,2011 2005,2006,2007,2009,2010
proc sql;
create table want (year VARCHAR(1000), year_list VARCHAR(1000), want_list VARCHAR(1000));
insert into want values('2005','2005,2006,2007,2008,2009,2010,2011','2006,2007,2008,2009,2010,2011');
insert into want values('2005,2007' '2005,2006,2007,2008,2009,2010,2011','2006,2008,2009,2010,2012');
insert into want values('2008,2011' '2005,2006,2007,2008,2009,2010,2011','2005,2006,2007,2009,2010');
run;

Any help would be appreciated.

Thank you.

1 ACCEPTED SOLUTION

Accepted Solutions
gamotte
Rhodochrosite | Level 12

data want;
set have;
format want_list $1000.;
nyear=countw(year,',');
do i=1 to nyear;
    rmyear=scan(year,i,',');
    want_list=prxchange(cats("s/",rmyear,",*//"),-1,year_list);
    /* Remove last commas */
    want_list=prxchange(cats("s/, *$//"),-1,want_list);
end;

drop i rmyear nyear;
run;

View solution in original post

5 REPLIES 5
gamotte
Rhodochrosite | Level 12

data want;
set have;
format want_list $1000.;
nyear=countw(year,',');
do i=1 to nyear;
    rmyear=scan(year,i,',');
    want_list=prxchange(cats("s/",rmyear,",*//"),-1,year_list);
    /* Remove last commas */
    want_list=prxchange(cats("s/, *$//"),-1,want_list);
end;

drop i rmyear nyear;
run;

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Whilst you have a solution given, I would question the logic of having a series of data items, comma separated, in one variable.  You are fighting the basic strcuture of SAS data doing so.  SAS strcuture is one column per data item, and procedures and functions are based on this principal.  Any furhter processing you do on that string will involve far more processing than is necessary.  I would advise you to re-think it, even at worse case scenario, just have a bunch of varx - vary variables:

YEAR     VAR1    VAR2   VAR3 ...

2005       2005     2006   2007

 

Think about how much simpler it would be being able to use array processing on the above using of var:

And if you want a string somewhere in the line then:

catx(",",of var:);

Will do the trick no problem.  Make your likfe easier, that is what us lazy programmers do Smiley Happy

sebster24
Quartz | Level 8

Hello RW9,

Thank you for your suggestion. I followed what you mentioned and split the data into different variables (coded below)

However, I dont know how to approach this problem with listing as mentioned earlier.

proc sql;
create table have (year VARCHAR(100), year_list VARCHAR(100));
insert into have values('2005','2005,2006,2007,2008,2009,2010,2011');
insert into have values('2005,2007','2005,2006,2007,2008,2009,2010,2011');
insert into have values('2007,2011','2005,2006,2007,2008,2009,2010,2011');
run;

data tmp1;
set have;
csv=countw(year,",");
codgen_csv=countw(year_list,",");
run;


proc sql ;
select distinct cats("mart",max(csv)), max(csv) into :arraylist,:arraycnt from tmp1;
select distinct cats("dlv",max(codgen_csv)), max(codgen_csv) into :dlv_array, :dlvcnt from tmp1;
run;

data want (drop=list);
set tmp1;
length mart1-&arraylist. $100.;
length dlv1-&dlv_array. $100.;
array mart(&arraycnt.) $;
array dlv(&dlvcnt.) $;
do _i=1 to dim(mart);
mart[_i]=scan(year,_i,',');
end;
do _j=1 to dim(dlv);
dlv[_j]=scan(year_list,_j,',');
end;
run;

 

 Also, i forgot to mention this in the previous message that the lower bound of the dataset changes: i.e. the last row of the following table, "want" column, notes the first observation as 2009, as variable year begins with 2008. In other words, the first variable mentioned in year acts as the starting point. The end point the last year mentioned in year_list.

The final result should like this:

year year_list want
2005 2005,2006,2007,2008,2009,2010,2011 2006,2007,2008,2009,2010,2011
2005,2007 2005,2006,2007,2008,2009,2010,2011 2006,2008,2009,2010,2012
2008,2011 2005,2006,2007,2008,2009,2010,2011 2009,2010

Please do let me know.

Thank you for your time.

gamotte
Rhodochrosite | Level 12

Hi,

 

It is unclearn, at leat to me, what You are trying to achieve.

As RW9 said, it could be useful to take some time to think about

which form should take your data to make your work simpler.

Since we are dealing with years, Yu'd probably prefer using

numeric variables.

 

For instance, year_list seems identical for all observations and contains

consecutive years. So all you need is the first and the last year.

You can store them in macro-variables to avoid repeating them on each observation.

 

Does the following code answer your problem ?

 

proc sql;
    create table have (year VARCHAR(100), year_list VARCHAR(100));
    insert into have values('2005','2005,2006,2007,2008,2009,2010,2011');
    insert into have values('2005,2007','2005,2006,2007,2008,2009,2010,2011');
    insert into have values('2007,2011','2005,2006,2007,2008,2009,2010,2011');
run;

%let period_start=2005;
%let period_end=2011;

data do_you_want_this;
    set have;
    format start end 4.
           want $1000.;
    start = max(put(scan(year,1,','),4.)+1,&period_start.);
    end = &period_end.-1;
    exclude = put(scan(year,2,','),4.);

    do i=start to end;
        if i ne exclude then want=catx(',',want,i);
    end;

    drop exclude i;
run;

sebster24
Quartz | Level 8

True. However, the complexity here is that the years mentioned here is not in numeric format, but in character form. i.e. the year could have a suffix/ prefix . for example: M2010_SA, M8_2009.

Unfortunately, this is the data i receive, and although i would like to change the data composition to match the year format, this would be a huge hassle as it would require changing many more parameters in many scripts. 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 1867 views
  • 5 likes
  • 3 in conversation