Hello All,
I am trying to create a list where a certain string is deleted from a string list.
for example: considering the third row in the table below where year=2007M,2011M, the value 2007M,2011M should be deleted from year_list and with end result displayed in variable "want" being "2008,2009,2010". for this same example, the start should be 2007M, as that is the first observation in variable "year".
THe FINAL table that i want should be as follows:
year |
year_list |
want |
2005M |
2005M,2006,2007M,2008,2009,2010,2011M |
2006,2007M,2008,2009,2010,2011M |
2005M,2007M |
2005M,2006,2007M,2008,2009,2010,2011M |
2006,2008,2009,2010,2011M |
2007M,2011M |
2005M,2006,2007M,2008,2009,2010,2011M |
2008,2009,2010 |
I tried arrays to solve this problem, but not able to get anywhere close to the first step.
I list below my code:
proc sql;
create table have (year VARCHAR(100), year_list VARCHAR(100));
insert into have values('2005M','2005M,2006,2007M,2008,2009,2010,2011M');
insert into have values('2005M,2007M','2005M,2006,2007M,2008,2009,2010,2011M');
insert into have values('2007M,2011M','2005M,2006,2007M,2008,2009,2010,2011M');
run;
data trial (keep=x);
set have(obs=1);
cnt=countw(year_list,",");
i=0;
do while (i<cnt);
i+1;
x=cats("dlv",scan(strip(year_list),i,","));
output;
end;
run;
proc sql ;
select distinct cats("mart",max(countw(year,","))), max(countw(year,",")) into :arraylist,:arraycnt from have;
select distinct cats("dlv",max(countw(year_list,","))), max(countw(year_list,",")) into :dlv_array, :dlvcnt from have;
select distinct x into :array_list separated by ' ' from trial;
run;
data want ;
set have;
length &array_list. $100.;
array dlv(&dlvcnt.) $ &array_list.;
do _j=1 to dim(dlv);
if index(upcase(vname(dlv(_j))),upcase(cats("DLV",dlv[_j]))) then do;
dlv[_j]=scan(year,_j,',');
end;
end;
run;
The array code in the last data step does not display the result i want for the first step of arriving to the final table. The table should look more like:
year | year_list | dlv2005M | dlv2006 | dlv2007M | dlv2008 | dlv2009 | dlv2010 | dlv2011M |
2005M | 2005M,2006,2007M,2008,2009,2010,2011M | 2005M | ||||||
2005M,2007M | 2005M,2006,2007M,2008,2009,2010,2011M | 2005M | 2007M | |||||
2007M,2011M | 2005M,2006,2007M,2008,2009,2010,2011M | 2007M | 2011M |
After getting this table, I look at finding all missing values in the table and listing these out.
Still left with the dilemma of how to determine the first observation 😞
Any pointers, help, tip would be appreciated.
thanking you
Hi,
I am sure you have posted this question in the last few weeks, no? Anyways, as always my first question is going to be why? SAS is built around the construct of datasets containing variables which hold One! item per variable. Your trying to fit many data items into one variable, this is just going to cause you headaches through programming. I would suggest the following, normalise your data, have one row per id value. You can then easily apply any programming to that dataset and then if for some reason you need the final, transpose it up again when you have finished - remember the data you program with does not need to look like any out format.
However, that being said, you can process it as is:
data have; length year year_list $200; input year $ year_list $; datalines; 2005M 2005M,2006,2007M,2008,2009,2010,2011M 2005M,2007M 2005M,2006,2007M,2008,2009,2010,2011M ; run; data want; length want $200; set have; do i=1 to countw(year_list,","); if index(year,scan(year_list,i,","))=0 then want=catx(",",want,scan(year_list,i,",")); end; run;
Hello All,
I am trying to create a list where a certain string is deleted from a string list.
I adopted the array approach as in the code for the following reasons:
1) The year and year_list is in character format and comma separated. the year and year_list values are comma separated and sorted.
2) Variable "want" lists all variables from "year_list2 that are not present in variable "year", however ensuring that the starting point is determined by the first value from variable "year" in "year_list" - as seen in the last row.
THe table that i want should be as follows:
year |
year_list |
want |
2005M |
2005M,2006,2007M,2008,2009,2010,2011M |
2006,2007M,2008,2009,2010,2011M |
2005M,2007M |
2005M,2006,2007M,2008,2009,2010,2011M |
2006,2008,2009,2010,2011M |
2007M,2011M |
2005M,2006,2007M,2008,2009,2010,2011M |
2008,2009,2010 |
proc sql;
create table have (year VARCHAR(100), year_list VARCHAR(100));
insert into have values('2005M','2005M,2006,2007M,2008,2009,2010,2011M');
insert into have values('2005M,2007M','2005M,2006,2007M,2008,2009,2010,2011M');
insert into have values('2007M,2011M','2005M,2006,2007M,2008,2009,2010,2011M');
run;
data tmp1;
set have;
csv=countw(year,",");
codgen_csv=countw(year_list,",");
run;
proc sql ;
select distinct cats("mart",max(csv)), max(csv) into :arraylist,:arraycnt from tmp1;
select distinct cats("dlv",max(codgen_csv)), max(codgen_csv) into :dlv_array, :dlvcnt from tmp1;
run;
data want (drop=list);
set tmp1;
length mart1-&arraylist. $100.;
length dlv1-&dlv_array. $100.;
array mart(&arraycnt.) $;
array dlv(&dlvcnt.) $;
do _i=1 to dim(mart);
mart[_i]=scan(year,_i,',');
end;
do _j=1 to dim(dlv);
dlv[_j]=scan(year_list,_j,',');
end;
run;
I am really struggling with this. And i don't know how to approach this problem.
Any pointers/ help would be appreciated.
Hi,
I am sure you have posted this question in the last few weeks, no? Anyways, as always my first question is going to be why? SAS is built around the construct of datasets containing variables which hold One! item per variable. Your trying to fit many data items into one variable, this is just going to cause you headaches through programming. I would suggest the following, normalise your data, have one row per id value. You can then easily apply any programming to that dataset and then if for some reason you need the final, transpose it up again when you have finished - remember the data you program with does not need to look like any out format.
However, that being said, you can process it as is:
data have; length year year_list $200; input year $ year_list $; datalines; 2005M 2005M,2006,2007M,2008,2009,2010,2011M 2005M,2007M 2005M,2006,2007M,2008,2009,2010,2011M ; run; data want; length want $200; set have; do i=1 to countw(year_list,","); if index(year,scan(year_list,i,","))=0 then want=catx(",",want,scan(year_list,i,",")); end; run;
Sorry for the pestering and double emails. I am not pleased with the format structure that i presented in the post.THis is data that i receive from third party and a request to deliver in a better format is something they cannot do 😞
Sir RW9, you are a genius.
For anyone having a similar query, slightly modified solution below: to produce desired outcome
data want (drop=i);
length want $200;
set have;
do i=1 to countw(year_list,",");
if index(year,scan(substr(year_list,max(index(year_list,scan(year,1,",")),1)),i,","))=0 then
want=catx(",",want,scan(substr(year_list,max(index(year_list,scan(year,1,",")),1)),i,","));
end;
run;
thank you
data have; infile cards dlm=' '; input year :$20. year_list : $100. ; cards; 2005M 2005M,2006,2007M,2008,2009,2010,2011M 2005M,2007M 2005M,2006,2007M,2008,2009,2010,2011M 2007M,2011M 2005M,2006,2007M,2008,2009,2010,2011M ; run; data want; set have; length want $ 200; first=scan(year,1,','); s=findw(year_list,strip(first)); temp=substr(year_list,s); do i=1 to countw(temp,','); t=scan(temp,i,','); if not findw(year,strip(t)) then want=catx(',',want,t); end; drop t i s temp first ; run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.