Solved: finding missing values from a list

sebster24 · Posted 04-25-2016 07:10 AM

Hello All,

I am trying to create a list where a certain string is deleted from a string list.

for example: considering the third row in the table below where year=2007M,2011M, the value 2007M,2011M should be deleted from year_list and with end result displayed in variable "want" being "2008,2009,2010". for this same example, the start should be 2007M, as that is the first observation in variable "year".

THe FINAL table that i want should be as follows:

year	year_list	want
2005M	2005M,2006,2007M,2008,2009,2010,2011M	2006,2007M,2008,2009,2010,2011M
2005M,2007M	2005M,2006,2007M,2008,2009,2010,2011M	2006,2008,2009,2010,2011M
2007M,2011M	2005M,2006,2007M,2008,2009,2010,2011M	2008,2009,2010

I tried arrays to solve this problem, but not able to get anywhere close to the first step.

I list below my code:

proc sql;
create table have (year VARCHAR(100), year_list VARCHAR(100));
insert into have values('2005M','2005M,2006,2007M,2008,2009,2010,2011M');
insert into have values('2005M,2007M','2005M,2006,2007M,2008,2009,2010,2011M');
insert into have values('2007M,2011M','2005M,2006,2007M,2008,2009,2010,2011M');
run;

data trial (keep=x);
set have(obs=1);
cnt=countw(year_list,",");
i=0;
do while (i<cnt);
i+1;
x=cats("dlv",scan(strip(year_list),i,","));
output;
end;
run;

proc sql ;
select distinct cats("mart",max(countw(year,","))), max(countw(year,",")) into :arraylist,:arraycnt from have;
select distinct cats("dlv",max(countw(year_list,","))), max(countw(year_list,",")) into :dlv_array, :dlvcnt from have;
select distinct x into :array_list separated by ' ' from trial;
run;

data want ;
set have;
length &array_list. $100.;
array dlv(&dlvcnt.) $ &array_list.;
do _j=1 to dim(dlv);
if index(upcase(vname(dlv(_j))),upcase(cats("DLV",dlv[_j]))) then do;
dlv[_j]=scan(year,_j,',');
end;
end;
run;

The array code in the last data step does not display the result i want for the first step of arriving to the final table. The table should look more like:

year	year_list	dlv2005M	dlv2006	dlv2007M	dlv2008	dlv2009	dlv2010	dlv2011M
2005M	2005M,2006,2007M,2008,2009,2010,2011M	2005M
2005M,2007M	2005M,2006,2007M,2008,2009,2010,2011M	2005M		2007M
2007M,2011M	2005M,2006,2007M,2008,2009,2010,2011M			2007M				2011M

After getting this table, I look at finding all missing values in the table and listing these out.

Still left with the dilemma of how to determine the first observation 😞

Any pointers, help, tip would be appreciated.

thanking you

RW9 · Posted 04-25-2016 07:54 AM

Hi,

I am sure you have posted this question in the last few weeks, no? Anyways, as always my first question is going to be why? SAS is built around the construct of datasets containing variables which hold One! item per variable. Your trying to fit many data items into one variable, this is just going to cause you headaches through programming. I would suggest the following, normalise your data, have one row per id value. You can then easily apply any programming to that dataset and then if for some reason you need the final, transpose it up again when you have finished - remember the data you program with does not need to look like any out format.

However, that being said, you can process it as is:

data have;
  length year year_list $200;
  input year $ year_list $;
datalines;
2005M 2005M,2006,2007M,2008,2009,2010,2011M
2005M,2007M 2005M,2006,2007M,2008,2009,2010,2011M
;
run;
data want;
  length want $200;
  set have;
  do i=1 to countw(year_list,",");
    if index(year,scan(year_list,i,","))=0 then want=catx(",",want,scan(year_list,i,","));
  end;
run;

View solution in original post

sebster24 · Posted 04-25-2016 06:17 AM

Hello All,

I am trying to create a list where a certain string is deleted from a string list.

I adopted the array approach as in the code for the following reasons:

1) The year and year_list is in character format and comma separated. the year and year_list values are comma separated and sorted.

2) Variable "want" lists all variables from "year_list2 that are not present in variable "year", however ensuring that the starting point is determined by the first value from variable "year" in "year_list" - as seen in the last row.

THe table that i want should be as follows:

year	year_list	want
2005M	2005M,2006,2007M,2008,2009,2010,2011M	2006,2007M,2008,2009,2010,2011M
2005M,2007M	2005M,2006,2007M,2008,2009,2010,2011M	2006,2008,2009,2010,2011M
2007M,2011M	2005M,2006,2007M,2008,2009,2010,2011M	2008,2009,2010

proc sql;
create table have (year VARCHAR(100), year_list VARCHAR(100));
insert into have values('2005M','2005M,2006,2007M,2008,2009,2010,2011M');
insert into have values('2005M,2007M','2005M,2006,2007M,2008,2009,2010,2011M');
insert into have values('2007M,2011M','2005M,2006,2007M,2008,2009,2010,2011M');
run;

data tmp1;
set have;
csv=countw(year,",");
codgen_csv=countw(year_list,",");
run;


proc sql ;
select distinct cats("mart",max(csv)), max(csv) into :arraylist,:arraycnt from tmp1;
select distinct cats("dlv",max(codgen_csv)), max(codgen_csv) into :dlv_array, :dlvcnt from tmp1;
run;

data want (drop=list);
set tmp1;
length mart1-&arraylist. $100.;
length dlv1-&dlv_array. $100.;
array mart(&arraycnt.) $;
array dlv(&dlvcnt.) $;
do _i=1 to dim(mart);
mart[_i]=scan(year,_i,',');
end;
do _j=1 to dim(dlv);
dlv[_j]=scan(year_list,_j,',');
end;
run;

I am really struggling with this. And i don't know how to approach this problem.

Any pointers/ help would be appreciated.

RW9 · Posted 04-25-2016 07:54 AM

Hi,

I am sure you have posted this question in the last few weeks, no? Anyways, as always my first question is going to be why? SAS is built around the construct of datasets containing variables which hold One! item per variable. Your trying to fit many data items into one variable, this is just going to cause you headaches through programming. I would suggest the following, normalise your data, have one row per id value. You can then easily apply any programming to that dataset and then if for some reason you need the final, transpose it up again when you have finished - remember the data you program with does not need to look like any out format.

However, that being said, you can process it as is:

data have;
  length year year_list $200;
  input year $ year_list $;
datalines;
2005M 2005M,2006,2007M,2008,2009,2010,2011M
2005M,2007M 2005M,2006,2007M,2008,2009,2010,2011M
;
run;
data want;
  length want $200;
  set have;
  do i=1 to countw(year_list,",");
    if index(year,scan(year_list,i,","))=0 then want=catx(",",want,scan(year_list,i,","));
  end;
run;

sebster24 · Posted 04-25-2016 08:46 AM

Sorry for the pestering and double emails. I am not pleased with the format structure that i presented in the post.THis is data that i receive from third party and a request to deliver in a better format is something they cannot do 😞

Sir RW9, you are a genius.

For anyone having a similar query, slightly modified solution below: to produce desired outcome


data want (drop=i);
	length want $200;
	set have;
	do i=1 to countw(year_list,",");
		if index(year,scan(substr(year_list,max(index(year_list,scan(year,1,",")),1)),i,","))=0 then
			want=catx(",",want,scan(substr(year_list,max(index(year_list,scan(year,1,",")),1)),i,","));
	end;
run;

thank you

Ksharp · Posted 04-25-2016 11:09 PM

I am not sure I understand what is your mean completely .


data have;
infile cards dlm=' ';
input year :$20. year_list : $100. ;
cards;
2005M 2005M,2006,2007M,2008,2009,2010,2011M
2005M,2007M 2005M,2006,2007M,2008,2009,2010,2011M
2007M,2011M 2005M,2006,2007M,2008,2009,2010,2011M
;
run;
data want;
 set have;
 length want $ 200;
 first=scan(year,1,',');
 s=findw(year_list,strip(first));
 temp=substr(year_list,s);
 do i=1 to countw(temp,',');
  t=scan(temp,i,',');
  if not findw(year,strip(t)) then want=catx(',',want,t);
 end;
 drop t i s temp first ;
run;

finding missing values from a list

Re: finding missing values from a list

compare values and note missing values

Re: finding missing values from a list

Re: finding missing values from a list

Re: finding missing values from a list

finding missing values from a list

Re: finding missing values from a list

compare values and note missing values

Re: finding missing values from a list

Re: finding missing values from a list

Re: finding missing values from a list

SAS Innovate 2025: Call for Content

Click image to register for webinar

Classroom Training Available!