SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Data Management

Reply
Occasional Contributor
Posts: 18

Data Management

Hi there

I have the following data set:

NameCodecode dateCode km
0LH681500064370100200112125584
0LH6815063310237002002101724097
0LH6815066100515002002101724097
0LH6815099110475MP2002101724097
0LH6815099600228MP2002101724097
0LH6815064003112002003090444415
0LH6815067110212002003090444415
0LH6815099110054MP2003090444415
0LH6815099210429MP2003090444415
0LH6815061620656002003112448366
0LH6815099510440MP2003112448366
0LH6815099600239MP2003112448366
0LH6815099740535MP2003112448366
0LH6815099114106MP2004012153155
0LH6815099110475MP2004080667368
0LH6815099340069MP2004080667368
0LH6815099600229MP2004120374874
0LH6815099639912MP2004120374874
0LH6815099100121MP2005012478658
0LH6815099639912MP2005012478658
0LH6815011510115002005012578661
0LH6815011510115002005012578661
0LH6815011530048002005012578661
0LH6815099210429MP2005032484101
0LH6815031600039002005072792087
0LH6815099110475MP2005072792087
0LH6815099510249MP2005072792087
0LH6815099639912MP2005072792087
0LH6815061210012002005081192236
0LH6815099600265MP2005100593268
0LH6815021210052002005111195961
0LH6815051210141002005112196296
0LH6815065770815002005112196296
0LH6815065770915002005112196296
0LH68150230003080020060505104569
0LH68150513500390020060505104569
0LH68150621120860020060505104569
0LH68150632505120020060505104569
0LH6815099110037MP20060505104569
0LH6815099639912MP20060505104569
0LH68150611254790020061128114629
0LH6815099600229MP20061128114629
0LH6815099639912MP20061128114629
0LH6815099510249MP20070108117880
0LH68150517200690020070131119229
0LH6815099639912MP20070625125672
0LH68150513312290020070921129551
0LH6815099639912MP20071002129784
0LH68150344100290020080107130255
0LH68150123100120020080116131010
0LH68150344100650020080116131010
0LH68150612100120020080116131010
0LH68150645301480020080116131010
0LH68150632100790020080122131304
0LH6815099110475MP20080208131919
0LH6815099210429MP20080208131919

I would like to select codes that started with (11) and code_km > 30000 , where there was none of the following codes (99600228MP and 99639912MP) done before the code_Km 30000

Super Contributor
Posts: 490

Re: Data Management

data have;

input Name $    Code $10.    code_date: YYMMDD8.    Code_km;

cards;

0LH68150 0064370100 20011212 5584

0LH68150 6331023700 20021017 24097

0LH68150 6610051500 20021017 24097

0LH68150 99110475MP 20021017 24097

0LH68150 99600228MP 20021017 24097

0LH68150 6400311200 20030904 44415

0LH68150 6711021200 20030904 44415

0LH68150 99110054MP 20030904 44415

0LH68150 99210429MP 20030904 44415

0LH68150 6162065600 20031124 48366

0LH68150 99510440MP 20031124 48366

0LH68150 99600239MP 20031124 48366

0LH68150 99740535MP 20031124 48366

0LH68150 99114106MP 20040121 53155

0LH68150 99110475MP 20040806 67368

0LH68150 99340069MP 20040806 67368

0LH68150 99600229MP 20041203 74874

0LH68150 99639912MP 20041203 74874

0LH68150 99100121MP 20050124 78658

0LH68150 99639912MP 20050124 78658

0LH68150 1151011500 20050125 78661

0LH68150 1151011500 20050125 78661

0LH68150 1153004800 20050125 78661

0LH68150 99210429MP 20050324 84101

0LH68150 3160003900 20050727 92087

0LH68150 99110475MP 20050727 92087

0LH68150 99510249MP 20050727 92087

0LH68150 99639912MP 20050727 92087

0LH68150 6121001200 20050811 92236

0LH68150 99600265MP 20051005 93268

0LH68150 2121005200 20051111 95961

0LH68150 5121014100 20051121 96296

0LH68150 6577081500 20051121 96296

0LH68150 6577091500 20051121 96296

0LH68150 2300030800 20060505 104569

0LH68150 5135003900 20060505 104569

0LH68150 6211208600 20060505 104569

0LH68150 6325051200 20060505 104569

0LH68150 99110037MP 20060505 104569

0LH68150 99639912MP 20060505 104569

0LH68150 6112547900 20061128 114629

0LH68150 99600229MP 20061128 114629

0LH68150 99639912MP 20061128 114629

0LH68150 99510249MP 20070108 117880

0LH68150 5172006900 20070131 119229

0LH68150 99639912MP 20070625 125672

0LH68150 5133122900 20070921 129551

0LH68150 99639912MP 20071002 129784

0LH68150 3441002900 20080107 130255

0LH68150 1231001200 20080116 131010

0LH68150 3441006500 20080116 131010

0LH68150 6121001200 20080116 131010

0LH68150 6453014800 20080116 131010

0LH68150 6321007900 20080122 131304

0LH68150 99110475MP 20080208 131919

0LH68150 99210429MP 20080208 131919

;

data want;

set have;

format pcode $10. ;

retain pcode;

if substr(code,1,2)= "11" and Code_km >30000 and reverse(substr(reverse(pcode),1,2)) ~= "MP" then output;

pcode=code;

run;

Occasional Contributor
Posts: 18

Re: Data Management

hi Mohamed

let me try and explain further:

I have over a million observations with different names.

Let’s try look at this in another way

Each name has many codes, each code has a code_date and code_km.

I want to know all those names where the difference between (11)
code with its code_km  has a difference
of 30000 from another code say 994561258.

I want to keep the name only.

Super User
Super User
Posts: 7,392

Re: Data Management

Well, you've pretty much written it yourself there.  Start by adding in your logic to the dataset, something like the below.  However to use the below you need to look at your data a bit more, as in your example there are three codes with 11 as prefix, in this example they all have the same distance, but maybe not always, how will you handle that?

proc sql;

     create table INTER as

     select     A.*,

                   case  when -11 <= (A.CODE_KM - B.CODE_KM) <= 11 then "Y"

                             else "N" end as RES

     from      HAVE A

     left join (select distinct NAME,CODE_KM from HAVE) B

     on          A.NAME=B.NAME;

quit;

This will give you the dataset you had before, plus a Y/N column.  From that you can proc sort by name nodupkey to get your list (or you can add into the above SQL, I keep it separate to explain).

Occasional Contributor
Posts: 18

Re: Data Management

hi again

where does the 30000 km differnce come in

..

Super User
Super User
Posts: 7,392

Re: Data Management

Sorry, I had put 11:

proc sql;

     create table INTER as

     select     A.*,

                   case  when -3000 <= (A.CODE_KM - B.CODE_KM) <= 3000 then "Y"

                             else "N" end as RES

     from      HAVE A

     left join (select distinct NAME,CODE_KM from HAVE) B

     on          A.NAME=B.NAME;

quit;

Note, have allowed =/- 3000

Occasional Contributor
Posts: 18

Re: Data Management

hi can you explain to me what the second part of the program is doing .. thanks

Super User
Super User
Posts: 7,392

Re: Data Management

Yes, I really should write the code correctly:

proc sql;

     create table INTER as

     select     A.*,

                   case  when -3000 <= (A.CODE_KM - B.CODE_KM) <= 3000 then "Y"

                             else "N" end as RES

     from      HAVE A

     left join (select distinct NAME,CODE_KM from HAVE where substr(CODE,1,2)="11") B

/* Merge on the KM from record with code prefixed by 11, so 78661 is merge onto all records in the example, so the calculated RES can be done. */

     on          A.NAME=B.NAME;

quit;

Occasional Contributor
Posts: 18

Re: Data Management

hi

if i want to compare the mileage og ay just 2 other code sto the 11 defect code..say the km differnce between

99110037MP

9999125481

can anyone assist me.

Occasional Contributor
Posts: 18

Re: Data Management

hi

if i want to compare the mileage og ay just 2 other code sto the 11 defect code..say the km differnce between

99110037MP

9999125481

to the 11 code .. adn keep only the 11 code.

can anyone assist me.

Ask a Question
Discussion stats
  • 9 replies
  • 558 views
  • 0 likes
  • 3 in conversation