BookmarkSubscribeRSS Feed
Marleneek
Calcite | Level 5

I have run into another issue. "Uddannelsescenter Nygård  - 10.klasseskolen" appears in the RECORD column, where it instead should appear in the NAME column with "10. klasse" in the RECORD column. As you can see in the picture and code SAS also generates a new row (748) with "Måløhøj skole - Østerhøj" in the NAME column and "10. klasse" in the RECORD column, which I don't want because it doesn't exist in the file I imported from excel.

I think the issue SAS has is that the ID number is the same for the two schools as well as a number appearing in the NAME column. I want the output to be that

"151013 Måløhøjvej Skole - Østerhøj" runs from 0-9 klasse (grade) followed by a row with "151013 Uddannelsescenter Nygård  - 10.klasseskolen" with "10.klasse" in the RECORD column. 

 Skærmbillede 2018-03-01 kl. 12.12.11.png

 

2010,151012,Skovvejens Skole - Øst,2. klasse,330,0,0,0,30,0,0,150,0,30,0,0,60,30,0,120,30,0,0
2010,151012,Skovvejens Skole - Øst,3. klasse,240,60,0,0,30,0,0,150,0,60,0,0,60,30,0,120,30,0,0
2010,151012,Skovvejens Skole - Øst,4. klasse,180,60,0,0,30,0,0,120,0,60,0,0,30,60,60,90,30,0,0
2010,151012,Skovvejens Skole - Øst,5. klasse,180,90,0,0,30,0,0,120,0,60,0,0,30,30,90,90,30,0,0
2010,151012,Skovvejens Skole - Øst,6. klasse,180,90,0,0,30,0,0,120,0,60,0,0,0,30,120,60,30,0,0
2010,151012,Skovvejens Skole - Øst,7. klasse,180,90,90,90,30,0,0,120,60,0,60,30,0,0,90,60,30,0,0
2010,151012,Skovvejens Skole - Øst,8. klasse,180,90,120,120,0,0,60,120,60,0,60,60,0,0,0,60,30,0,60
2010,151012,Skovvejens Skole - Øst,9. klasse,150,90,120,120,30,0,60,120,90,0,30,30,0,0,0,60,30,0,60
2010,151013,Måløvhøj Skole - Østerhøj,0. klasse,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,750,0
2010,151013,Måløvhøj Skole - Østerhøj,1. klasse,330,0,0,0,60,0,0,150,0,30,0,0,30,30,0,120,30,0,0
2010,151013,Måløvhøj Skole - Østerhøj,2. klasse,330,0,0,0,30,0,0,150,0,30,0,0,60,30,0,120,30,0,0
2010,151013,Måløvhøj Skole - Østerhøj,3. klasse,240,60,0,0,30,0,0,150,0,60,0,0,60,30,0,120,30,0,0
2010,151013,Måløvhøj Skole - Østerhøj,4. klasse,180,60,0,0,30,0,0,120,0,60,0,0,30,60,60,90,30,0,0
2010,151013,Måløvhøj Skole - Østerhøj,5. klasse,180,90,0,0,30,0,0,120,0,60,0,0,60,30,90,90,30,0,0
2010,151013,Måløvhøj Skole - Østerhøj,6. klasse,180,90,0,0,30,0,0,120,0,60,0,0,0,30,120,60,30,0,0
2010,151013,Måløvhøj Skole - Østerhøj,7. klasse,180,90,90,0,30,0,0,120,60,0,60,30,0,0,90,60,30,0,0
2010,151013,Måløvhøj Skole - Østerhøj,8. klasse,180,90,120,0,0,0,60,120,60,0,60,60,0,0,0,60,30,0,0
2010,151013,Måløvhøj Skole - Østerhøj,9. klasse,150,90,120,0,30,0,60,120,90,0,30,30,0,0,0,60,30,0,60
2010,153013,Måløvhøj Skole - Østerhøj,Uddannelsescenter Nygård - 10.Klasseskolen,180,120,120,120,0,0,60,120,60,0,0,0,0,0,0,0,20,0,180
2010,153013,Måløvhøj Skole - Østerhøj,10. klasse,180,120,120,120,0,0,60,120,60,0,0,0,0,0,0,0,20,0,180
2010,153018,Brøndbyøster Skole,0. klasse,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,750,0
2010,153018,Brøndbyøster Skole,1. klasse,330,0,0,0,60,0,0,150,0,30,0,0,45,45,0,60,30,0,0
2010,153018,Brøndbyøster Skole,2. klasse,300,0,0,0,30,0,0,150,0,60,0,0,60,60,0,60,30,0,0

 Additionally I used this code to generate the NAME column:

data want;
    set have (keep=Skole);
    retain flag 0 ID Name;

    if anyalpha(skole)=0 and find(skole, '.')=0 then
        ID=skole;
    else if anydigit(skole)=0 then
        Name=skole;
    else
        do;
            Record=Skole;
            output;
        end;
    keep name record ID;
run;

 

7 REPLIES 7
Kurt_Bremser
Super User

Export the spreadsheet to a csv file, and post ({i} button or attachment) that here.

Or is your second data example a part of that csv? If yes, then the data is what it is, and you need to fix it in the source.

Marleneek
Calcite | Level 5

Like this? 

 

My second data example is how it looks after i used the SAS code below the example.

Kurt_Bremser
Super User

Ah, I see.

Modify your code for setting the three retained variables:

data want;
retain ID Name record;
set have;
if anyalpha(skole) = 0
then ID = skole;
else if find(skole,'.') = 0 or find(skole,'-') > 0
then Name = skole;
else do;
  Record = Skole;
  output;
end;
drop skole;
run;

Where did you get the year from? That's the only thing missing from my result.

 

PS the code used to read your csv:

data have;
infile
  '$HOME/sascommunity/Pt20101.csv'
  encoding='utf-8'
  termstr=CR
  dlm=';'
  firstobs=2
  truncover
  dsd
;
input
  Skole :$50.
  bil
  bio
  bhk
  dan
  eng
  fra
  fk
  geo
  his
  hsh
  idr
  kt
  kri
  kri_rel
  mat
  mus
  nt
  sam
  tys
  vf
;
run;
Marleneek
Calcite | Level 5

I created the variable year myself. I tried reading the code you wrote, but I still don't get the output that I wan't. 

When I use the code I posted first (below with a few alterations, but nothing important), I almost get the right output. It seems like SAS don't put schools with a name that contains a number in the SKOLETX column.

 

data pt_2010; 
    set prove_2010;
    array change _numeric_;
        do over change;
            if change=. then change=0;
        end;
    aar=2010.;
    retain flag 0 skolekode skoletx;
    if anyalpha(skole)=0 and find(skole, '.')=0 then
        skolekode=skole;
    else if anydigit(skole)=0 then
        skoletx=skole;
    else
        do;
            kltrin=Skole;
            output;
        end;
    label Name=skoletx;    
    drop Skole flag; 
run;

data pt_2010check;
set PT_2010;
where kltrin not like '%klasse';
run;

 I use the second data step to see if the KLTRIN column has any observations that aren't equal to "klasse" in the usual form e.g. "2. klasse" or "3. klasse" etc. I get the following:

 

 

data ARBEJDE.CHECK2010_1;
  infile datalines dsd truncover;
  input aar:32. skolekode:$64. skoletx:$64. kltrin:$64. dan:BEST. eng:BEST. tys:BEST. fra:BEST. kri:BEST. kri_rel:BEST. sam:BEST. mat:BEST. fk:BEST. nt:BEST. bio:BEST. geo:BEST. bil:BEST. mus:BEST. hsh:BEST. idr:BEST. kt:BEST. bhk:BEST. vf:BEST.;
datalines4;
2010,153013,Måløvhøj Skole - Østerhøj,Uddannelsescenter Nygård - 10.Klasseskolen,180,120,120,120,0,0,60,120,60,0,0,0,0,0,0,0,20,0,180
2010,159038,Skovbrynet skole,"GXU - Gladsaxe 10. Klasse og Ungdomsskole, 10. klasse afdelingen",171,114,114,114,0,0,57,114,57,0,0,0,0,0,0,0,33,0,297
2010,165017,Hyldagerskolen,Det 10. Element  Albertslund 10. kl. skole,332,125,81,0,0,0,70,125,60,6,0,0,0,15,0,96,20,0,0
2010,169021,Torstorp Skole,LINIE10,180,180,120,0,0,0,0,180,60,0,0,0,0,0,0,0,60,0,240
2010,183011,Vejlebroskolen,Ishøj 10.klassecenter,224,168,112,0,0,0,0,168,42,0,0,0,0,0,0,14,0,0,78
2010,217033,Byskolen,10.Klasseskolen,180,120,90,90,0,30,30,120,90,0,0,0,0,0,0,0,30,0,330
2010,219024,"Grønnevang Skole, afdeling Østervang",10. klasseskolen,160,120,0,0,0,0,0,160,0,0,0,0,0,0,0,0,40,0,420
2010,269012,Munkekærskolen,10Solrød,145,95,95,95,0,0,45,120,45,0,0,0,0,0,0,45,45,0,240
2010,333021,Antvorskov Skole,Xclass. 10. Klasse-centret i Slagelse,188.57,160.71,102.86,102.86,0,0,51.43,128.57,60,0,38.57,0,141.43,0,0,357.86,122.14,0,51.43
2010,373028,"Kobberbakkeskolen, afd. Sct. Jørgen",10'eren,391,129,99,0,0,0,0,121,50,0,0,0,0,0,0,0,0,0,66
2010,400036,"Hans Rømer Skolen, Afdeling Aaker",10. klasseskolen,180,120,120,0,0,0,0,120,60,0,0,0,0,0,0,60,30,0,390
2010,495005,Juulskovskolen,4kløverskolen,200.2173913,77.5,71.630434783,0,26.847826087,0,25.869565217,125.2173913,44.239130435,18.043478261,34.02173913,26.086956522,13.804347826,15.543478261,37.5,63.47826087,28.260869565,690,25.869565217
2010,515021,Favrdalskolen,10. ved Kløften,150,120,120,0,0,0,0,150,120,0,0,0,0,0,0,0,20,0,180
2010,541010,Tønder Grundskole,Tønder10,160,120,120,0,0,0,60,160,60,0,60,0,0,0,0,60,0,0,60
2010,545016,Høje Kolstrup Skole,10. klasse Aabenraa,180,120,120,0,0,0,0,120,120,0,0,0,0,0,0,0,0,0,240
2010,565014,Grindsted Vestre Skole,10. Klasse Skolen,150,120,120,0,0,0,120,150,90,0,60,0,120,0,120,270,59,0,120
2010,615029,"Horsens Byskole, folkeskoleafdeling","Learnmark Horsens, Step 10",180,120,30,0,0,0,90,150,30,0,0,0,0,30,30,30,30,0,120
2010,631034,Hældagerskolen,10. Klasse UngdomsCenter Vejle,157,99,95,0,0,20,35,103,103,0,0,0,0,0,0,96,20,0,130
2010,707011,Åboulevarden,10. Klasse Center Djursland,174,133,0,0,0,0,60,173,0,0,0,0,0,0,0,0,20,0,280
2010,751116,Kolt Skole,"Center-10, Aarhus High School",150.26344086,114.74731183,0,0,0,0,1.8897849462,147.95698925,55.913978495,0,0,0,0,0,0,10.145698925,0,0,352.6688172
2010,779017,"Grundskoleafdelingen, Ådalskolen",10. Klasse Center Skive,210,105,92,0,0,0,92,105,92,0,92,0,0,0,0,120,0,0,120
2010,785015,Vestervig Skole,Campus 10,150,120,90,0,0,0,0,120,60,0,0,0,0,0,0,0,30,0,420
2010,813211,Elling Skole,Frederikshavn Kommunale Ungdomsskole og 10. klassecenter,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2010,820001,Ulvkærskolen,10. KlasseCentret Vesthimmerland,198,132,111,0,0,15,0,177,69,0,0,0,0,30,0,30,75,0,276
2010,821019,Lundergård undervisningssted,Hjørring Ny 10.,170,105,100,0,0,0,40,145,60,0,0,0,0,0,0,40,298,0,40
2010,823006,Onsild Skole,Hobro Søndre Skole og MF10,218.20093458,71.705607477,56.425233645,0,30.981308411,0,20.747663551,132.19626168,36.168224299,29.299065421,25.23364486,20.817757009,21.238317757,23.971962617,28.808411215,66.939252336,30,840,31.401869159
2010,823011,Bymarkskolen,Mariagerfjord 10. klassecenter,180,120,120,0,0,0,0,120,60,0,0,0,0,0,0,0,0,0,420
;;;;

I want the third column to be replaced by the fourth and the fourth column should only contain "10. klasse" for the above observations e.g. "2010, 823011, Mariagerfjord 10. klassecenter, 10. klasse". 

 

Kurt_Bremser
Super User

I added an additional condition that seems to take care of most of your problems:

data have;
infile
  '$HOME/sascommunity/Pt20101.csv'
  encoding='utf-8'
  termstr=CR
  dlm=';'
  firstobs=2
  truncover
  dsd
;
input
  Skole :$50.
  bil
  bio
  bhk
  dan
  eng
  fra
  fk
  geo
  his
  hsh
  idr
  kt
  kri
  kri_rel
  mat
  mus
  nt
  sam
  tys
  vf
;
run;

data want;
retain ID Name record;
set have;
if anyalpha(skole) = 0
then ID = skole;
else if find(skole,'.') = 0 or find(skole,'-') > 0 or anydigit(substr(skole,1,1)) = 0
then Name = skole;
else do;
  Record = Skole;
  output;
end;
drop skole;
run;

proc freq data=want;
tables record;
run;

The result from the freq now only shows some very peculiar entries that might even be valid:

---------------------------------------------
0. klasse                                1457
1. klasse                                1464
10. Klasse Center Djursland                 1
10. Klasse Center Skive                     1
10. Klasse Skolen                           1
10. Klasse UngdomsCenter Vejle              1
10. KlasseCentret Vesthimmerland            1
10. klasse                                180
10. klasse Aabenraa                         1
10. klasseskolen                            2
10. ved Kløften                             1
10.Klasseskolen                             1
2. klasse                                1460
3. klasse                                1464
4. klasse                                1460
5. klasse                                1457
6. klasse                                1450
7. klasse                                1136
8. klasse                                1060
9. klasse                                1057

This might even give you material for going back and cleaning the original data.

Marleneek
Calcite | Level 5

It took care of most of my problems. I now only need to get the names for the schools below in the RECORD column moved to the NAME column and have "10. klasse" in the RECORD column instead. Is it possible to modify the code in order to do this? 

Skærmbillede 2018-03-02 kl. 13.21.51.png

Kurt_Bremser
Super User

Can you make up a simple rule for catching those instances? And are the "names" valid at all?

I'd throw that back to the people responsible for the source so that they can clean out their data.

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 2239 views
  • 0 likes
  • 2 in conversation