Takumi Niwa IEEESM'23

March 27, 23

Text of each page
1.

Improving Bus Arrival Time Prediction Accuracy with Daily Periodic Based Transportation Data Imputation
IEEESM’23 @ KAUST
Takumi Niwa, Ismail Arai, Arata Endo, Masatoshi Kakiuchi, Kazutoshi Fujikawa
(Nara Institute of Science and Technology (NAIST), Japan)

2.

Bus Arrival Time (BAT) prediction
• BAT prediction is important for improving the quality of route bus services.
  • Users can use route buses with less waiting time.
  • Bus operators can manage and evaluate bus schedules.
[Figure: Predicted bus arrival times. Next: 16:21 (+1 min); 2nd: 16:38 (-2 min); 3rd: 17:25 (+5 min)]
• Recent studies on BAT prediction have used deep learning models [1].
• Several prediction models [2,3] can predict BAT for multiple trips.
[1] N. Singh and K. Kumar, “A review of bus arrival time prediction using artificial intelligence,” WIREs Data Mining and Knowledge Discovery, vol. 12, no. 4, p. e1457.
[2] N. C. Petersen, F. Rodrigues, and F. C. Pereira, “Multi-output bus travel time prediction with convolutional LSTM neural network,” Expert Systems with Applications, vol. 120, pp. 426–435, 2019.
[3] A. Ishinaga, I. Arai, M. Kakiuchi, and K. Fujikawa, “Bus arrival time prediction method by convolution of operation and weather information,” Research Report: Intelligent Transportation Systems and Smart Communities, vol. 2021-ITS-84, no. 6, pp. 1–8, 2021.
Mar. 20, 2023 ― Improving Bus Arrival Time Prediction Accuracy with Daily Periodic Based Transportation Data Imputation

3.

Bus operation data
• Existing BAT prediction uses bus operation data.
  • This includes the running time of each link and the stopping time at each bus stop.
• Several prediction models require consecutive data for multiple trips.
  • Ishinaga’s model [3] requires bus operation data for the last 8 trips as input.
  • Even if only 10% of the data is missing, the probability of extracting 8 consecutive complete trips is as low as 43%.
• Missing bus operation data require imputation.
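The 43% figure can be checked with a short calculation. The sketch below assumes that every trip is missing independently with the same probability, which the slides do not state explicitly; the function name is illustrative only.

```python
def prob_complete_window(missing_rate: float, window: int) -> float:
    """Probability that `window` consecutive trips are all observed.

    Assumption (not from the slides): trips go missing independently,
    each with probability `missing_rate`.
    """
    return (1.0 - missing_rate) ** window


# 10% missing rate, 8 consecutive trips required by the prediction model.
p = prob_complete_window(0.10, 8)
print(f"{p:.4f}")  # 0.9 ** 8 = 0.43046721
```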

4.

Missing in actual bus operation dataset
[Figure: Missing-data map of 9863 trips (390 days), from 2021-09-01 to 2022-09-25. White: normal data; black: missing data]
• Number of missing trips: 743 / 9863 trips (7.53%)
• The percentage of the data obtained for 8 consecutive trips is 67.2%.
• It is difficult to eliminate missing data caused by trouble with onboard bus equipment, packet loss, and GPS sensor failures.
• The existing studies [2,3] have used simple imputation methods such as LOCF.
  • Last observation carried forward (LOCF) replaces a missing value with the last value observed before the missing value occurred.
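LOCF as described on this slide can be sketched in a few lines. The data below are hypothetical running times, not values from the dataset, and `None` stands in for a missing trip.

```python
from typing import List, Optional


def locf(values: List[Optional[float]]) -> List[Optional[float]]:
    """Replace each missing value (None) with the last observed value.

    Leading missing values stay None because nothing has been observed yet.
    """
    filled: List[Optional[float]] = []
    last: Optional[float] = None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


# Hypothetical running times (sec) for one link; None marks a missing trip.
running_time = [222.5, None, None, 125.0, None]
locf_result = locf(running_time)
print(locf_result)  # [222.5, 222.5, 222.5, 125.0, 125.0]
```

Note that a run of consecutive missing trips repeats a single, increasingly stale value.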

5.

Related works of imputation in time series data prediction
• Shin et al. [4] reduced the prediction error of LSTM-based traffic congestion prediction by using an imputation method focused on the characteristics of traffic data. Their method combines three imputations:
  • Spatial imputation: uses the road conditions adjacent to the missing location.
  • Temporal imputation: uses the mean of the previous 𝑛 observations at the missing location.
  • Pattern imputation: uses pattern data generated in advance for each day of the week.
[4] D.-H. Shin, K. Chung, and R. C. Park, “Prediction of traffic congestion based on LSTM through correction of missing temporal and spatial data,” IEEE Access, vol. 8, pp. 150784–150796, 2020.

6.

Related works of imputation in time series data prediction
• Shin et al. [4] reduced the prediction error of LSTM-based traffic congestion prediction by using an imputation method focused on the characteristics of traffic data.
Would an imputation focused on the characteristics of bus operation data help to reduce BAT prediction errors?

7.

Bus operation dataset used in this study
Index | Date | Trip ID | Running time (sec) 1 ⋯ 5 | Stopping time (sec) 1 ⋯ 6 | Timetable difference (sec) 1 ⋯ 6
1 | 2022-06-01 | 1 | 93.5 ⋯ 1355.5 | 28.0 ⋯ 36.0 | -18.5 ⋯ 692.9
2 | 2022-06-01 | 2 | 129.0 ⋯ NA | 121.0 ⋯ NA | -120.5 ⋯ NA
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮
26 | 2022-06-01 | 26 | 105.5 ⋯ 479.5 | 118.0 ⋯ 404.5 | -119.7 ⋯ -284.7
27 | 2022-06-02 | 1 | 125.0 ⋯ 456.0 | 60.0 ⋯ 16.0 | -58.9 ⋯ 109.6
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮
• Trip ID indicates the ordinal number of the trip within a day.
• Running time, stopping time, and timetable difference are recorded for each link/bus stop.
• If even one value is missing, the whole trip is treated as missing.
• Imputation is performed on a column-by-column basis.

8.

Proposal methods
• We propose three imputation methods based on Shin’s method [4].
  • We do not use spatial imputation because it is not applicable due to the nature of bus operation data.
• Temporal imputation: uses the means of the running and stopping times of several trips before the missing data.
• Pattern imputation: uses the means of the running and stopping times at the same hour as the bus service where the missing data occurs.
• Combined imputation: uses pattern imputation for consecutive missing data and temporal imputation for nonconsecutive missing data.

9.

Proposal methods: Temporal imputation
• Use the mean of the 𝑁mean trips before the missing value.
Example (𝑁mean = 3, running time 1):
Trip ID | Before | After
1 | 222.5 | 222.5
2 | 250.0 | 250.0
3 | 125.0 | 125.0
4 | NA | 199.2 (mean of trips 1 to 3)
5 | 209.5 | 209.5
We expect this method to reflect recent trip disruptions, such as rain or traffic congestion delays.
This method does not focus on daily periodicity.
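The temporal imputation example on this slide can be reproduced with a minimal sketch. It assumes that the mean is taken over up to 𝑁mean observed (non-missing) values preceding the gap in the same column; the slides do not say how earlier gaps inside that window are handled, so that choice and the function name are assumptions.

```python
from typing import List, Optional


def temporal_impute(values: List[Optional[float]], n_mean: int) -> List[Optional[float]]:
    """Fill each missing value (None) with the mean of up to `n_mean`
    observed values that precede it in the same column.

    Assumption: only originally observed values enter the mean.
    """
    result = list(values)
    for i, v in enumerate(values):
        if v is None:
            previous = [x for x in values[:i] if x is not None][-n_mean:]
            if previous:  # leave the gap if nothing precedes it
                result[i] = sum(previous) / len(previous)
    return result


# Running time 1 for trips 1 to 5, as shown on the slide (trip 4 is missing).
running_time_1 = [222.5, 250.0, 125.0, None, 209.5]
temporal_result = temporal_impute(running_time_1, n_mean=3)
print(round(temporal_result[3], 1))  # 199.2
```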

10.

Proposal methods: Pattern imputation
• Use pattern data generated from the non-missing parts of a bus operation dataset.
• The pattern data is generated by calculating the mean of the bus operation data with matching trip IDs.
Bus operation data (running time 2):
Date | Trip ID | Running time 2
2022-06-02 | 1 | 102.5
2022-06-02 | 2 | 114.0
︙ | ︙ | ︙
2022-06-03 | 1 | 93.5
2022-06-03 | 2 | 125.0
︙ | ︙ | ︙
2022-06-04 | 1 | 82.5
2022-06-04 | 2 | 85.0
︙ | ︙ | ︙
Pattern data (mean of running times with matching trip IDs):
Trip ID | Running time 2
1 | 92.8
2 | 108.0
︙ | ︙
Imputation:
Trip ID | Before | After
1 | NA | 92.8
2 | 103.5 | 103.5
︙ | ︙ | ︙
We expect this method to incorporate daily periodicity, such as morning and evening congestion periods.
This method ignores recent trip disruptions.
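A minimal sketch of pattern imputation, using the running-time values shown on this slide. The function names and the `(trip_id, value)` tuple layout are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Trip = Tuple[int, Optional[float]]  # (trip ID, value); None marks a missing value


def build_pattern(history: List[Trip]) -> Dict[int, float]:
    """Mean of the observed values for each trip ID."""
    groups: Dict[int, List[float]] = defaultdict(list)
    for trip_id, value in history:
        if value is not None:
            groups[trip_id].append(value)
    return {trip_id: sum(vs) / len(vs) for trip_id, vs in groups.items()}


def pattern_impute(trips: List[Trip], pattern: Dict[int, float]) -> List[Trip]:
    """Replace each missing value with the pattern value of the same trip ID."""
    return [(tid, pattern.get(tid) if v is None else v) for tid, v in trips]


# Running time 2 on 2022-06-02, 2022-06-03, and 2022-06-04 (from the slide).
history = [(1, 102.5), (2, 114.0), (1, 93.5), (2, 125.0), (1, 82.5), (2, 85.0)]
pattern = build_pattern(history)

day = [(1, None), (2, 103.5)]  # trip 1 is missing
pattern_result = pattern_impute(day, pattern)
print([(tid, round(v, 1)) for tid, v in pattern_result])  # [(1, 92.8), (2, 103.5)]
```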

11.

Proposal methods: combined imputation
• Use a combination of temporal imputation and pattern imputation.
  • Use temporal imputation for non-consecutive missing values.
  • Use pattern imputation for consecutive missing values.
Example (𝑁mean = 3, running time 1):
Trip ID | Before | After
1 | 222.5 | 222.5
2 | 250.0 | 250.0
3 | 125.0 | 125.0
4 | NA | 199.2 (temporal imputation, mean: 199.2)
5 | 209.5 | 209.5
6 | NA | 212.4 (pattern imputation)
︙ | ︙ | ︙
Pattern data:
Trip ID | Running time 1
︙ | ︙
5 | 185.6
6 | 212.4
7 | 200.8
︙ | ︙
This method lets temporal imputation and pattern imputation compensate for each other's weaknesses.
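The combination rule can be sketched as follows. Several details are assumptions rather than statements from the slides: a gap counts as "consecutive" when two or more adjacent trips are missing, trip 7 is treated as missing (the slide only shows "︙" after trip 6), and the temporal mean uses observed values only.

```python
from typing import Dict, List, Optional


def combined_impute(values: List[Optional[float]],
                    pattern: Dict[int, float],
                    n_mean: int) -> List[Optional[float]]:
    """values[i] belongs to trip ID i + 1.

    An isolated gap gets the temporal mean of up to `n_mean` preceding
    observed values; a run of two or more gaps gets the pattern values.
    """
    result = list(values)
    i = 0
    while i < len(values):
        if values[i] is not None:
            i += 1
            continue
        j = i
        while j < len(values) and values[j] is None:
            j += 1  # j ends one past the last gap of this run
        if j - i == 1:  # isolated gap: temporal imputation
            previous = [x for x in values[:i] if x is not None][-n_mean:]
            if previous:
                result[i] = sum(previous) / len(previous)
        else:  # consecutive gaps: pattern imputation
            for k in range(i, j):
                result[k] = pattern.get(k + 1)
        i = j
    return result


running_time_1 = [222.5, 250.0, 125.0, None, 209.5, None, None]
pattern = {5: 185.6, 6: 212.4, 7: 200.8}  # pattern values shown on the slide
combined_result = combined_impute(running_time_1, pattern, n_mean=3)
print([round(v, 1) for v in combined_result])
# [222.5, 250.0, 125.0, 199.2, 209.5, 212.4, 200.8]
```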

12.

Analysis of bus operation dataset
[Figure: Autocorrelation of running time for a given link. x-axis: lags (0 to 182, ticks every 26 lags = 1 day); y-axis: autocorrelation (−0.2 to 1)]
※ Lags is the number of trips shifted from the original data.
• There is daily periodicity.
  • This route has 26 trips per day.
  • Other running and stopping times have the same periodicity.
Is an imputation focused on daily periodicity effective?
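The kind of autocorrelation analysis described here can be illustrated on synthetic data. The series below is an artificial sine wave with a 26-trip cycle, not the real dataset, and the estimator is the standard sample autocorrelation; the slides do not specify which estimator was used.

```python
import math
from typing import List


def autocorrelation(series: List[float], lag: int) -> float:
    """Sample autocorrelation of `series` at the given lag (in trips)."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    numer = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return numer / denom


TRIPS_PER_DAY = 26  # from the slide: this route has 26 trips per day

# Synthetic running times (sec) with a daily cycle, 20 days long.
series = [600.0 + 60.0 * math.sin(2 * math.pi * t / TRIPS_PER_DAY)
          for t in range(TRIPS_PER_DAY * 20)]

acf_day = autocorrelation(series, TRIPS_PER_DAY)            # shift by one day
acf_half_day = autocorrelation(series, TRIPS_PER_DAY // 2)  # shift by half a day
print(acf_day > acf_half_day)  # True
```

A series with daily periodicity correlates strongly with itself when shifted by one day's worth of trips, which is what peaks at multiples of 26 lags indicate.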

13.

Evaluation
How does the prediction error change when different imputation methods are applied to bus operation data?
→ Evaluate whether each imputation method is suitable for BAT prediction
• We input test data with varying missing rates into trained prediction models.
  • We assumed the case of missing input data at the time of prediction.
• We predicted the BATs of one, two, and three trips ahead, then compared the prediction errors for each imputation method.
  • We used Ishinaga’s model [3] for BAT prediction.
• We used Mean Absolute Error (MAE) as the error measure for predicted BATs.
  • $\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| \hat{t}_i - t_i \right|$
  • where $\hat{t}_i$ is the predicted BAT, $t_i$ is the actual BAT, and $n$ is the number of instances.
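The MAE used on this slide is the average of the absolute differences between predicted and actual BATs. A minimal sketch follows; the sample values are hypothetical and express each BAT in seconds relative to an arbitrary reference time.

```python
from typing import Sequence


def mean_absolute_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """MAE = (1/n) * sum(|predicted_i - actual_i|)."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Hypothetical predicted and actual BATs (sec).
predicted_bat = [60.0, 1080.0, 3900.0]
actual_bat = [0.0, 1200.0, 3600.0]
mae = mean_absolute_error(predicted_bat, actual_bat)
print(mae)  # 160.0
```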

14.

Methods used in experiments
Targets for comparison with the proposed methods
• Baseline: Historical Average (HA)
  • HA uses the mean of previous trips as the predicted BAT (not Ishinaga’s method).
• Existing method: LOCF
  • Last observation carried forward (LOCF) replaces a missing value with the last value observed before the missing value occurred.
  • The existing studies [2,3] have used it.
• Temporal imputation
  • 𝑁mean = 5, based on an analysis of the training data.
• Pattern imputation
  • Pattern data was made from the training data.
• Combined imputation
  • 𝑁mean = 5, based on an analysis of the training data.
  • Pattern data was made from the training data.

15.

Target bus route
Line No. 21 (inbound) of Kobe Minato Kanko Bus
• Estimated trip time: 35 min.
• Number of trips per day: 26
  • 6:00–22:00
• Data collection period: from 2021-09-01 to 2022-09-25
• Total number of trips: 9863
[Figure: Route map with bus stops. 1: Kobe international univ.; 2: West Court 7 bangai; 3: Rokko island Konan hospital; 4: Kobe Bay Sheraton hotel; 5: Kobe-Sannomiya sta.; 6: Shin-Kobe sta. © OpenStreetMap, https://openstreetmap.org/copyright]

16.

Results – MAE of one trip ahead prediction
[Figure: MAE of one trip ahead (sec, 140 to 170) versus missing rate of test data (%, 0 to 90) for HA, LOCF, temporal imputation, pattern imputation, and combined imputation]
• LOCF had the lowest MAE when the missing rate of test data was less than 30%.
• As the missing rate increased, the MAE of LOCF increased significantly.
• Temporal imputation had a higher MAE.

17.

Results – MAE of two trips ahead prediction
[Figure: MAE of two trips ahead (sec, 140 to 170) versus missing rate of test data (%, 0 to 90) for HA, LOCF, temporal imputation, pattern imputation, and combined imputation]
• The MAE increased compared to one trip ahead, mainly when applying LOCF.
• Pattern imputation had a lower MAE.

18.

Results – MAE of three trips ahead prediction
[Figure: MAE of three trips ahead (sec, 140 to 170) versus missing rate of test data (%, 0 to 90) for HA, LOCF, temporal imputation, pattern imputation, and combined imputation]
• The MAE increased further compared to two trips ahead.
• Pattern imputation had a lower MAE.

19.

Results – Focusing on missing rates of 30% or less
[Figure: Three panels (1 trip ahead, 2 trips ahead, 3 trips ahead) showing MAE (sec, 140 to 160) versus missing rate of test data (%, 0 to 30) for HA, LOCF, temporal imputation, pattern imputation, and combined imputation]
LOCF had a lower MAE when predicting one trip ahead.
Imputation focused on daily periodicity is effective when predicting multiple trips.

20.

Conclusion
• BAT prediction requires the imputation of bus operation data.
  • Even a missing rate of a few percent in a bus operation dataset makes it difficult to prepare input for the prediction model.
• We proposed three imputation methods focused on the characteristics of bus operation data.
  • Temporal imputation / Pattern imputation / Combined imputation
• Pattern imputation, which focuses on daily periodicity, is particularly effective in BAT prediction for multiple trips.

21.

Future works
• Evaluating whether the imputation focused on daily periodicity can also reduce the error of BAT prediction for other bus routes.
• Improving our imputation method, especially temporal imputation.
  • Temporal imputation results cannot reflect the daily periodicity, and the prediction model cannot learn the daily periodicity.
  • We need another approach that incorporates the disruptions of the previous trip.
  • E.g., the degree of delay of the last trip can be calculated based on the pattern data, and a bias can be applied.

22.

Thank you for listening!
• Takumi Niwa (Nara Institute of Science and Technology (NAIST), Japan)
• E-mail: niwa.takumi.nr4@is.naist.jp

23.

Appendix

24.

Existing BAT prediction method (Ishinaga et al.) [3]
[Figure: Model architecture. Bus operation data (running time, stopping time, timetable diff) go through missing data imputation and value scaling and are bound with weather data (temperature, precipitation, sunny day flag, cloudy day flag, rainy day flag). A prediction model for running time and a prediction model for stopping time, each a BiConvLSTM encoder and decoder, output the predicted running time and predicted stopping time, which feed the calculation of the predicted BAT.]
• Bus operation and weather data for the last 8 trips are input to the prediction model.
• Arrival times for the next 3 trips are predicted.

25.

Evaluation experiment
1. Divided the bus operation dataset into training, validation, and test data
2. Prepared several sets of test data with artificially increased missing values
3. Created prediction models for each imputation method
4. Predicted BATs by applying the same imputation method as the prediction model to each test data
5. Compared the MAE of predicted BATs

26.

Evaluation experiment (1/5)
1. Divided the bus operation dataset into training, validation, and test data
Kind | Start date | End date | # of trips | # of error trips | Error rate (%)
Training data | Sep. 1, 2021 | Sep. 3, 2022 | 9291 | 720 | 7.75
Validation data | Sep. 4, 2022 | Sep. 10, 2022 | 182 | 11 | 6.04
Test data | Sep. 11, 2022 | Sep. 25, 2022 | 390 | 12 | 3.08
• We divided the dataset by period to prevent data leakage.
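A period-based split like the one in this table can be sketched as below. The boundary dates and trip counts come from the table; the function names are illustrative, and the actual preprocessing code is not shown in the slides.

```python
from datetime import date

TRAIN_END = date(2022, 9, 3)   # last day of training data
VALID_END = date(2022, 9, 10)  # last day of validation data


def assign_split(day: date) -> str:
    """Assign a service day to a split purely by date, so that no future
    trip can leak into the training data."""
    if day <= TRAIN_END:
        return "train"
    if day <= VALID_END:
        return "validation"
    return "test"


def error_rate(n_error_trips: int, n_trips: int) -> float:
    """Percentage of trips with missing (error) data."""
    return 100.0 * n_error_trips / n_trips


print(assign_split(date(2022, 9, 4)))  # validation
print(round(error_rate(12, 390), 2))   # 3.08
```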

27.

Evaluation experiment (2/5)
2. Prepared several sets of test data with artificially increased missing values
[Figure: The original test data (error rate 3.08%) plus test data with artificial missing values (90 types): 10 sets each at error rates of 10%, 20%, ⋯, 90%, giving 91 test data sets in total]
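One way to build such test sets is sketched below. The slides do not describe how the additional missing trips were chosen, so uniform random selection, the seeding scheme, and the position of the 12 originally missing trips are all assumptions made for illustration.

```python
import random
from typing import List


def add_missing(is_missing: List[bool], target_rate: float, seed: int) -> List[bool]:
    """Return a copy in which extra trips are marked missing at random
    until about `target_rate` of all trips are missing.

    Trips that are already missing stay missing.
    """
    result = list(is_missing)
    target = round(target_rate * len(result))
    observed = [i for i, m in enumerate(result) if not m]
    extra = max(0, target - (len(result) - len(observed)))
    for i in random.Random(seed).sample(observed, extra):
        result[i] = True
    return result


# Hypothetical layout: 390 test trips, of which 12 are originally missing.
original = [i < 12 for i in range(390)]

test_sets = [original]  # original test data (3.08% missing)
for percent in range(10, 100, 10):  # 10%, 20%, ..., 90%
    for seed in range(10):          # ten sets per missing rate
        test_sets.append(add_missing(original, percent / 100, seed))

print(len(test_sets))  # 91
```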

28.

Evaluation experiment (3/5)
3. Created prediction models for each imputation method
[Figure: Each imputation method is applied to the training and validation data, and Ishinaga’s prediction models [3] are created from the results (model with LOCF applied, ⋯, model with combined imputation applied)]
• One prediction model for each imputation method (LOCF, temporal imputation, pattern imputation, combined imputation) = four prediction models

29.

Evaluation experiment (4/5)
4. Predicted BATs by applying the same imputation method as the prediction model to each test data
[Figure: Each imputation method (LOCF, ⋯, combined imputation) is applied to the 91 test data sets; the imputed sets are input to the model created with the same imputation method, which predicts 91 sets of BATs]

30.

Evaluation experiment (5/5)
5. Compared the MAE of predicted BATs
[Figure: The MAE is calculated from the 91 sets of predicted BATs of each method and plotted as MAE of one trip ahead (sec, 140 to 170) versus missing rate of test data (%, 0 to 90) for HA, LOCF, temporal imputation, pattern imputation, and combined imputation]

31.

How to view experiment results
[Figure: MAE of one trip ahead (sec, 140 to 170) versus missing rate of test data (%, 0 to 90) for HA, LOCF, temporal imputation, pattern imputation, and combined imputation. The leftmost points are the results for the original test data (missing rate: 3.08%).]
The MAE varies because there were ten sets of predicted BATs for each missing rate.
• Plot: mean of MAE
• Bars: std of MAE
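Each plotted point and its bar summarize the ten MAE values obtained at one missing rate. The sketch below uses hypothetical MAE values, and it assumes the population standard deviation; the slides do not say whether the population or sample form was used.

```python
import statistics

# Hypothetical MAE values (sec) from the ten test sets at one missing rate.
mae_per_set = [148.0, 149.0, 150.0, 151.0, 152.0,
               148.0, 149.0, 150.0, 151.0, 152.0]

mae_mean = statistics.mean(mae_per_set)   # plotted point
mae_std = statistics.pstdev(mae_per_set)  # error bar (population std assumed)

print(mae_mean)  # 150.0
```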

32.

The need for accurate BAT prediction
[Figure: Rate [%] versus allowable BAT prediction error [min]]
• According to a survey by Gooze et al. [5], users are dissatisfied when the BAT prediction error exceeds 4–5 minutes.
• In the study by Ishinaga et al. [3], the MAE is around 2 minutes, but errors of more than 10 minutes occur on rainy days.
[5] A. Gooze, K. E. Watkins, and A. Borning, “Benefits of real-time transit information and impacts of data accuracy on rider experience,” Transportation Research Record, vol. 2351, no. 1, pp. 95–103, 2013.

33.

Related works of imputation in time series data prediction
• Shin et al. [4] reduced the prediction error of LSTM-based traffic congestion prediction by using an imputation method focused on the characteristics of traffic data.
  • Learning and evaluation with varying missing rates
  • Compared with conventional simple interpolation methods
    • Historical imputation method (HIM)
    • Nearest neighbor imputation method (NIM)
  • Low mean absolute percentage error (MAPE)

34.

Bus route
[Figure: Example of a route with 𝐵 bus stops (bus stop 1, bus stop 2, bus stop 3, ⋯, bus stop 𝐵) connected by link 1, link 2, ⋯, link 𝐵-1, shown alongside a timetable (6:40, 6:42, 6:45, ⋯, 7:15)]
• Running time 𝑟_𝑏
  • At link 𝑏
• Stopping time 𝑠_𝑏
  • At bus stop 𝑏
• Timetable difference 𝑑_𝑏
  • Seconds behind schedule
  • Negative number for early arrival

35.

Ishinaga’s prediction model [3]
[Figure: Encoder-decoder network. Input: bus operation data for the input trips. Encoder: stacked ConvLSTM layers with batch normalization (BN) and dropout. Decoder: stacked ConvLSTM layers with BN and dropout, followed by a TimeDistributed (Dense) layer. Output: predictions for the output trips.]

36.

Daily periodicity of running time
[Figure: Mean + std of running time per trip for link 4. x-axis: trip ID; y-axis: running time (sec); a reference line is labeled "Estimated"]
• Morning and evening buses have shorter running times.
  • Fewer passengers
  • Less traffic
• Trips 6-10 have longer running times.
  • Many passengers

37.

Links and estimated travel time of target route
Link No. | Departure | Arrival | Estimated travel time (min.)
1 | Kobe international univ. | West Court 7 bangai | 2
2 | West Court 7 bangai | Rokko island Konan hosp. | 1
3 | Rokko island Konan hosp. | Kobe Bay Sheraton hotel | 2
4 | Kobe Bay Sheraton hotel | Kobe-Sannomiya sta. | 20
5 | Kobe-Sannomiya sta. | Shin-Kobe sta. | 10
Total: 35
• Daily periodicity is small for links 1-3 due to the short running time.

38.

[Figure: Autocorrelation – Running time of link 1. x-axis: lags; y-axis: autocorrelation]

39.

[Figure: Autocorrelation – Running time of link 2. x-axis: lags; y-axis: autocorrelation]

40.

[Figure: Autocorrelation – Running time of link 3. x-axis: lags; y-axis: autocorrelation]

41.

[Figure: Autocorrelation – Running time of link 4. x-axis: lags; y-axis: autocorrelation]

42.

[Figure: Autocorrelation – Running time of link 5. x-axis: lags; y-axis: autocorrelation]

43.

[Figure: Autocorrelation – Stopping time of bus stop 1. x-axis: lags; y-axis: autocorrelation]

44.

[Figure: Autocorrelation – Stopping time of bus stop 2. x-axis: lags; y-axis: autocorrelation]

45.

[Figure: Autocorrelation – Stopping time of bus stop 3. x-axis: lags; y-axis: autocorrelation]

46.

[Figure: Autocorrelation – Stopping time of bus stop 4. x-axis: lags; y-axis: autocorrelation]

47.

[Figure: Autocorrelation – Stopping time of bus stop 5. x-axis: lags; y-axis: autocorrelation]

48.

[Figure: Autocorrelation – Stopping time of bus stop 6. x-axis: lags; y-axis: autocorrelation]

49.

Determination of 𝑁mean for temporal imputation and combined imputation
[Figure: Autocorrelation of running time of link 5]
• Prediction of the bus arrival time at the last bus stop is influenced by links 4 and 5.
• The autocorrelations of running times 4 and 5 showed correlations for 5 trips before and after the peak.
• We used 𝑁mean = 5.

50.

[Figure: MAE of imputation (Running time, Error rate = 30%). x-axis: link; y-axis: MAE of imputation (sec); series: temporal imputation, pattern imputation, combined imputation]

51.

[Figure: MAE of imputation (Stopping time, Error rate = 30%). x-axis: bus stop; y-axis: MAE of imputation (sec); series: temporal imputation, pattern imputation, combined imputation]

52.

Analysis of impute results of temporal imputation
[Figure: Running time 4 (sec) per trip on Sep. 17th, 2021, with 𝑁mean = 5, showing the artificially missing values and the temporal imputation results. The red dotted line is the estimated running time (1200 sec).]
• Trips 12 and 13 were imputed higher than the actual values.
• Trips 17 and 18 were imputed lower than the actual values.
• Temporal imputation may move in the opposite direction from the actual increase or decrease.

53.

Analysis of impute results of pattern imputation
[Figure: Running time 4 (sec) per trip on Sep. 17th, 2021, showing the artificially missing values, the pattern data, and the pattern imputation results. The red dotted line is the estimated running time (1200 sec).]
• The imputation error is smaller than with temporal imputation.
• Pattern imputation has a larger error when trips are disrupted relative to the pattern (trips 17 and 18).

54.

Analysis of impute results of combined imputation
[Figure: Running time 4 (sec) per trip on Sep. 15th, 2021, showing the artificially missing values and the combined imputation results]
• Except for trip 9, the impute results are the same as for pattern imputation.
• Trip 9 has a larger error than with pattern imputation.

55.

Prediction result – one trip ahead, stable operation
[Figure: Total trip time (sec) per trip. Series: linear interpolation, LOCF, temporal imputation, pattern imputation, combined imputation]

56.

Prediction result – two trips ahead, stable operation
[Figure: Total trip time (sec) per trip. Series: linear interpolation, LOCF, temporal imputation, pattern imputation, combined imputation]

57.

Prediction result – three trips ahead, stable operation
[Figure: Total trip time (sec) per trip. Series: linear interpolation, LOCF, temporal imputation, pattern imputation, combined imputation]

58.

Prediction result – one trip ahead, unstable operation
[Figure: Total trip time (sec) per trip. Series: linear interpolation, LOCF, temporal imputation, pattern imputation, combined imputation]

59.

Prediction result – one trip ahead, unstable operation
[Figure: Total trip time (sec) per trip. Series: linear interpolation, LOCF, temporal imputation, pattern imputation, combined imputation]

60.

Prediction result – one trip ahead, unstable operation
[Figure: Total trip time (sec) per trip. Series: linear interpolation, LOCF, temporal imputation, pattern imputation, combined imputation]

61.

Which imputation is used in combined imputation?
[Figure: Utilization (%) of temporal and pattern imputation within combined imputation versus error rate of test data, starting from the original test data]
• The higher the missing rate of test data, the more the impute results are identical to pattern imputation.