This page was generated from docs/user_guide/3-session.ipynb. Interactive online version: Binder badge.

Investigating the historical running data to evaluate my performance (Session summaries)#

Many runners use third-party apps to track running activities. These apps and its companion website provide many visual charts with analytical metrics to help runners review their running performances to set up new training plans or make adjustments. In this notebook, we will extract this running data and analyse it locally using runpandas. We also take the opportunity to illustrate the runpandas methods for summarizing the historical workouts and get some valuable insights.

Looking at the data#

The example data set used in this tutorial contains 68 sessions of a single female runner from the period of 2020 until 2021.

The code chunk below loads the data using the method runpandas.read_directory_aggregate, which allows the user to read all the tracking files of a support format in a directory and combine them in a data frame split by sessions based on the timestamps of each activity. It means that for each workout file will be stored in separate lines in the dataframe.

[2]:
import warnings
warnings.filterwarnings('ignore')
[3]:
import runpandas
session = runpandas.read_dir_aggregate(dirname='session/')

In pandas we use the pandas.MultiIndex which alows the dataframe have multiple columns as a row identifier, while having each index column related to another through a parent/child relationship. In our scenario we have the start time from each activity as the first index level and the timestamps from the activity as the second index level.

[4]:
session
[4]:
alt hr lon lat
start time
2020-08-30 09:08:51.012 00:00:00 NaN NaN -34.893609 -8.045055
00:00:01.091000 NaN NaN -34.893624 -8.045054
00:00:02.091000 NaN NaN -34.893641 -8.045061
00:00:03.098000 NaN NaN -34.893655 -8.045063
00:00:04.098000 NaN NaN -34.893655 -8.045065
... ... ... ... ... ...
2021-07-04 11:23:19.418 00:52:39.582000 0.050001 189.0 -34.894534 -8.046602
00:52:43.582000 NaN NaN -34.894465 -8.046533
00:52:44.582000 NaN NaN -34.894443 -8.046515
00:52:45.582000 NaN NaN -34.894429 -8.046494
00:52:49.582000 NaN 190.0 -34.894395 -8.046398

48794 rows × 4 columns

[5]:
session.index  #MultiIndex (start, timestamp)
[5]:
MultiIndex([('2020-08-30 09:08:51.012000',        '00:00:00'),
            ('2020-08-30 09:08:51.012000', '00:00:01.091000'),
            ('2020-08-30 09:08:51.012000', '00:00:02.091000'),
            ('2020-08-30 09:08:51.012000', '00:00:03.098000'),
            ('2020-08-30 09:08:51.012000', '00:00:04.098000'),
            ('2020-08-30 09:08:51.012000', '00:00:05.096000'),
            ('2020-08-30 09:08:51.012000', '00:00:06.096000'),
            ('2020-08-30 09:08:51.012000', '00:00:07.097000'),
            ('2020-08-30 09:08:51.012000', '00:00:08.097000'),
            ('2020-08-30 09:08:51.012000', '00:00:09.102000'),
            ...
            ('2021-07-04 11:23:19.418000', '00:52:18.584000'),
            ('2021-07-04 11:23:19.418000', '00:52:22.584000'),
            ('2021-07-04 11:23:19.418000', '00:52:26.582000'),
            ('2021-07-04 11:23:19.418000', '00:52:30.582000'),
            ('2021-07-04 11:23:19.418000', '00:52:35.582000'),
            ('2021-07-04 11:23:19.418000', '00:52:39.582000'),
            ('2021-07-04 11:23:19.418000', '00:52:43.582000'),
            ('2021-07-04 11:23:19.418000', '00:52:44.582000'),
            ('2021-07-04 11:23:19.418000', '00:52:45.582000'),
            ('2021-07-04 11:23:19.418000', '00:52:49.582000')],
           names=['start', 'time'], length=48794)

Now let’s see how many activities there are available for analysis. For this question, we also have an acessor runpandas.types.acessors.session._SessionAcessor that holds several methods for computing the basic running metrics across all the activities and some summary statistics.

[6]:
#count the number of activities in the session
print ('Total Activities:', session.session.count())
Total Activities: 68

We can compute the main running metrics (speed, pace, moving, etc) using the session methods available as like the ones available in the runpandas.types.metrics.MetricsAcessor . By the way, those methods are called inside each metric method, but applying in each of activities separatedely.

[7]:
#In this example we compute the distance and the distance per position across all workouts
session = session.session.distance()
session
[7]:
alt hr lon lat distpos dist
start time
2020-08-30 09:08:51.012 00:00:00 NaN NaN -34.893609 -8.045055 NaN NaN
00:00:01.091000 NaN NaN -34.893624 -8.045054 1.690587 1.690587
00:00:02.091000 NaN NaN -34.893641 -8.045061 2.095596 3.786183
00:00:03.098000 NaN NaN -34.893655 -8.045063 1.594298 5.380481
00:00:04.098000 NaN NaN -34.893655 -8.045065 0.163334 5.543815
... ... ... ... ... ... ... ...
2021-07-04 11:23:19.418 00:52:39.582000 0.050001 189.0 -34.894534 -8.046602 12.015437 8220.018885
00:52:43.582000 NaN NaN -34.894465 -8.046533 10.749779 8230.768664
00:52:44.582000 NaN NaN -34.894443 -8.046515 3.163638 8233.932302
00:52:45.582000 NaN NaN -34.894429 -8.046494 2.851535 8236.783837
00:52:49.582000 NaN 190.0 -34.894395 -8.046398 11.300740 8248.084577

48794 rows × 6 columns

[8]:
#comput the speed for each activity
session = session.session.speed(from_distances=True)
#compute the pace for each activity
session = session.session.pace()
#compute the inactivity periods for each activity
session = session.session.only_moving()
[9]:
session
[9]:
alt hr lon lat distpos dist speed pace moving
start time
2020-08-30 09:08:51.012 00:00:00 NaN NaN -34.893609 -8.045055 NaN NaN NaN NaT False
00:00:01.091000 NaN NaN -34.893624 -8.045054 1.690587 1.690587 1.549576 00:00:00.645337 True
00:00:02.091000 NaN NaN -34.893641 -8.045061 2.095596 3.786183 2.095596 00:00:00.477191 True
00:00:03.098000 NaN NaN -34.893655 -8.045063 1.594298 5.380481 1.583216 00:00:00.631625 True
00:00:04.098000 NaN NaN -34.893655 -8.045065 0.163334 5.543815 0.163334 00:00:06.122433 False
... ... ... ... ... ... ... ... ... ... ...
2021-07-04 11:23:19.418 00:52:39.582000 0.050001 189.0 -34.894534 -8.046602 12.015437 8220.018885 3.003859 00:00:00.332905 True
00:52:43.582000 NaN NaN -34.894465 -8.046533 10.749779 8230.768664 2.687445 00:00:00.372100 True
00:52:44.582000 NaN NaN -34.894443 -8.046515 3.163638 8233.932302 3.163638 00:00:00.316091 True
00:52:45.582000 NaN NaN -34.894429 -8.046494 2.851535 8236.783837 2.851535 00:00:00.350688 True
00:52:49.582000 NaN 190.0 -34.894395 -8.046398 11.300740 8248.084577 2.825185 00:00:00.353959 True

48794 rows × 9 columns

After all the computation done, let’s going to the next step: the exploration and get some descriptive statistics.

Exploring the data#

After the loading and metrics computation for all the activities, now let’s look further the data and get the basic summaries about the sessions: time spent, total distance, mean speed and other insightful statistics in each running activity. For this task, we may accomplish it by calling the method runpandas.types.session._SessionAcessor.summarize . It will return a basic Dataframe including all the aggregated statistics per activity from the season frame.

[10]:
summary = session.session.summarize()
summary
[10]:
moving_time mean_speed max_speed mean_pace max_pace mean_moving_speed mean_moving_pace mean_cadence max_cadence mean_moving_cadence mean_heart_rate max_heart_rate mean_moving_heart_rate mean_temperature min_temperature max_temperature total_distance ellapsed_time
start
2020-07-03 09:50:53.162 00:25:29.838000 2.642051 4.879655 00:06:18 00:03:24 2.665008 00:06:15 NaN NaN NaN 178.819923 188.0 178.872587 NaN NaN NaN 4089.467333 00:25:47.838000
2020-07-05 09:33:20.999 00:05:04.999000 2.227637 6.998021 00:07:28 00:02:22 3.072098 00:05:25 NaN NaN NaN 168.345455 176.0 168.900000 NaN NaN NaN 980.162640 00:07:20.001000
2020-07-05 09:41:59.999 00:18:19 1.918949 6.563570 00:08:41 00:02:32 2.729788 00:06:06 NaN NaN NaN 173.894180 185.0 174.577143 NaN NaN NaN 3139.401118 00:27:16
2020-07-13 09:13:58.718 00:40:21.281000 2.509703 8.520387 00:06:38 00:01:57 2.573151 00:06:28 NaN NaN NaN 170.808176 185.0 170.795527 NaN NaN NaN 6282.491059 00:41:43.281000
2020-07-17 09:33:02.308 00:32:07.691000 2.643278 8.365431 00:06:18 00:01:59 2.643278 00:06:18 NaN NaN NaN 176.436242 186.0 176.436242 NaN NaN NaN 5095.423045 00:32:07.691000
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2021-06-13 09:22:30.985 01:32:33.018000 2.612872 23.583956 00:06:22 00:00:42 2.810855 00:05:55 NaN NaN NaN 169.340812 183.0 169.655879 NaN NaN NaN 15706.017295 01:40:11.016000
2021-06-20 09:16:55.163 00:59:44.512000 2.492640 6.065895 00:06:41 00:02:44 2.749453 00:06:03 NaN NaN NaN 170.539809 190.0 171.231392 NaN NaN NaN 9965.168311 01:06:37.837000
2021-06-23 09:37:44.000 00:26:49.001000 2.501796 5.641343 00:06:39 00:02:57 2.568947 00:06:29 NaN NaN NaN 156.864865 171.0 156.957031 NaN NaN NaN 4165.492241 00:27:45.001000
2021-06-27 09:50:08.664 00:31:42.336000 2.646493 32.734124 00:06:17 00:00:30 2.661853 00:06:15 NaN NaN NaN 166.642857 176.0 166.721116 NaN NaN NaN 5074.217061 00:31:57.336000
2021-07-04 11:23:19.418 00:47:47.583000 2.602263 4.212320 00:06:24 00:03:57 2.856801 00:05:50 NaN NaN NaN 177.821862 192.0 177.956967 NaN NaN NaN 8248.084577 00:52:49.582000

68 rows × 18 columns

Here, some descriptive statistics:

[11]:
summary['day_diff'] = summary.index.to_series().diff().astype('timedelta64[D]').astype('Int64')
summary['pace_moving_all_mean'] = summary.mean_moving_pace.mean()
summary['distance_all_mean'] = round(summary.total_distance.mean()/1000,2)
summary['mean_speed'] = summary['mean_speed'] * 3.6 #convert from m/s  to km/h
summary['max_speed'] = summary['max_speed'] * 3.6 #convert from m/s  to km/h
summary['mean_moving_speed'] = summary['mean_moving_speed'] * 3.6 #convert from m/s  to km/h
[12]:
print('Session Interval:', (summary.index.to_series().max() - summary.index.to_series().min()).days, 'days')
print('Total Workouts:', len(summary), 'runnings')
print('Tota KM Distance:', summary['total_distance'].sum() / 1000)
print('Running Intervals (average):' , round(summary.day_diff.mean(), 2), 'days')
print('Average Pace (all runs):', summary.mean_pace.mean())
print('Average Moving Pace (all runs):', summary.mean_moving_pace.mean())
print('Average KM Distance (all runs):', round(summary.total_distance.mean()/ 1000,2))
Session Interval: 366 days
Total Workouts: 68 runnings
Tota KM Distance: 491.77377537338896
Running Intervals (average): 5.0 days
Average Pace (all runs): 0 days 00:07:18.411764
Average Moving Pace (all runs): 0 days 00:06:02.147058
Average KM Distance (all runs): 7.23

As we can see above, we analyzed the period of 366 days (one year) of running workouts. In this period, she ran 68 times which achieved the total distance of 491 km! The average interval in days between two actitivies is 5 days, with average moving pace of 06’02” per km and average distance of 7.23km! Great numbers for a starter runner!

Exploring the data with some visualizations#

At this point, I have the data to start some visualization and analysis. The first question that popped was : How was the evolution of my average moving pace through all the runs and the corresponding average? Let’s use the matplotlib package to illustrate the possible answer.

[13]:
#let's convert the pace to float number in minutes
import datetime
summary['mean_moving_pace_float'] = summary['mean_moving_pace'] / datetime.timedelta(minutes=1)
summary['pace_moving_all_mean_float'] = summary['pace_moving_all_mean'] / datetime.timedelta(minutes=1)
#replace NA values with 0
summary['day_diff'].fillna(0, inplace=True)
[14]:
import matplotlib.pyplot as plt

plt.subplots(figsize=(8, 5))

plt.plot(summary.index, summary.mean_moving_pace_float, color='silver')
plt.plot(summary.pace_moving_all_mean_float, color='purple', linestyle='dashed', label='average')
plt.title("Pace Evolution")
plt.xlabel("Runnings")
plt.ylabel("Pace")
plt.legend()
[14]:
<matplotlib.legend.Legend at 0x7fa8d8d28b50>
../_images/user_guide_3-session_26_1.svg

We can see some outliers at the running workouts at september 2020. Let’s do some filtering (excluding those with “unreasonable” paces that might be mislabelled)

[15]:
summary_without_outliers = summary[summary['mean_moving_pace_float'].between(5,10)]

plt.plot(summary_without_outliers.index, summary_without_outliers.mean_moving_pace_float, color='silver')
plt.plot(summary_without_outliers.pace_moving_all_mean_float, color='purple', linestyle='dashed', label='average')
plt.title("Pace Evolution")
plt.xlabel("Runnings")
plt.xticks(rotation=90)
plt.ylabel("Pace")
plt.legend()
[15]:
<matplotlib.legend.Legend at 0x7fa8d916ac10>
../_images/user_guide_3-session_28_1.svg

That looks much better now. Now we can see clearly that her pace kept floating between 6’50 min and 6’00 min over all this year. I also created a new variable at my summary dataframe called day_diff, which means the interval in day between consecutive workouts. This variable helped me to see if her pace grew up as the number of the days without running also increased.

[16]:
#recompute the day_diff removing the outliers
summary_without_outliers['day_diff'] = summary_without_outliers.index.to_series().diff().astype('timedelta64[D]').astype('Int64')
summary_without_outliers['day_diff'].fillna(0, inplace=True)

fig,ax = plt.subplots(figsize=(8, 5))
ax.plot(summary_without_outliers.index, summary_without_outliers.mean_moving_pace_float, color='silver')
ax.set_xlabel('Runnings')
ax.set_ylabel('Pace',color='silver')
ax2=ax.twinx()
ax2.plot(summary_without_outliers.index, summary_without_outliers.day_diff.astype('int'),color='purple')
ax2.set_ylabel('Days Without Running',color='indigo', rotation=270)
plt.title('Pace vs Days without Running')
plt.show()

../_images/user_guide_3-session_30_0.svg

As we can see at the chart above, there isn’t a linear correlation between the number of the days without running and her growing pace. In her case it might be the period of rest helped her to a better recovery. Can we have a statistical evidence ?

The following code will create a regression plot of her interval of days vs pace. I will use Seaborn to create plot. As we can see there is negative correlation between the two variables, although, it’s not strong.

[17]:
import seaborn as sns
sns.set(style="ticks", context="talk")
sns.regplot(x=summary_without_outliers.day_diff.astype('int'), y=summary_without_outliers.mean_moving_pace_float).set_title("Day Diff vs Pace")

[17]:
Text(0.5, 1.0, 'Day Diff vs Pace')
../_images/user_guide_3-session_33_1.svg

Let’s explore now the relationship between the mean pace and the average speed. It is obvious that the bigger the average speed the pace is lower. More fast means less time to cover the distance.

[18]:
plt.subplots(figsize=(8, 5))

plt.plot(summary_without_outliers.index, summary_without_outliers.mean_moving_pace_float, color='silver', label='Average Pace')
plt.plot(summary_without_outliers.mean_moving_speed, color='purple', label = 'Average Speed')
plt.title("Average Speed x Average Pace")
plt.xlabel("Runnings")
plt.legend()
plt.show()
../_images/user_guide_3-session_35_0.svg

The chart below illustrates the evolution of the total distance covered at her runnings workouts. There is a higher variance of her run distances. The average distance for most of the runs is close to 7km. The second chart illustrating the histogram of the run distances show that most of her usual weekday runs are around 4-7km and my longer weekend runs (she started to run longer distances above 7kms very recently).

[19]:

plt.subplots(figsize=(8, 5)) plt.plot(summary_without_outliers.index, summary_without_outliers.total_distance / 1000, color='silver') plt.plot(summary_without_outliers.distance_all_mean, color='purple', linestyle='dashed', label='average') plt.title("Distance Evolution") plt.xlabel("Runs") plt.ylabel("distance") plt.legend() plt.show()
../_images/user_guide_3-session_37_0.svg
[20]:
(summary_without_outliers['total_distance'] / 1000.0).hist(bins=30)

[20]:
<AxesSubplot:>
../_images/user_guide_3-session_38_1.svg

Now let’s see how faster she was and if there is a distinction between longer and shorter runs related to the speed (slower vs faster). Let’s try plotting distance against speed to get a better idea.

[21]:
plt.subplots(figsize=(8, 5))

plt.scatter(summary_without_outliers.total_distance/ 1000, summary_without_outliers.mean_moving_pace_float, color='purple', marker='s')
plt.title("Distance vs. pace")
plt.xlabel("Distance")
plt.ylabel("Pace")
plt.legend()
No handles with labels found to put in legend.
[21]:
<matplotlib.legend.Legend at 0x7fa8da85bf40>
../_images/user_guide_3-session_40_2.svg

The distribution of her pace across the distance runs is very dispersed. It means a no clear trend there. But what if we restrict it just to this year ?

[22]:
summary_without_outliers_2021 = summary_without_outliers[summary_without_outliers.index > '2021-01-01']
[23]:
summary_without_outliers_2021
[23]:
moving_time mean_speed max_speed mean_pace max_pace mean_moving_speed mean_moving_pace mean_cadence max_cadence mean_moving_cadence ... mean_temperature min_temperature max_temperature total_distance ellapsed_time day_diff pace_moving_all_mean distance_all_mean mean_moving_pace_float pace_moving_all_mean_float
start
2021-01-02 09:07:36.900 01:04:45 9.244853 68.900544 00:06:29 00:00:52 9.351208 00:06:24 NaN NaN NaN ... NaN NaN NaN 10113.098986 01:05:38.100000 5 00:06:02.147058 7.23 6.400000 6.035784
2021-01-15 09:08:54.810 00:37:22.191000 10.048012 152.739973 00:05:58 00:00:23 10.106837 00:05:56 NaN NaN NaN ... NaN NaN NaN 6305.660653 00:37:39.191000 13 00:06:02.147058 7.23 5.933333 6.035784
2021-01-17 09:41:25.270 00:38:41.730000 9.321227 23.778071 00:06:26 00:02:31 9.464527 00:06:20 NaN NaN NaN ... NaN NaN NaN 6135.777736 00:39:29.731000 2 00:06:02.147058 7.23 6.333333 6.035784
2021-01-27 18:24:14.001 00:23:40 10.432233 20.445663 00:05:45 00:02:56 10.432233 00:05:45 NaN NaN NaN ... NaN NaN NaN 4114.936468 00:23:40 10 00:06:02.147058 7.23 5.750000 6.035784
2021-01-31 09:18:46.676 00:53:54.324000 8.830295 49.801374 00:06:47 00:01:12 9.416889 00:06:22 NaN NaN NaN ... NaN NaN NaN 8502.408803 00:57:46.325000 3 00:06:02.147058 7.23 6.366667 6.035784
2021-02-05 08:58:10.295 00:24:56.707000 9.690825 25.378717 00:06:11 00:02:21 9.690825 00:06:11 NaN NaN NaN ... NaN NaN NaN 4028.979227 00:24:56.707000 4 00:06:02.147058 7.23 6.183333 6.035784
2021-02-07 09:15:40.107 01:19:07.893000 8.925130 17.529999 00:06:43 00:03:25 9.314387 00:06:26 NaN NaN NaN ... NaN NaN NaN 12351.125311 01:23:01.894000 2 00:06:02.147058 7.23 6.433333 6.035784
2021-02-12 09:05:13.211 00:30:39.790000 10.035336 22.995251 00:05:58 00:02:36 10.035336 00:05:58 NaN NaN NaN ... NaN NaN NaN 5128.586271 00:30:39.790000 4 00:06:02.147058 7.23 5.966667 6.035784
2021-02-19 09:12:26.076 00:45:10.925000 9.489594 54.967249 00:06:19 00:01:05 9.564430 00:06:16 NaN NaN NaN ... NaN NaN NaN 7214.529809 00:45:36.925000 7 00:06:02.147058 7.23 6.266667 6.035784
2021-02-21 09:47:38.109 00:43:32.892000 9.840626 23.993080 00:06:05 00:02:30 9.840626 00:06:05 NaN NaN NaN ... NaN NaN NaN 7142.359084 00:43:32.892000 2 00:06:02.147058 7.23 6.083333 6.035784
2021-03-14 10:06:16.609 00:32:01.391000 9.619669 256.195692 00:06:14 00:00:14 9.668954 00:06:12 NaN NaN NaN ... NaN NaN NaN 5171.616917 00:32:15.391000 21 00:06:02.147058 7.23 6.200000 6.035784
2021-03-20 09:28:54.161 00:32:54.843000 8.040413 12.058690 00:07:27 00:04:58 9.370022 00:06:24 NaN NaN NaN ... NaN NaN NaN 5214.743810 00:38:54.840000 5 00:06:02.147058 7.23 6.400000 6.035784
2021-03-21 09:47:35.279 00:47:52.721000 8.391221 12.847088 00:07:09 00:04:40 8.892325 00:06:44 NaN NaN NaN ... NaN NaN NaN 7127.228493 00:50:57.722000 1 00:06:02.147058 7.23 6.733333 6.035784
2021-03-28 09:31:37.000 00:55:37.999000 8.365421 20.043178 00:07:10 00:02:59 9.415942 00:06:22 NaN NaN NaN ... NaN NaN NaN 8862.698326 01:03:34 6 00:06:02.147058 7.23 6.366667 6.035784
2021-04-02 09:39:34.000 00:30:31 10.005436 37.905503 00:05:59 00:01:34 10.081349 00:05:57 NaN NaN NaN ... NaN NaN NaN 5138.905933 00:30:49.001000 5 00:06:02.147058 7.23 5.950000 6.035784
2021-04-04 09:16:15.798 01:03:41.203000 9.525359 352.418093 00:06:17 00:00:10 9.939046 00:06:02 NaN NaN NaN ... NaN NaN NaN 10623.961318 01:06:55.204000 1 00:06:02.147058 7.23 6.033333 6.035784
2021-04-09 09:39:12.563 00:19:11.437000 10.108209 33.004691 00:05:56 00:01:49 10.108209 00:05:56 NaN NaN NaN ... NaN NaN NaN 3233.046143 00:19:11.437000 5 00:06:02.147058 7.23 5.933333 6.035784
2021-04-11 09:36:44.033 01:03:52.970000 8.931733 62.591395 00:06:43 00:00:57 9.801456 00:06:07 NaN NaN NaN ... NaN NaN NaN 10544.330137 01:10:49.969000 1 00:06:02.147058 7.23 6.116667 6.035784
2021-04-21 09:05:33.139 00:36:30.861000 10.122163 20.957374 00:05:55 00:02:51 10.122163 00:05:55 NaN NaN NaN ... NaN NaN NaN 6160.070149 00:36:30.861000 9 00:06:02.147058 7.23 5.916667 6.035784
2021-04-25 09:38:37.511 01:15:46.633000 9.651477 49.152245 00:06:13 00:01:13 10.000173 00:05:59 NaN NaN NaN ... NaN NaN NaN 12719.815370 01:19:04.490000 4 00:06:02.147058 7.23 5.983333 6.035784
2021-05-02 09:39:39.002 00:28:44.998000 9.902854 13.701415 00:06:03 00:04:22 10.494083 00:05:43 NaN NaN NaN ... NaN NaN NaN 5050.450029 00:30:35.998000 7 00:06:02.147058 7.23 5.716667 6.035784
2021-05-21 09:17:20.000 00:25:12 9.362450 17.238005 00:06:24 00:03:28 9.782870 00:06:07 NaN NaN NaN ... NaN NaN NaN 4142.884173 00:26:33 18 00:06:02.147058 7.23 6.116667 6.035784
2021-05-28 19:37:03.392 00:43:42.408000 9.546533 32.766574 00:06:17 00:01:49 9.997542 00:06:00 NaN NaN NaN ... NaN NaN NaN 7349.790670 00:46:11.608000 7 00:06:02.147058 7.23 6.000000 6.035784
2021-05-30 09:58:12.455 01:02:05.546000 9.230633 26.180357 00:06:30 00:02:17 9.762058 00:06:08 NaN NaN NaN ... NaN NaN NaN 10147.404038 01:05:57.546000 1 00:06:02.147058 7.23 6.133333 6.035784
2021-06-06 09:33:11.000 00:50:08.002000 8.863385 24.451184 00:06:46 00:02:27 9.663379 00:06:12 NaN NaN NaN ... NaN NaN NaN 8196.171447 00:55:29.001000 6 00:06:02.147058 7.23 6.200000 6.035784
2021-06-11 09:30:44.000 00:25:50 9.470173 21.739674 00:06:20 00:02:45 9.694837 00:06:11 NaN NaN NaN ... NaN NaN NaN 4206.337676 00:26:39.001000 4 00:06:02.147058 7.23 6.183333 6.035784
2021-06-13 09:22:30.985 01:32:33.018000 9.406340 84.902240 00:06:22 00:00:42 10.119079 00:05:55 NaN NaN NaN ... NaN NaN NaN 15706.017295 01:40:11.016000 1 00:06:02.147058 7.23 5.916667 6.035784
2021-06-20 09:16:55.163 00:59:44.512000 8.973504 21.837222 00:06:41 00:02:44 9.898030 00:06:03 NaN NaN NaN ... NaN NaN NaN 9965.168311 01:06:37.837000 6 00:06:02.147058 7.23 6.050000 6.035784
2021-06-23 09:37:44.000 00:26:49.001000 9.006464 20.308834 00:06:39 00:02:57 9.248208 00:06:29 NaN NaN NaN ... NaN NaN NaN 4165.492241 00:27:45.001000 3 00:06:02.147058 7.23 6.483333 6.035784
2021-06-27 09:50:08.664 00:31:42.336000 9.527376 117.842846 00:06:17 00:00:30 9.582671 00:06:15 NaN NaN NaN ... NaN NaN NaN 5074.217061 00:31:57.336000 4 00:06:02.147058 7.23 6.250000 6.035784
2021-07-04 11:23:19.418 00:47:47.583000 9.368145 15.164352 00:06:24 00:03:57 10.284485 00:05:50 NaN NaN NaN ... NaN NaN NaN 8248.084577 00:52:49.582000 7 00:06:02.147058 7.23 5.833333 6.035784

31 rows × 23 columns

[24]:
plt.subplots(figsize=(8, 5))

plt.scatter(summary_without_outliers_2021.total_distance/ 1000, summary_without_outliers_2021.mean_moving_pace_float, color='purple', marker='s')
plt.title("Distance vs. pace")
plt.xlabel("Distance")
plt.ylabel("Pace")
plt.legend()
No handles with labels found to put in legend.
[24]:
<matplotlib.legend.Legend at 0x7fa8dad6b0d0>
../_images/user_guide_3-session_44_2.svg

It still does not give us a clear trend, but it depicts that she really started to achieve better performance when she started to run longer distances (paces loweer then 6’20 when she started to run above 10km).

Finally, let’s see her performance evolution in pace over time in 5km, 10km and 15kms. For this analysis, I had to filter the session dataframe based on distances. In this tutorial, I will assume that the distances between 5km - 5.9km , 10km - 10.9km, 15km - 15.9km will be normalized as 5, 10 , 15km.

[54]:
summary_without_outliers_5km = summary_without_outliers[summary_without_outliers['total_distance'].between(5000,5900)]
summary_without_outliers_10km = summary_without_outliers[summary_without_outliers['total_distance'].between(10000,10900)]
summary_without_outliers_15km = summary_without_outliers[summary_without_outliers['total_distance'].between(15000,15900)]
[61]:
fig, axs = plt.subplots(3, sharex=True, figsize=(18, 16))
fig.suptitle('Average Moving Pace over time (5km, 10km, 15km)')

axs[0].plot(summary_without_outliers_5km.index, summary_without_outliers_5km.mean_moving_pace_float, marker='*')
axs[0].set_title('5km')

axs[1].plot(summary_without_outliers_10km.index, summary_without_outliers_10km.mean_moving_pace_float, marker='*')
axs[1].set_title('10km')


axs[2].plot(summary_without_outliers_15km.index, summary_without_outliers_15km.mean_moving_pace_float, marker='*')
axs[2].set_title('15km')

plt.xlabel('Date')

plt.show()
../_images/user_guide_3-session_48_0.svg

Her 5km got better until semptember 2020, but as she started to get greater distances, her performance got more slower. He 10km got better across 2021. Her 15km, since we only have one observation, there’s no insight/trend there.

Conclusions#

  • So, as expected, no major insight. However, always running help he to improve her performance, in order to maintain or lower her pace.

  • She started to run greater distances with excelent paces, for instance the distance run of 15km with the moving pace 5’50.

  • There is no trend about her run distance vs pace. We believe that we need to collect more data to might see any new insights.

In this tutorial, we showed the possibilities of using runpandas python package to perform several type of running analysis assisted by visualization and data handling packages such as Matplotlib and pandas. With the introduction of session feature, we now can analyse a group of activities and investigate new insights over time.