Interactive Web-Based Visualization for Lake Monitoring in Community-Based Participatory Research: A Pilot Study Using a Commercial Vessel to Monitor Lake Nipissing

Environmental and limnological monitoring is of interest to government agencies, researchers, and the general public. In communities that rely on and are heavily affected by lakes and their watersheds, accessible and intuitive presentation of lake properties influences and aids decision-making, interventions, and the formulation of environmentally sound policies. In this paper, interactive web-based visualizations are employed as a mechanism to communicate environmental information collected from a commercial cruise vessel. A pilot study is presented for monitoring Lake Nipissing, a large culturally and environmentally important lake in northeastern Ontario, Canada. This example of community-based participatory research suggests that: (1) policy makers and researchers can quickly gain insight into what is happening in the lake through visualizations, which helps to direct subsequent, detailed investigations; and (2) through accessible, visual presentation, community members may be encouraged to become involved in contributing to environmental policies that directly affect them, thereby supporting environmental “citizen science”.


Problem
Community-based participatory research (CBPR), including "citizen science", has gained increasing prominence as a means of addressing challenges in data collection and transparency (Bonney et al., 2009; Newman et al., 2010). CBPR has been defined as research undertaken as a partnership between community members, academic researchers, and other organizations (Israel, Schulz, Parker, & Becker, 1998), whereas "citizen science" primarily refers to avocational research, where research goals are partly obtained by distribution of tasks to non-professionals or amateurs (e.g. the SETI@home or FOLDIT@home initiatives) (Hand, 2010). Both terms imply multiple stakeholders with differing goals and expectations. Furthermore, the research questions themselves can be posed through top-down (scientist-driven) or bottom-up (community-driven) processes (Danielsen et al., 2009). In addition to providing an increased amount of data (Crall et al., 2010; Kéry et al., 2010), stakeholder data collection can better reflect human-environment interactions (Kanjo & Landshoff, 2007; Weckel, Mack, Nagy, Christie, & Wincorn, 2010). Research involving stakeholders can also lead to increased public awareness of environmental issues (Kanjo & Landshoff, 2007) and increased participation in further research (Sullivan et al., 2009). However, the published literature largely focuses on the role that non-scientists can play within the collection of mostly quantitative data within limited parameters [but see (Viegas, Wattenberg, Van Ham, Kriss, & McKeon, 2007) for an example of unstructured collection of qualitative data]. The inclusion of qualitative data collection by non-scientists can also provide a better understanding of complex environmental processes, as well as the perceptions of stakeholders (Abbott & Campbell, 2009; St. Martin, 2001). Moreover, the literature is largely silent about how participatory science can improve policy dialogue beyond improving data quality and transparency.
[...] & Gaddis, 2008) and the impact of environmental factors on public health (Eghbalnia et al., 2013; Israel et al., 2005; Miller et al., 2013) are prominent. In particular, environmental monitoring is complex, as it relies on multimodal data from sensors, images, and empirical observation. Furthermore, the large quantities of (often multi-dimensional) data acquired from various sensors motivate the need for accessible, interactive visualizations.
The current work focuses on CBPR and visualization for lake monitoring, where data are collected using a sonde sensor affixed to a recreational cruise vessel, the Chief Commanda II. Although the concepts presented in this paper apply to aquatic monitoring in general, the specific target of this research, presented as a pilot study, is Lake Nipissing, the third-largest lake in Ontario, Canada. The lake, which drains into the Georgian Bay of Lake Huron, provides important economic, cultural and ecosystem services to residents and visitors. There is growing concern about declining fisheries, increased blue-green algae blooms, and the appearance of invasive aquatic species (Morgan, 2013; Hutchinson, Karst-Riddoch, & Köster, 2010; Filion, 2011). Because of its importance, the lake is the subject of several ongoing scientific studies of its biophysical properties. These studies involve measuring several parameters in different sections of the lake, generating large quantities of data that must be processed and analyzed. Applying the results of visualization and human-computer interaction studies, these data are subsequently presented through interactive web-based visualizations. The emphasis of the current work is therefore transparent data presentation and interaction.
Major stakeholders include environmental researchers, government policy-makers, First Nations communities, businesses (particularly the fisheries industry), and the general public in North Bay (pop. 54 000) and other communities that surround Lake Nipissing. Researchers, for instance, need to establish baseline environmental indicators, such as oxygen level and temperature. The public can gain an understanding of the data being collected and why they are collected, and of how they may contribute potentially important qualitative data, such as reports of unusual environmental events that may be overlooked by conventional surveys. Policy-makers are concerned about fish populations, particularly walleye (Morgan, 2013), as well as the potential effect of three dams on the French River, the primary outflow of the lake.
This research emphasizes the need to enhance data collection, to expand stakeholder participation in the research and policy process, and to disseminate results to the public through intuitive, interactive visualizations. To achieve these goals, an online platform was developed wherein sensor data can be displayed after acquisition. In turn, environmental data can be augmented by qualitative and quantitative input from other stakeholders. This two-way flow of information via the online platform serves several purposes: it allows stakeholders, on a real-time basis, to review the data, and to understand why they are collected; it complements standard data collection by providing additional data input; and it establishes common ground between scientists, policy makers and other stakeholders. The current study encompasses both CBPR, in that data are collected as a collaborative effort from the crew of a commercial cruise vessel, and citizen science, where not only government policy makers and academic researchers can access the data, but where all community members can make use of interactive visualizations and draw conclusions, or relate the information to personal experience.

The Importance of Lake Monitoring
Many large lakes are of vital significance to local communities, and convey their own important benefits and present their own unique challenges and problems. In the case of Lake Nipissing, the lake experiences considerable variability in biological productivity both within and between years; concerns have been raised regarding apparent declines in fish population and more frequent blue-green algae blooms (Morgan, 2013; Hutchinson et al., 2010). While this may be due in part to the inherent variability of the lake's ecosystem, growing levels of human activity within the region, as well as broader changes in water level and temperature associated with climate change, are also possible factors. In some cases, Lake Nipissing has been affected by human activity, such as modification of tributaries, eutrophication, and the introduction of invasive species (Filion, 2011).

The Importance of Interactive Visualization in Environmental Monitoring and Citizen Science
Accessible, web-based visualization technologies can stimulate scientific innovation for both researchers and members of the community who participate in the research and in environmental monitoring. Such visualizations encourage non-scientists to develop new questions, and scientists to address questions in new ways, given the availability of citizen participation (Goodchild, 2007). Specifically, interactive browser-based visualization and analysis allow participants and scientists to explore the data with fewer constraints, thereby enhancing innovation (Newman et al., 2012). For instance, visualization, visual communication tools, and immersive virtual experiences are encouraging community involvement in sustainability efforts aimed at assessing climate change effects (e.g. http://cirs.ubc.ca/research/research-area/visualization-tools-and-community-engagement).
Such highly interactive visualization techniques are increasingly used in geographical visualization (Dykes, MacEachren, & Kraak, 2005). Utilizing a variety of interactive methods allows data to be explored more thoroughly, thereby aiding understanding. For instance, multiple graphs and maps showing related data can be linked together such that an operation performed on one graph or map will be reflected on the others, creating a dynamic connection between the information and the visualizations (Theus, 2005). Especially for representation of geospatial data, maps and other visualizations often require simplification so that a high level understanding of fundamental concepts (e.g. datums, projection, resolution) is not necessary for participants to successfully use the site (Newman et al., 2010). This principle was employed in the present study, where the data collected from the Chief Commanda II are presented using multiple visualizations linked through an interactive map. Users can interact with a variety of different visualizations (described below) in such a way that they can explore the spatial, temporal, property-related, and other aspects of the data using multiple visualization mechanisms. Such a "linked approach" is conducive to understanding and successful use by people who are not trained in geographic information systems or in spatial representation methods (Newman et al., 2010).

Interactive Visualization of Multidimensional Transect Data
Data were collected at regularly sampled intervals, usually every minute, every three minutes, or every four minutes depending on the transect (a Chief Commanda II cruise where data were collected), and can therefore be considered as continuous. As cruise durations are generally less than three hours, and there may be hours or even days between each transect, the temporal dimension is described as discrete. Finally, the spatial coverage of each transect is described as continuous because the GPS collects positions at regular intervals. Due to the hybrid nature of the data, alternative graphing approaches are used: (1) An interactive overlay map for visualizing properties with high spatial resolution; (2) Parallel coordinate plots for communicating high dimensionality (i.e., the different properties collected by the sensors attached to the cruise vessel), providing both high temporal and spatial resolution; (3) Cloud lines, which are new, sophisticated scatter plots for displaying long-term trends in properties over time, allowing users to probe how properties change as a function of distance from a specific reference point; (4) Hovmöller (space-time) plots that provide a broad representation of changes in properties over time and space, targeted primarily to the research community and environmental organizations; and (5) Visualization of descriptive statistics.

Interactive Overlay Map
Easily understood maps can be overlaid with data to greatly aid understanding and ease of interpretation (Hochachka et al., 2012). The primary component of the linked visualizations is a map of Lake Nipissing overlaid with paths representing each transect taken by the Chief Commanda II. The cruise paths are by default coloured such that the colour at any particular coordinate reflects the value for a chosen attribute at that point along the path. The data can be coloured based on any chosen property, using one of several different colour maps. These include specialized colour maps, such as a greyscale colour map, colour maps for users with colour-deficient vision, and the standard pH scale, which can be adjusted to show the small differences that normally occur in lake monitoring (e.g. setting 5 to red/acidic and 8 to purple/alkaline). Groups of points can be selected on the map, and these spatial selections will be reflected throughout the rest of the visualization.
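The adjustable pH scale described above can be illustrated with a minimal Python sketch. It assumes a six-step colour ramp and hypothetical helper names (`normalise`, `colour_for`, `PH_STOPS`), none of which come from the study's implementation; only the 5 to 8 range is taken from the text.

```python
# Assumed, simplified ramp from acidic (red) to alkaline (purple).
PH_STOPS = ["red", "orange", "yellow", "green", "blue", "purple"]

def normalise(value, vmin, vmax):
    """Clamp a reading to [vmin, vmax] and rescale it to [0, 1]."""
    clamped = max(vmin, min(vmax, value))
    return (clamped - vmin) / (vmax - vmin)

def colour_for(value, vmin, vmax, stops=PH_STOPS):
    """Pick the colour stop for a reading on an adjustable scale."""
    t = normalise(value, vmin, vmax)
    index = min(int(t * len(stops)), len(stops) - 1)
    return stops[index]

# Narrowing the scale to 5-8 spreads the whole ramp over the small
# pH differences expected in a lake (sample readings are made up).
path_colours = [colour_for(ph, 5.0, 8.0) for ph in (4.0, 5.0, 6.5, 8.0)]
```

Readings outside the chosen range are clamped to the end colours, so narrowing the range trades coverage of extreme values for finer contrast among typical ones.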

Parallel Coordinate Plots
Another facet of the linked visualization is a parallel coordinates plot (PCP), where n parallel axes, each corresponding to a dimension of the data, are drawn. A single n-dimensional tuple is represented by a polyline (a piecewise connected line) that intersects each axis at the location corresponding to the tuple's value for that axis' dimension (Streit et al., 2006), allowing all dimensions of multidimensional data to be displayed simultaneously. The PCP used in the linked visualization assigns a parallel axis to each of the measurements collected in the study. PCP can be used for an arbitrary number of dimensions. It can also indicate when data fall outside the range of the sensor by showing these values below their corresponding coordinate axis (Siirtola & Räihäm, 2006).
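As a rough illustration of how one tuple becomes a polyline, the following Python sketch places each value on its own axis. The function name, axis ranges, plot size, and sample reading are all assumptions made for this example, not details of the study's plot.

```python
def polyline_vertices(record, axes, ranges, width=600.0, height=300.0):
    """Return the (x, y) vertices of one polyline, with y measured upward
    from the bottom of each axis. A value under the axis minimum gives a
    negative y, i.e. a vertex drawn below the axis."""
    spacing = width / (len(axes) - 1)
    vertices = []
    for i, name in enumerate(axes):
        lo, hi = ranges[name]
        t = (record[name] - lo) / (hi - lo)  # 0 at axis bottom, 1 at top
        vertices.append((i * spacing, t * height))
    return vertices

# Hypothetical axes and one made-up reading with an out-of-range DO value.
axes = ["temperature", "do", "ph"]
ranges = {"temperature": (0.0, 30.0), "do": (0.0, 15.0), "ph": (5.0, 8.0)}
record = {"temperature": 15.0, "do": -3.75, "ph": 8.0}
vertices = polyline_vertices(record, axes, ranges)
```

In this sketch the out-of-range DO reading produces a vertex with negative height, which corresponds to the below-axis display described above.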
Many enhancements have been made to the standard parallel coordinates technique. In the current study, a 3D plot was implemented so that every parallel axis was extended along the z-axis into a plane. The different Chief Commanda II transects are separated along the z-axis, clearly indicating the date of each collection (Johansson et al., 2014). Opacity controls were provided to reveal the density distribution of the graph, as polylines in dense areas are more visible than in less dense areas (Wegman & Luo, 1996). The order of the parallel axes can be changed to better observe relationships between two arbitrarily juxtaposed axes (Blaas et al., 2008).
The polylines can also be grouped into k categories (Johansson et al., 2005), with each polyline coloured based on the cluster to which it belongs. The clustering can be performed based on any combination of m properties, using some distance measure between the m-D data points (in the current study, Euclidean distances). Although many sophisticated clustering algorithms exist, the standard iterative k-means clustering approach was used in this work.
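For readers unfamiliar with the procedure, a minimal Python sketch of iterative k-means with Euclidean distance follows. The seeding strategy, iteration cap, and sample (temperature, DO) points are assumptions for illustration; the study's own implementation is not described in that detail.

```python
import math
import random

def kmeans(points, k, iterations=50, seed=0):
    """Cluster m-dimensional points (tuples) into k groups."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumed seeding: k random points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(d) / len(members) for d in zip(*members)))
            else:
                updated.append(centroids[c])  # keep an empty cluster in place
        if updated == centroids:
            break  # converged
        centroids = updated
    return labels, centroids

# Made-up (temperature, DO) readings forming two obvious groups.
points = [(20.0, 9.0), (21.0, 8.8), (20.5, 9.1),
          (12.0, 11.0), (11.5, 11.2), (12.5, 10.9)]
labels, centroids = kmeans(points, 2)
```

Each label would then select the colour of the corresponding polyline. Because the properties are in different units, rescaling them before clustering may be advisable; whether the study did so is not stated.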
Groups of polylines can be selected directly on the plot by "brushing" (Edsall, 2002), wherein users draw a "brush line" over the plot to highlight polylines. Consistent with the goal of linked visualization, the selection ("brushed" polylines) is reflected throughout the other visualization components (i.e. the map and the statistics table). The polylines in the PCP are also coloured based on the property chosen for the map, with each polyline's colour matching that of its corresponding point on the map.
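One common way to keep linked views in step is a shared selection object that notifies every registered view when the brushed set changes. The Python sketch below shows that pattern under assumed names (`LinkedSelection`, `subscribe`, `select`); it is not a description of how the web service is actually built.

```python
class LinkedSelection:
    """Holds the indices of the selected readings and notifies listeners."""

    def __init__(self):
        self._selected = set()
        self._listeners = []

    def subscribe(self, callback):
        """Register a view callback that receives each new selection."""
        self._listeners.append(callback)

    def select(self, indices):
        """Replace the selection and push it to every registered view."""
        self._selected = set(indices)
        for callback in self._listeners:
            callback(frozenset(self._selected))

# Stand-ins for the map and statistics table: each records what it is sent.
selection = LinkedSelection()
map_updates, table_updates = [], []
selection.subscribe(map_updates.append)
selection.subscribe(table_updates.append)
selection.select([3, 4, 5])  # e.g. readings brushed on the PCP
```

With this arrangement a brush on any one view only has to call `select`, and the other views redraw from the same set of indices.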

Cloud Lines
To represent time, a discrete measure due to the temporal distance between transects, a visualization based on cloud lines was employed (Krstajic et al., 2011). Cloud lines are essentially scatter plots where dependent variables (sonde properties) are each shown as single lines, as a function of the independent variable (time). Events (the sonde readings) along the time axis are represented as circular markers that are coloured according to their value, or whose radius and/or opacity reflects the value, resulting in lines of varying width and colour. The advantage of this technique is that it allows a large amount of data to be intuitively displayed with a relatively small amount of screen space. Initially proposed for discrete, episodic data (Krstajic et al., 2011), cloud lines were adapted to continuous data in the current study to convey properties represented in a straight line, whose colours vary over time according to their value. To represent the spatial dimension, the radius of each point in the cloud line represents the proximity (inverse distance) of that spatial coordinate from a user-specified point, where readings closer to the selected point appear larger. Cloud line plots for multiple properties can be shown simultaneously, and can be juxtaposed arbitrarily using a dragging mechanism.
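The proximity-to-radius mapping can be sketched in Python as follows. The haversine formula is a standard great-circle distance; the particular inverse-distance function, the radius limits, and the 5 km scale are assumptions chosen for illustration rather than the values used in the study.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def marker_radius(distance_km, r_min=2.0, r_max=10.0, scale_km=5.0):
    """Larger markers for readings closer to the reference point.
    Proximity is 1 at the point itself and decays toward 0 with distance."""
    proximity = 1.0 / (1.0 + distance_km / scale_km)
    return r_min + (r_max - r_min) * proximity

# Illustrative coordinates only (roughly the North Bay area).
d = haversine_km(46.31, -79.47, 46.25, -79.60)
radius = marker_radius(d)
```

Adding one to the scaled distance keeps the radius finite when a reading coincides with the reference point, which a plain 1/d mapping would not.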

Space-Time Plots
Hovmöller (space-time) plots present a standard image of how properties change over space (y-axis) and time (x-axis). The image colour indicates the value of the property currently being displayed (Buzzelli, Ramus, & Paerl, 2003). To make these diagrams more useful, space is represented as distance from a user-selected point on the map, allowing users to view changing properties based on specific geographic locations.
The time sparsity of the transects in this pilot study posed challenges for space-time interpolation, as the data are sparse in time but not in space (Wright & Goodchild, 1997). Therefore, natural neighbour interpolation, which is based on Voronoi diagrams, was employed (Gold & Condal, 1995; Wright & Goodchild, 1997).
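Before any interpolation, readings have to be placed on a distance-versus-time grid. The Python sketch below shows only that gridding step, by averaging readings that fall in the same cell; it does not implement natural neighbour interpolation, and the bin sizes, function name, and sample readings are assumptions.

```python
from collections import defaultdict

def hovmoller_grid(readings, time_bin, dist_bin):
    """Average readings into (time index, distance index) cells.

    readings: iterable of (time, distance, value) tuples, where distance is
    measured from the user-selected reference point.
    Returns a dict mapping each occupied cell to its mean value."""
    sums = defaultdict(lambda: [0.0, 0])
    for t, d, v in readings:
        cell = (int(t // time_bin), int(d // dist_bin))
        sums[cell][0] += v
        sums[cell][1] += 1
    return {cell: total / count for cell, (total, count) in sums.items()}

# Made-up readings: (minutes since start, km from reference, temperature).
readings = [(0, 0.5, 20.0), (10, 0.8, 22.0), (70, 3.2, 18.0)]
grid = hovmoller_grid(readings, time_bin=60, dist_bin=1.0)
```

Cells with no readings are simply absent from the result; with transects hours or days apart, most cells along the time axis would be empty, which is the gap the interpolation step has to fill.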

Statistics
Two statistics tables and corresponding graphs are also available. The statistics tables show the mean, standard deviation, median, interquartile range, minimum, and maximum values of every property. The first table shows these statistics for the data points that are currently selected in the linked visualizations, and is updated dynamically whenever the selection changes on either the interactive map or the PCP. The second table displays statistics on a transect-by-transect basis, and is not dynamic. The available graphs are a multi-bar graph of the mean ± standard deviation for all selected properties, and a box-whisker plot of the minimum, first quartile, median, third quartile, and maximum for a selected property.
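The six statistics listed above can be computed for any selection with Python's standard `statistics` module, as in this minimal sketch. The choice of the sample standard deviation and of the "inclusive" quartile method are assumptions, since the paper does not say which definitions were used.

```python
import statistics

def describe(values):
    """Summary statistics for one property over the selected readings.
    Requires at least two values."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.fmean(values),
        "sd": statistics.stdev(values),  # sample standard deviation (assumed)
        "median": median,
        "iqr": q3 - q1,
        "min": min(values),
        "max": max(values),
    }

# Made-up temperature readings for a brushed selection.
summary = describe([18.0, 19.0, 20.0, 21.0, 22.0])
```

Recomputing `describe` whenever the selection changes would give the dynamic behaviour of the first table, while the per-transect table could be computed once.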

Colour Maps
For the coloured properties described in 1.4.1 to 1.4.4, the standard rainbow (blue = low, red = high) and greyscale (black = low, off-white = high) colour maps are included, although rainbow colour maps, which vary colour across the visual spectrum, are sometimes considered to be among the least useful. Recent visualization research has focused on diverging colour maps, where a transition between two chosen colours passes through an unsaturated colour (Moreland, 2009). Several of these diverging colour maps are implemented in the current system. These colour maps may also be more easily understood by users with colour-deficient vision (Moreland, 2009).
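The idea of a diverging colour map can be shown with a short Python sketch that blends from one colour to another through a near-grey midpoint. It is a simplification: the blend is linear in RGB rather than in a perceptual colour space, and the three anchor colours are illustrative values, not necessarily those used in the system.

```python
def diverging_colour(t, low=(59, 76, 192), mid=(221, 221, 221), high=(180, 4, 38)):
    """Map t in [0, 1] to an RGB triple: low colour at 0, an unsaturated
    midpoint at 0.5, high colour at 1. Values outside [0, 1] are clamped."""
    t = max(0.0, min(1.0, t))
    if t < 0.5:
        start, end, u = low, mid, t / 0.5
    else:
        start, end, u = mid, high, (t - 0.5) / 0.5
    return tuple(round(a + (b - a) * u) for a, b in zip(start, end))

# Five evenly spaced samples along the map, from low to high.
ramp = [diverging_colour(i / 4) for i in range(5)]
```

Because the midpoint is unsaturated, values near the centre of the range read as neutral, while departures in either direction gain colour.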

Study Area
The study area covered a large NE to SW sector of Lake Nipissing (see the map in Figure 1). Most data were collected in an area within approximately 20 km of North Bay, Ontario, but many routes also included the environmentally important Callander Bay area in the southeastern part of the lake. A limited amount of data is also available for the French River, which is the primary outflow for the lake, located in its southwestern sector.
[...] then updated on the PCP, cloud lines, and statistics visualization. The user may also select individual transects for analysis. Although transects generally run less than three hours, a time slider is provided to allow users to track the progress of a single selected transect, and to assess how properties change during this transect. For the cloud lines, an interactive range selector (adapted from Dygraphs) located below the graphs allows zooming in on smaller date ranges, and hovering over a point on the graph displays the value of all properties at that point in time.
Due to the computational complexity of computing the Hovmöller plots and the limited system resources generally available for web-based visualization, Hovmöller plots were pre-computed for thirty-nine points close to meaningful observation sites across the lake. The site closest to the user-selected point was used for display. Up to four diagrams can be displayed simultaneously.

Human-Computer Interaction
To facilitate the community-based research aspect of this work, the web-based interface was designed to be as intuitive and interactive as possible. The diverse visualizations are linked through the interactive map "hub". This map allows users to select specific transects (either by clicking them on the display or by selecting them from a drop-down list), dates, and geographic locations.
The interface is divided into three components to reduce scrolling. Each component can be minimized to save screen space. The interactive map, colour map selection, and transect selection are located in separate panels in the top component. The second component houses the PCP, cloud lines, and space-time plots. A single visualization is displayed in this panel at a time, and tabs allow the user to cycle rapidly through the visualizations, resulting in a cleaner, easier-to-navigate interface. Users, especially non-researchers, may find it easier to concentrate on one representation at a time. Options that are unique to each visualization are displayed on the right side of the panel.
The bottom component features the statistical tables and plots. A tab interface separates statistics over all data collections from per-transect representations. Finally, although the interface, visualization, and components were designed to be intuitive and self-teaching, "tool tips" (text that appears when the pointing device/mouse hovers over a button, menu, or feature and explains its function) and text in the interface are liberally employed to aid navigation. Because this is a citizen science initiative, a tutorial video is included that provides an overview of all of the features of the service, with special attention given to the more advanced visualizations.

Results
The website is hosted on Nipissing University's server system, and is found at the URL http://visual.nipissingu.ca/Commanda2. All the features previously described in Section 2 can be found on this site (Figure 1). Three examples illustrating the use of the visualizations for exploratory data analysis follow.

Temperature and Dissolved Oxygen
Using the North Bay municipal dock as the reference point from which spatial distances were calculated, a cursory analysis of the Hovmöller plots for temperature and dissolved oxygen (DO) may lead to the interpretation of an inverse relationship between the two properties (Figure 2). Because of the low temporal resolution of these plots, the relationship is further examined with cloud lines visualizations, using the same reference point (Figure 3). Here, four properties of interest (temperature, DO concentration, specific conductivity, and pH) are displayed, with the temperature and DO concentration cloud lines moved to the top to facilitate interpretation. The larger radii denote closer proximity to the reference location. The apparent inverse relationship seen in the Hovmöller plots was also observed in the cloud lines, with possible hypoxic (or even anoxic) conditions in parts of the lake.
However, the PCP shows that some polylines fall below the DO coordinate axis; consequently, those DO values are suspect (Siirtola & Räihäm, 2006). On the specific sonde used in this study, when the DO charge falls below 25, DO readings are suspect. When the cloud lines are zoomed to show the last five transects (all with valid DO values), a positive relationship is seen on the August 22, 2011 transect, and a negative relationship on the September 4, 2011 trip. The visualization led to a closer statistical analysis being performed, the results of which are shown in Table 1. The correlation coefficients (r) confirm the visual results.
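The follow-up check described above amounts to discarding suspect DO readings and computing a Pearson correlation coefficient on the rest. A minimal Python sketch follows; the record field names and the sample values are hypothetical, and only the charge threshold of 25 is taken from the text.

```python
import math

def valid_pairs(readings, min_charge=25):
    """Keep (temperature, DO) pairs whose DO charge is not below min_charge."""
    return [(r["temp"], r["do"]) for r in readings if r["do_charge"] >= min_charge]

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up readings; the last one has a low DO charge and is dropped.
readings = [
    {"temp": 18.0, "do": 10.0, "do_charge": 50},
    {"temp": 20.0, "do": 9.0, "do_charge": 50},
    {"temp": 22.0, "do": 8.0, "do_charge": 50},
    {"temp": 21.0, "do": 0.5, "do_charge": 10},
]
pairs = valid_pairs(readings)
r = pearson_r([t for t, _ in pairs], [d for _, d in pairs])
```

Running the same calculation separately for each transect would yield one coefficient per transect, in the manner of the per-transect values reported in Table 1.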
To take the analysis further, the PCP was clustered based on temperature and DO concentration with five clusters (Figure 4). Not only is an inverse relationship generally observed between these properties, but upon further examination by rotating the PCP in 3D and brushing the time period covering the clusters, it was seen that three of the five clusters were generated for the longest cruise towards the distant French River, suggesting wide spatial variability in DO concentration. By reducing the number of clusters to two, the French River cruise was clearly separated from the other cruises.

Discussion

Conclusion
The data collected for the present study are an important first step in gathering baseline data about Lake Nipissing's environment. Baseline data are especially crucial given the variability of the lake ecosystem.
Further investigation with the visualizations also shows that this variability was particularly apparent in 2010, when lake levels were several meters lower than average. As such, it was expected that the recorded temperature and oxygen values would also depart from the norm. However, historical data with which to compare them are not available.
This pilot project indicates that remote monitoring of aquatic parameters in Lake Nipissing is feasible. The method will be adapted to allow the YSI 6600 V2 sonde readings and spatial data to be integrated on a data logger, which will simplify the data collection procedure and make detection of anomalous readings easier and quicker.
The community-based research pilot project presented in this paper is part of a larger initiative for monitoring Lake Nipissing, which presently includes collecting data from sensors at different depths on buoys placed at various locations around the lake. It is also anticipated that the visualizations presented in this web service will be of interest to other community, government, and academic organizations that focus on lake monitoring, such as the Global Lake Ecological Observatory Network (www.gleon.org).
As the applications for real-time aquatic monitoring expand, the potential of using multi-parameter data to better understand and anticipate adverse events in Lake Nipissing, such as anoxia or blue-green algae blooms, is also seen. For example, recent work has demonstrated how evolutionary algorithms can be used to model and forecast algal blooms in a dam reservoir in China (Ye, Cai, Zhang, & Tan, 2014). The visualization of these analyses could potentially aid in identifying correlations and increasing model accuracy.
With the rapid collection and analysis of data, and the implementation of a publicly-available web service, it is hoped that the flow of information will be two-way, where interested parties can comment on the data, as well as contribute their own, including location-tagged images, thereby fulfilling the mandate of citizen science and community-based participatory research.