Analyzing and Visualizing your Water Quality Data
7 min read

This is where the fun begins. Now that you’ve collected, cleaned, and organized your data, you’re ready to begin analyzing and visualizing. Unlike the previous, more tedious steps in our analysis process, creating visualizations is both science and art. When done right, it is the perfect blend of statistical mastery, technical prowess, and sophisticated design. During this step we turn our eye towards communicating the essential data takeaways in the form of informative, intuitive, and attractive visualizations.

One of many workflow styles for producing data-driven deliverables. Created by The Commons.

Stick to the Stats

During your data QA/QC process you should have been recording interesting statistics and trends. These observations are what make for exciting and impactful visualizations. A flat line isn’t worth graphing; it’s the outliers, patterns, and exceptions to the norm that lend themselves best to interesting graphics. This is the time to dig deeper into these values and start to bring them to life through charts.
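As one hedged way to surface those exceptions before charting, the sketch below flags readings that fall outside the common 1.5 × IQR fences. The rule itself, the function name `flag_outliers`, and the sample readings are assumptions made for illustration; none of them come from the SSO dataset or from the workflow described in this article.

```python
import statistics


def flag_outliers(values, k=1.5):
    """Return (index, value) pairs that fall outside the k * IQR fences."""
    # quantiles(n=4) returns the three quartile cut points: Q1, median, Q3.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [(i, v) for i, v in enumerate(values) if v < low or v > high]


# Made-up bacteria counts (illustrative only, not real monitoring data).
readings = [120, 135, 110, 128, 140, 2400, 125, 118]

print("median:", statistics.median(readings))
print("outliers:", flag_outliers(readings))
```

The flagged positions are candidates for a closer look and a first chart, not conclusions. A different threshold or rule may suit your data better.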

Using the notable statistics as a road map, start making a smattering of simple charts. At this stage we aren’t looking for the next award-winning visualization; if anything, we want to err on the side of quantity over quality. The idea is to play with the data. I find the easiest way to do so is to pull my data into Excel or Google Sheets.

First pass at chart making. Created by The Commons.

With the data in my spreadsheet program of choice, I start spitting out pivot tables, bar charts, line charts, and scatter charts. The focus is not on perfection but on exploration. However, this process can get messy fast. Try to be methodical and use the documentation framework you developed earlier. I like to leave comments in my sheets and move completed charts onto their own tab to keep things neat.

Once you are satisfied with the number of charts and you have reached knowledge saturation, write a brief narrative of your data. It should generally follow an ‘A affects B’ format, e.g., “precipitation causes agricultural runoff, leading to nitrogen spikes and algae blooms” or “warm water from urban flash floods reduces dissolved oxygen levels and causes fish die-offs.” It might seem trivial, but writing out these relationships in simple language will help you stay focused on the essential information that needs communicating.

Good Design

Example of a publishable chart from The Commons’ SSO article.

For example, while researching my SSO story, I focused on one precipitation event in Baltimore during the summer of 2018 that caused sanitary sewer overflows (SSOs) and a subsequent increase in bacteria levels. I had seen a spike in the data earlier while performing QA/QC and made a few charts out of it. Notice how the precipitation chart is incredibly simple. It has only a handful of data points, two colors, and two variables: time and precipitation. However, the overall design is still visually pleasing, with generous framing, readable text, and satisfying asymmetry. Simple charts like these are not only easier to make, but are often better at conveying information than multivariable graphics with complex design elements.

Tons of Tools

As visualization creators we are blessed with a plethora of visualization tools and packages, ranging from the simple to the sophisticated. Try not to get caught up in which platform to use; they are all pretty darn good. The important part is sticking to established design principles. Frankly, Google Sheets and Excel can get you pretty far, but if you’re looking to level up your toolset, there are plenty of free software solutions. Take a look at Google Data Studio for creating scorecards and interactive visuals. For command-line chart creation, R and its many visualization packages are my go-to. Flourish and Datawrapper are solid web-based systems that can produce impressive results.

While you descend deep into the visualization rabbit hole, always remember that the end goal of our work is to make the unknown known: to shine a graphical light into a shadowy statistical corner of our data. Have fun playing with these tools and stylizing your charts, but the data is what’s important. Here are some specific suggestions and design principles for creating visually exciting graphics that highlight rather than bury the data.

Rules to Follow

Reduce Chartjunk. A stuffy yet revered scholar by the name of Edward Tufte defined the term chartjunk in his 1983 book, The Visual Display of Quantitative Information. In it he writes:

“The interior decoration of graphics generates a lot of ink that does not tell the viewer anything new. The purpose of decoration varies — to make the graphic appear more scientific and precise, to enliven the display, to give the designer an opportunity to exercise artistic skills. Regardless of its cause, it is all non-data-ink or redundant data-ink, and it is often chartjunk.”

SAS’s grotesque chart. SAS.

Notice the nauseating use of hatching and gridlines in this chart from SAS (a statistical programming company). A good method to reduce chartjunk is to examine each graphical element and ask yourself, “Does my graphic convey the same message without this element?” If so, remove it.

Y vs. X. Created by The Commons.

Use Color Wisely. Humans are really good at associating color with feeling and information. Red means danger, green means good, and yellow can serve to highlight as well as indicate caution. Use colors to your advantage. Highlight an important datapoint in a light red to quickly draw a reader’s attention. Use light grey to demote necessary but less important contextual information like trendlines. Using a consistent color palette is a great way to hold your reader’s attention without jarring them with disconcerting contrasts and difficult-to-read colors.
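The color advice above can be sketched in a few lines: pick the one datapoint that matters, give it the highlight color, and push everything else into light grey. This is a minimal sketch, not the method used for the SSO charts. The hex codes, the function name `highlight_colors`, the choice to highlight the maximum, and the sample values are all assumptions made for illustration.

```python
HIGHLIGHT = "#e57373"  # assumed light red: draws the eye to the key datapoint
CONTEXT = "#bdbdbd"    # assumed light grey: demotes supporting information


def highlight_colors(values, highlight=HIGHLIGHT, context=CONTEXT):
    """Return one color per value, highlighting only the largest value."""
    if not values:
        return []
    # Index of the first occurrence of the maximum value.
    peak = max(range(len(values)), key=values.__getitem__)
    return [highlight if i == peak else context for i in range(len(values))]


# Made-up daily precipitation totals in inches (illustrative only).
precipitation = [0.1, 0.0, 2.3, 0.4, 0.0]
colors = highlight_colors(precipitation)
print(colors)
```

A list like this can be handed to most charting tools that accept per-bar or per-point colors, for instance the `color` argument of matplotlib’s `bar`.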

Write (or speak) to the Data. Most graphics are bundled into some form of deliverable, whether it be a report, presentation, or article. When possible, refer directly to your important chart elements in your writing. This can be as simple as “notice the spike between X and Y.” Steer your audience toward your graphic and leverage the visualization to better support your argument. The power of your written or spoken word is amplified by your graphic, and vice versa. In doing so you’ll create a more cohesive and dynamic final product that will inevitably increase comprehension.

Note that this list is far from exhaustive, and there are innumerable resources on visualization design principles, but it is a good starting point.

Tough Decisions

When it came to visualizing my SSO data across Baltimore City, I faced quite a few challenges. My original idea was to display the data in a large interactive map, but this proved both too technically challenging for the scope of the article and too graphically busy. I decided to break the narrative into smaller temporal chunks, focusing on a particular event using simple graphs, while demonstrating the cumulative impact of SSOs over time using a GIF made in QGIS with the TimeManager plugin. This way my audience could digest the information over the length of the story with the support of a written narrative, rather than being overwhelmed by an imposing interactive element.

At the end of my article, I combined all my variables in an interactive that allowed users to control the time scale and the different variables at the same time. The goal was to set the audience loose to explore the patterns only after they had understood their basic components. This design philosophy focused on communicating a data narrative rather than on flashy design elements, however fun those are to make.

The trade of data visualization and analysis is constantly evolving. It’s easy to get caught up in design, chart choice, and interactive elements. Although these components can dramatically increase comprehension and readership, they can also distract from the point you are trying to convey. When starting out, stick to the basics. First, master your data analysis skills to identify interesting relationships, then unleash your artistic talents to intelligently highlight these relationships, persuade your constituents, and make change.


Technology
Feb 24

Gabriel Watson
Data Analyst

Gabe leads the Common Knowledge program at The Commons and develops narrative and analysis supporting environmental and social causes. Specializing in R, Gabe tackles a variety of data analysis projects to help our stakeholders enforce state water quality permits, advocate for environmental issues, and visualize water quality monitoring results. Hailing from Baltimore, Maryland, Gabe spent his undergrad studying economics and urban environmental policy at Occidental College in northeast Los Angeles. After graduation he worked at USC’s Program for Environmental and Regional Equity, performing data analysis and management to support social justice efforts in California. He has a particular interest in spatial data analysis and visualizations. In addition to leading Common Knowledge, Gabe builds R Shiny applications for the Water Reporter platform and provides user support for the Water Reporter API. Outside of work, Gabe is an avid cyclist, fly fisherman, backpacker, sailor, and lover of the outdoors.