How to Use OSM Channel Data for Effective Communications

Posted by courtiney on 7/11/2023

Last month, at State of the Map US (SotM US) in Richmond, Virginia, Marjan Van de Kauter, Keara Dennehy, and I presented “How to Use OSM Channel Data for Effective Communications.”

Background:

The project grew out of the work Marjan Van de Kauter and I did piloting an OSM community engagement program for TomTom. To make sure we were communicating about TomTom’s organised editing correctly, we began tracking and organizing communications channels. As the list grew, we realized we needed a better tool, so we worked with a TomTom developer to build a web scraper that could show us which channels the community was active in.

Later, we brought Keara on board as a business analyst who could build a more robust tool to manage all of the data. By this time, we had realized that the community could use this information at the global, regional, and local levels.

Then, after I left TomTom but kept volunteering with the CWG and the OSMF board on fundraising and communications, we saw additional applications for the data. So we decided to create a proof of concept for a communication channel data store and present our first efforts and findings at SotM US 2023 in Richmond.

The Context:

As background, Marjan and I shared some of the results from the Communications Survey we conducted in May. I wrote about it here. Some of the findings were skewed, but we identified some interesting trends, including:

  • Some respondents (35%) reported that they felt the forums have a hostile tone
  • Many respondents said they were able to keep up with the conversations, both locally (60%) and globally (49%). Nearly 70% said that they got at least one useful response if they posted a question
  • Respondents were more likely to read than post: 379 said they read daily or weekly and 152 said they posted daily or weekly
  • Older respondents were more likely to use the Listservs or Community Forum, whereas younger respondents were more likely to use Discord or Reddit

Although the channels are seen as sometimes hostile and often noisy, and adoption of the various platforms varies widely, people are able to get the information they need. It speaks to the shared purpose of the community.

The Channel Data Store:

The methodology for creating the channel data store was roughly as follows: Keara and the other members of the TomTom analysis team used the forum API to collect data from the community forums and a web scraper to collect data from the mailing lists. The team also used the Azure language detection tool to add language information to the scraped data. User information was anonymized, and message content was removed, before the data was stored in the team’s data lake. Visuals were created in Power BI, a closed-source tool used by the data team at TomTom. The proof of concept for the data store was based on data from January 2022 to May 2023 and contained the following:

Channels

  • 60 community forums
  • 217 mailing lists
  • 86,177 messages

Posters

  • 3,039 in community forums
  • 1,698 in mailing lists
  • 76 languages (automatically detected)
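To make the anonymization step in the methodology more concrete, here is a minimal sketch of how a scraped message could be reduced to metadata before storage. It is not the TomTom team's code: the field names, the `pseudonymize` and `strip_message` helpers, and the keyed-hash approach are all assumptions for illustration, and the language code is assumed to come from a separate detection step. Strictly speaking, a keyed hash is pseudonymization rather than full anonymization, so treat this as one possible approach only.

```python
import hashlib
import hmac

# Hypothetical secret key; in a real setup it would be kept outside the code.
SECRET_KEY = b"replace-with-a-private-key"


def pseudonymize(username: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable pseudonym so posts by one user can still be grouped."""
    digest = hmac.new(key, username.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def strip_message(raw: dict, language: str) -> dict:
    """Keep only metadata: drop the message body and replace the username."""
    return {
        "channel": raw["channel"],
        "posted_at": raw["posted_at"],
        "poster_id": pseudonymize(raw["username"]),
        "language": language,  # assumed to come from a language detection step
    }


# Made-up example message (not real data from the project).
raw_message = {
    "channel": "talk-example",
    "posted_at": "2023-01-15T10:30:00Z",
    "username": "example_mapper",
    "body": "Hello, how should I tag this?",
}
record = strip_message(raw_message, language="en")
```

The stored record keeps enough information to count messages per channel, poster, and language, but carries neither the original username nor the message text.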

Of those 86,177 messages, 56,356 (about 65%) were from European sources. These results were not surprising, because editing volume is higher in Europe than in the rest of the world, and the European communities tend to favor the mailing lists and community forums. We’d expect to see more volume from Africa, Asia, Oceania, and Latin America in Telegram and other channels. From this data, we extracted a few interesting trends:

  • More than half of the messages posted in the forums and listservs are in languages other than English
  • English is more often used for global topics, such as “Tagging” and “Foundation”
  • Individual channels trend toward a single language, not multiple languages
  • Adoption of the Community Forums is mixed
  • Channel activity is driven by a few frequent posters
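For readers who want to check the last trend on their own channel data, one simple measure is the share of messages written by the most active posters. The sketch below is an illustration only: `top_poster_share` is a hypothetical helper, the input is assumed to be one anonymized poster ID per message, and the sample list is made-up toy data, not figures from our data store.

```python
from collections import Counter


def top_poster_share(poster_ids, top_n):
    """Return the fraction of messages written by the top_n most active posters."""
    counts = Counter(poster_ids)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(count for _, count in counts.most_common(top_n))
    return top / total


# Made-up toy data: one poster ID per message in a single channel.
sample = ["a"] * 6 + ["b"] * 2 + ["c", "d"]
share = top_poster_share(sample, top_n=1)
print(f"Top poster wrote {share:.0%} of the messages")
```

A value close to 1 for a small `top_n` would suggest that a channel depends heavily on a handful of people.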

Conclusion:

Our proof of concept raises a lot of interesting questions that we would like to pursue. Some of them include:

  • Are these frequent posters carrying a burden of disseminating the knowledge across OSM?
  • What is the best way to post about a topic that needs to be seen by the entire global community?
  • What are the effects of the increased use of new channels such as Telegram and Matrix?
  • How does the quality and availability of language localization affect access to posting and knowledge?
  • How does the quality and availability of language localization limit participation and knowledge sharing from some regions more than others?
  • How can we reduce channel noise for better all-community decision-making?
  • What could we learn if we could measure impressions, including liking and saving activity (which we can’t do in the listservs)?
  • How can we use this data to support fundraising and OSM messaging?
  • How can we use this data to support teamwork and inclusivity in OSM collaborations?

Next Steps:

We have prioritized two next steps for this project:

  • We are looking into developing an open-source version of the communication channel data store to share with the community, so any member can use it to analyze communications in OSM and make data-backed communications choices. We are also interested in adding data from other community channel types. If you’d like to get involved, please reach out to Marjan.

  • We are also looking for help analyzing user trends that can support best practices for communicating cross-culturally on distributed teams, including creating a data-backed OSM communications guide. If you are interested in getting involved with this work, please reach out to Courtney.

We’re also happy to hear any other questions or suggestions you may have about this project and potential applications of the data.

–Courtney

Marjan Van de Kauter

Keara Dennehy