Interview with André Völker: Why does communication need a data lake?

André Völker is a data analyst in the Siemens communications team “Data Management & Engineering”. As Product Owner, he is responsible for the conception, further development and operation of the central data hub for digital communication data.

Thomas Mickeleit: Siemens is considered a champion when it comes to data-driven PR. Is a data lake a matter of course for you?

André Völker: We are on the right track. Over the past three years, we have worked with our stakeholders to agree on the data requirements and, based on this, connected all relevant platforms such as ad servers, social media networks and website tracking to our data lake. Currently, more than 18 platforms provide data for a wide variety of channels and activities such as owned and earned media, website, demand generation and internal communications.

On the one hand, the challenge is that our colleagues in communications work with the relevant platforms and agencies, which we have to take into account in the set-up, and that they need to observe certain guidelines, for example when setting up and naming a campaign or when tagging. This requires solutions that are easy to use and can be integrated into the processes, as well as training, support and documentation.

On the other hand, the technical challenge is to ensure that the data flows into our data lake completely and in high quality every day. From the data lake, we make the data available in formatted and structured form in our data hub. Here, for example, we use a monitoring system that tracks the daily processing of the data and enables rapid intervention in the event of problems.
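The daily monitoring described here can be illustrated with a minimal sketch. It is not the Siemens monitoring system: the load log structure, the field names (`platform`, `load_date`, `row_count`) and the platform names are assumptions made up for illustration. The idea is simply to flag every expected platform that delivered no rows on a given day.

```python
from datetime import date


def find_missing_loads(load_log, expected_platforms, day):
    """Return the expected platforms with no rows loaded on `day`, sorted by name.

    `load_log` is a hypothetical list of dicts with the keys
    "platform", "load_date" and "row_count".
    """
    delivered = {
        entry["platform"]
        for entry in load_log
        if entry["load_date"] == day and entry["row_count"] > 0
    }
    return sorted(set(expected_platforms) - delivered)


# Illustrative data only; platform names are invented.
load_log = [
    {"platform": "ad_server", "load_date": date(2023, 1, 2), "row_count": 1200},
    {"platform": "web_tracking", "load_date": date(2023, 1, 2), "row_count": 0},
]
expected = ["ad_server", "web_tracking", "social_network"]

missing = find_missing_loads(load_log, expected, date(2023, 1, 2))
print(missing)  # ['social_network', 'web_tracking']
```

A check like this treats an empty delivery the same as a missing one, so that a source returning zero rows is also surfaced for follow-up.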

Regarding consumption of the data, we made sure that the transformation, analysis and visualization of the data work with common platforms such as Power BI, Tableau or KNIME.

The topic of data governance is increasingly on our agenda for the next few months. In coordination with our data analysts, the first step is to continuously simplify access to the data and working with it, e.g., with the help of a data catalog. With our stakeholders, we want to continue to define roles and responsibilities along the data flow, from the collection of data to its consumption in analysis or visualization projects, and anchor them in the organization. With a data lake, there is sometimes a danger of focusing very quickly on technology and architecture. It helped us enormously to involve the organization in the planning and implementation from the beginning and to approach the transformation towards data-driven communication together.

Thomas Mickeleit: Why do you need a data lake or data hub at all? What are the advantages?

André Völker: Anyone thinking about big data, advanced analytics or a data-driven organization will very quickly come across the topic of the data lake. The great advantage of a data lake is that raw data from a wide variety of sources and in a wide variety of formats can be stored in a central location. This reduces both the dependency on data suppliers such as media agencies and the time until the data is available.

The advantage of the data hub is that it then provides the data from the data lake in structured and formatted form, e.g. in the Parquet data format. The analysts can process the data quickly and easily for projects, e.g. visualizations, without having to deal with different formats.

The bottom line is that working with big data becomes scalable and efficient because the data can be used for a wide variety of use cases once the data source is integrated into the data lake.

Thomas Mickeleit: In any case, Siemens has not only a desire to bring together different data sources but has taken a major step toward this with the Data Hub. What triggered your desire to build the data hub?

André Völker: The starting point for the Data Hub in 2019 was the desire of campaign managers for a tool that provides evaluations of the different impact levels of activities: reach, engagement, interaction, and action. In addition, the tool was required to assist in the evaluation of activities using calculated benchmark values. Accordingly, we had to build an infrastructure as a basis that, first, structures large amounts of data and makes it available quickly for other applications and, second, is able to run advanced analytics methods to calculate benchmark values. That was the birth of the Comms Data Hub, so to speak.
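How the benchmark values are calculated is not described in the interview. As a rough sketch of the general idea only, the following example assumes that a benchmark is the median of a metric across comparable past activities and that an activity is evaluated by its ratio to that benchmark. The function name, the choice of the median and the sample numbers are all assumptions for illustration.

```python
from statistics import median


def compare_to_benchmark(value, history):
    """Compare one activity's metric with a median benchmark.

    `history` holds the same metric for comparable past activities
    (an assumed definition of "benchmark", not the Data Hub's method).
    """
    if not history:
        raise ValueError("history must contain at least one value")
    benchmark = median(history)
    ratio = value / benchmark if benchmark else None
    return {"benchmark": benchmark, "ratio": ratio}


# Invented engagement counts of five past activities.
past_engagements = [10, 20, 30, 40, 100]
result = compare_to_benchmark(45, past_engagements)
print(result)  # {'benchmark': 30, 'ratio': 1.5}
```

The median is used here because a single outlier such as the value 100 does not shift it, which is a common reason to prefer it over the mean for skewed campaign metrics.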

Since then, we have continuously expanded the data hub and today use the flexibility we have gained as a result to address a wide variety of issues. For example, there was a requirement to provide reporting of activities by focus topic. The challenge for us at the time was that the activities were not appropriately tagged with focus topics. Using the data from the Data Hub, we have found a way to replace manual tagging by campaign managers with an automated process.

Thomas Mickeleit: So the data hub allows you to measure not only channels, but specific topics. How does this work practically?

André Völker: To cluster activities by focus topic, we use different information that we have available in the Data Hub. On the one hand, this is campaign metadata such as the naming of activities or additional information stored by the campaign managers. On the other hand, we use information that is available in the context of the communication content, e.g. landing page URL, post text or hashtags. Using defined keywords, we search this information for each activity and assign the activity to the corresponding focus topics. In terms of how it works, this is comparable to social listening.
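The keyword-based assignment described in this answer can be sketched roughly as follows. This is an illustration under stated assumptions, not the Data Hub implementation: the focus topics, the keywords and the field names (`campaign_name`, `landing_page_url`, `post_text`, `hashtags`) are hypothetical, and matching keywords at the start of a word is one possible design choice among several.

```python
import re

# Hypothetical focus topics and keywords; the interview does not name the real ones.
TOPIC_KEYWORDS = {
    "sustainability": ["sustainab", "decarbon", "netzero"],
    "digitalization": ["digital twin", "iot", "automation"],
}

# Hypothetical field names for campaign metadata and content information.
TEXT_FIELDS = ("campaign_name", "landing_page_url", "post_text", "hashtags")


def normalize(text):
    """Lowercase and replace every run of non-alphanumerics with one space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def assign_topics(activity, topic_keywords=TOPIC_KEYWORDS):
    """Return the sorted focus topics whose keywords occur in the activity."""
    text = normalize(" ".join(str(activity.get(f, "")) for f in TEXT_FIELDS))
    matched = []
    for topic, keywords in topic_keywords.items():
        # A keyword matches when it appears at the start of a word.
        if any(re.search(r"\b" + re.escape(normalize(kw)), text) for kw in keywords):
            matched.append(topic)
    return sorted(matched)


activity = {
    "campaign_name": "FY23_Sustainability_Launch",
    "post_text": "Meet our new digital twin",
    "hashtags": "#IoT",
}
print(assign_topics(activity))  # ['digitalization', 'sustainability']
```

Normalizing separators such as underscores, slashes and hash signs to spaces lets the same keywords match campaign names, URLs and hashtags alike. Anchoring at the start of a word avoids false hits such as "iot" inside "patriotic".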

Thomas Mickeleit: I want to come back to Load. Many find this, rightly, the most difficult part: visualizing the data in a way that provides insight and triggers action. How do you solve this problem?

André Völker: As mentioned at the beginning, it is important to work closely with stakeholders. Their questions determine not only the requirements for the data, but also the format of the presentation. A report for management is fundamentally different from an operational dashboard, and it is important to know what decisions are made based on the data in order to choose the right KPIs and visualizations.

Over time, we have tried different methods of getting data as close as possible to our stakeholders. A breakthrough was Power BI’s ability to embed fully functional dashboards in PowerPoint. This was a big step towards automation and simplification of the analysis process. We no longer have to switch between dashboard and PowerPoint, but can record observations, insights and recommended actions directly in PowerPoint and make them consumable.

Thomas Mickeleit: Data hubs or lakes are becoming increasingly important. If all business functions build their own data lakes, we won’t be any further ahead. How can this be prevented?

André Völker: In a dream world, of course, there would be only one central “data pot” containing the data from a wide variety of corporate functions. In the real world, this is currently difficult to realize, especially in a large organization like Siemens. Apart from that, given the requirements for working with the data and the existing system landscape, it is sometimes more expedient for business functions to find and implement their own solution for centralizing the relevant data. In addition, each corporate function brings its own expertise to the processing of the data, which ensures the quality of the data accordingly. In my opinion, it is more important for corporate functions to start the transformation to a data-driven organization, even if it is at first within their own division. However, it is then essential that data analysts from other company divisions can also access the data if required. That is why we share the data we collect in communications with data analysts Siemens-wide. The topic of data governance will become immensely important here in the future to ensure that the entire company can work with the data from the individual areas and benefit accordingly. In this context, the protection of personal data and strategically important corporate data is also of great importance. Technologically, there are already platforms that support the building of individual data hubs and the sharing of data based on data governance considerations.