Practical use cases for Dataflows
The potential of dataflows is the best-kept secret in the Power Platform. Dataflows were designed to extract, transform, and load data into Dataverse. But what exactly does that mean? How can this abstract concept be applied to actual business scenarios? This article explores different use cases for dataflows, through the lens of the distribution and wholesale industry.
Distributors are an essential part of many B2B supply chains. Their main purpose is to bridge the gap between manufacturers and end-users, by leveraging economies of scale in logistics and procurement. The essence of their role, from the customer’s perspective, boils down to providing logistical services, omnichannel customer experiences, and data processing services.
Most organizations are already well on their way to automating key segments of their logistical and warehousing processes. Modern Material Handling’s “2020 Automation Solutions Study” shows that nearly three-quarters of survey respondents have already implemented a warehouse management system (WMS), while approximately one-half of respondents are already using automated conveyors and cubing systems. The study further reports that a majority of respondents plan to upgrade or implement automatic guided vehicles (AGV) and robotics (palletizing and picking) within the next 24 months [1].
Unmanned aerial vehicle deliveries are on the cusp of becoming a mainstream reality. Fortune Business Insights values the global drone delivery market at USD$642M in 2019 and predicts it will reach USD$7,388M by 2027. [2]
Distributors have also rushed to adapt their “click and mortar” customer experiences to a new post-COVID reality. They have begun to realize that focusing solely on maximizing online sales is a deficient digital transformation strategy. An increasing number of distributors are now embracing a customer-centric omnichannel approach that provides a seamless customer experience through multiple channels.
In the B2B distribution industry, these channels include traditional “brick-and-mortar” stores and phone service, desktop and mobile webshop experiences, ERP-integrated punch-out catalogs, and EDI transactions. In addition, customers are increasingly adopting source-to-pay platforms, such as SAP Ariba, to manage their procurement. These platforms are effectively a sales channel in and of themselves. The customer-centric omnichannel strategy is to B2B distribution what Burger King’s strategy has been to fast food: “Have it your way”.
The success of distributors in automating their logistics and sales channels has largely been the result of centralized IT planning, often supported by external consultants. IT departments acted upon the priorities of upper management, deploying standardized enterprise-scale solutions. This one-size-fits-all standardized approach has allowed organizations to quickly adopt new technologies while ensuring robust governance and security.
While this approach has been effective for retailers dealing with relatively flexible consumers, it has begun to show its limits in the B2B space. Whereas retailers can propose different sales channels to consumers, B2B customers often impose a channel upon their supplier as a sine qua non condition of doing business.
Large multi-national corporations have the leverage to dictate the terms of their business relationships, and they certainly use it. For example, invoicing via SAP Ariba may not be optional; it may be the only way to get paid. Providing a modern and intuitive punch-out catalog is no longer enough: customers now expect the interface to be customized according to their case-specific requirements.
The unsettling new reality is that a distributor could potentially have as many sales channels as it has customers. In the B2B omnichannel landscape, being customer-centric is no longer enough. Offering customer-specific solutions is the new challenge. The key to addressing this challenge is to standardize the underlying data while providing custom-tailored customer experiences. The trend towards customer-specific sales and service channels will only accelerate in years to come. Indeed, many experts predict that customized channels will be the new standard in the B2B distribution industry. [3][4]
Managing customer specificities has always been the norm, and is still the main challenge, in the data processing space. A survey commissioned by Red Hat and released in June 2020 reveals that “aggregating multiple sources of data” is one of the top three challenges for IT departments overall. Over half of survey respondents listed this issue as a major concern. [5]
When asked specifically about their main data integration challenges, IT pros most often cited the following themes:
- Security
- Conflicting data and schema
- Making sure IT and the business are on the same page
- Connecting to legacy systems and being able to transform legacy data formats
- A way to see data from different places in the same way
- Combining data from different sources so that it’s meaningful and reliable
Over the course of the last century, there have been many attempts to address these problems. Electronic Data Interchange, or EDI, has certainly been one of the most successful and widespread technologies used to this effect. EDI is a standardized system used to exchange business documents electronically between companies.
EDI was initially inspired by developments in military logistics following the Second World War. According to Wikipedia, “The complexity of the 1948 Berlin airlift required the development of concepts and methods to exchange, sometimes over a 300 baud teletype modem, vast quantities of data and information about transported goods.” [6]
Having withstood the test of time, EDI remains one of the most prevalent data integration systems today, particularly in the medical field. However, EDI is far from being an all-encompassing solution for distributors and wholesalers.
Firstly, EDI integrations are relatively complex and require a very specific, and increasingly rare, skillset. Inconsistent data models and low-quality data are also problematic when working with EDI. For example, distributors often struggle with the conflicting units of measure used by their suppliers and customers. The adoption of EDI technology varies widely by industry segment and company size. In many cases, organizations may not have the resources available to deploy EDI with all their partners, if at all.
Most manufacturers, distributors, and B2B customers are only able to implement the most basic EDI transactions: purchase orders (EDI 850), invoices (EDI 810), and advance ship notices (aka ASN, EDI 856). Although EDI can address many other use cases, only a limited portion of EDI’s capabilities are in mainstream use in many market segments.
Most importantly, just as it takes two to tango, it takes two ready, willing and able partners to deploy an EDI integration. EDI is mostly used between major business partners, where the importance of the business relationship justifies the investment in an EDI integration.
Therefore, in many industries, the bulk of business documents are still exchanged via e-mail, usually in Excel format, and processed manually by business users. This is especially true when transacting with smaller business partners or smaller organizations.
Many other technologies are used for data integration, allowing organizations to do much more than exchange purchase orders and invoices. However, in most cases, these integration projects are complex and expensive, and their use is again limited to large-scale, high-volume applications, prioritized by management.
Dataflows, a relatively new feature of the Microsoft Power Platform, are a game-changer for smaller-scale data integration projects. Leveraging low-code/no-code technology, dataflows have democratized data integration, empowering business users and IT pros alike to efficiently automate business processes using multiple sources of internal and external data.
According to Microsoft, “Dataflows are a self-service, cloud-based, data preparation technology.” [7] They “can be used to easily ingest, cleanse, transform, integrate, enrich, and schematize data from a large and ever-growing array of transactional and observational sources, encompassing all data preparation logic.” [8]
However, even Microsoft has yet to fully recognize the true potential of dataflows. In addition to preparing data for analysis, cascading dataflows can be intertwined with manual user interactions via Power Apps and automated actions triggered through Power Automate. They can also be indirectly infused with artificial intelligence, through AI Builder models running on the resulting datasets in Dataverse. They can integrate data into legacy systems using Desktop Flows (formerly known as UI Flows).
Users familiar with Power Query in Excel or Power BI already have all the necessary skills to build dataflows in Power Apps. Indeed, dataflows are designed using a slightly modified version of Power Query, running in the cloud. However, any business user with the skills to perform VLOOKUPs and simple calculations in Excel can easily learn Power Query, through short (and free) online training provided by Microsoft Learn and various other sources. [9]
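To make this concrete, here is a minimal sketch of what a dataflow query can look like in Power Query’s M language: ingest a supplier price list, cleanse the part numbers, and enforce data types. The file location, sheet name, and column names are hypothetical placeholders, not a reference to any real source.

```
// Minimal Power Query (M) sketch of a dataflow query.
// The URL, sheet name, and column names are illustrative assumptions.
let
    // Ingest a workbook received from an external party
    Source = Excel.Workbook(
        Web.Contents("https://contoso.sharepoint.com/shared/SupplierPrices.xlsx"),
        null,
        true
    ),
    PriceSheet = Source{[Item = "Prices", Kind = "Sheet"]}[Data],
    Promoted = Table.PromoteHeaders(PriceSheet, [PromoteAllScalars = true]),

    // Cleanse: trim and upper-case part numbers, then enforce data types
    Cleaned = Table.TransformColumns(
        Promoted,
        {{"Part Number", each Text.Upper(Text.Trim(_)), type text}}
    ),
    Typed = Table.TransformColumnTypes(Cleaned, {{"Unit Price", Currency.Type}})
in
    Typed
```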
For distributors, the sales and procurement departments share similar challenges in managing external data. There are many similarities in the types of documents they exchange with external parties. Many of these exchanges are use cases for dataflows.
These exchanges of documents vary widely in size, complexity, and frequency. They also often involve manual processes that are more or less labor-intensive. The best return on investment with dataflows can be achieved by automating the exchanges that are relatively large, frequent, and time-consuming, with a medium degree of complexity.
In the wholesale industry, the nature and types of documents exchanged vary from one market segment to another. These differences also appear within an organization, as each external business relationship comes with its own set of requirements and responsibilities. Therein lies the challenge identified by the Red Hat survey respondents.
The following types of documents are commonly exchanged with external parties by sales and procurement: master data (such as part numbers, pricing, and units of measure), purchase orders, order acknowledgments, shipment notifications, invoices, requests for quotes, backorder reports, and accounts receivable statements.
The underlying data in each of these exchanges is most often extracted from one database, usually in an ERP or an MDM system, and transmitted to an external party. The data is then processed, either manually or automatically, and uploaded to the external party’s database or file system.
This data is often stored in siloed databases and file systems. Business users seldom have easy access to all the data they need to perform their analysis. This restricted access is the result of IT implementing strict data security policies, departments operating in silos, and, sometimes, haphazard data management by business users.
The result is that business users waste a lot of time searching for information, copying and pasting data into an ERP system, and performing countless VLOOKUPs on unsecured Excel shadow “databases”. Despite the IT department’s best intentions, creative business users will find workarounds to get the data they need to complete their tasks. These unmonitored exchange channels and ungoverned data sources can sometimes lead to data loss, whether unintentionally or with malicious intent.
While dataflows are a means to get and transform data from multiple sources, Dataverse is the cloud service where the data is stored, managed, and accessed. Dataverse leverages the Common Data Model, which provides “a shared data language for business and analytical applications to use”. [10]
Essentially, dataflows can be used to transform and standardize external data from multiple sources. Once uploaded into Dataverse’s standardized data schemas, the data can be leveraged by Power Apps, Power Automate, Power Virtual Agents, and Dynamics 365. Through Power Automate, users can also interact with the data using the tools they use every day, such as Outlook, Excel, Word, and SharePoint.
The purpose of Dataverse is not to eliminate all data silos, unless an organization is using Dynamics 365 as its ERP/CRM system. Even then, it will most often be necessary to maintain separate databases for other legacy or third-party systems. Rather, Dataverse is a space where all this disparate internal and external data is unified in a standardized and secure fashion.
Dataverse can help minimize, but not eliminate, the risks of data loss by reducing the need to maintain shadow Excel “databases”. It can also reduce the number of ad-hoc document exchanges between internal and external users. Manual errors can thus be avoided, such as accidentally sending your Excel customer list to Tom Woo, your dry cleaner, instead of Tom Wu, your co-worker.
The Dataverse is part of an “environment”, which is essentially a container for apps and data. An organization can have one or many such environments. A solutions architect can determine how many environments should be deployed, and how the Common Data Model should be applied and extended. Often, separate environments are assigned to different departments or operating units.
The concept is that a Dataverse environment should contain all the data that users need, and nothing else. For example, while an ERP system may contain all the transactional data for an organization, the Dataverse environment would contain only the data that is relevant to the users who are assigned to that environment. Within an environment, security can be set at the row and column (record and field) level to ensure users only have access to the data they are privy to.
The optimal role for dataflows in this context differs for each document type.
Master Data Management systems are best suited to manage master data, especially in large organizations. However, users need access to this data to perform their daily tasks. Dataflows can be used to extract relevant master data from an MDM or an ERP system and transform this data into the standardized Common Data Model format. Once in the Dataverse, this data can be accessed securely and leveraged by the Power Platform.
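As a purely illustrative example, a dataflow along these lines could pull active customers from an ERP’s SQL database and rename the columns to match the standard Dataverse Account table. The server, database, table, and column names below are assumptions.

```
// Hypothetical sketch: extract customer master data from an ERP database
// and shape it to fit the standard Dataverse Account table.
let
    Source = Sql.Database("erp-sql.contoso.local", "ERP_PROD"),
    Customers = Source{[Schema = "dbo", Item = "CustomerMaster"]}[Data],

    // Keep only active customers and the columns we need
    Active = Table.SelectRows(Customers, each [Status] = "ACTIVE"),
    Selected = Table.SelectColumns(Active, {"CustomerNo", "CustomerName", "City"}),

    // Rename columns to match the Common Data Model / Dataverse Account schema
    Renamed = Table.RenameColumns(Selected, {
        {"CustomerNo", "accountnumber"},
        {"CustomerName", "name"},
        {"City", "address1_city"}
    })
in
    Renamed
```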
Whenever possible, EDI integrations are best suited to manage large volumes of orders, order acknowledgments, shipment notifications (or ASNs), and invoices. API-driven integrations are also gaining momentum. However, in both cases, these integrations require both parties to be ready, willing, and able to implement such an integration.
Dataflows can be used in lower-volume applications when either party is unable to implement a more complex EDI/API integration. Incoming documents can be transformed into a standardized Common Data Model format using dataflows, loaded into the Dataverse, and transformed once more into a format that can be ingested by the organization’s ERP system using a combination of dataflows and Power Automate. The last step is not even required for organizations using Dynamics 365 as an ERP/CRM system.
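The sketch below illustrates the kind of transformation involved: an incoming customer purchase order received as an Excel workbook is cleaned, typed, and normalized into standardized order lines before being loaded into Dataverse. The file location, column names, and packaging rule are assumptions made for the example.

```
// Illustrative sketch: standardize an incoming customer purchase order.
let
    Source = Excel.Workbook(
        Web.Contents("https://contoso.sharepoint.com/shared/Incoming_PO_4711.xlsx"),
        null,
        true
    ),
    Lines = Table.PromoteHeaders(Source{[Item = "OrderLines", Kind = "Sheet"]}[Data]),

    // Standardize keys and enforce data types
    Keyed = Table.TransformColumns(
        Lines,
        {{"Customer Part", each Text.Upper(Text.Trim(_)), type text}}
    ),
    Typed = Table.TransformColumnTypes(Keyed, {{"Qty", Int64.Type}, {"Requested Date", type date}}),

    // Normalize units of measure (here, a box "BX" is assumed to hold 100 eaches)
    Normalized = Table.AddColumn(
        Typed,
        "QtyEach",
        each if [UoM] = "BX" then [Qty] * 100 else [Qty],
        Int64.Type
    )
in
    Normalized
```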
Whereas a specialized developer is required to implement an EDI/API integration, a business user can easily design dataflows using the intuitive Power Query tool. As such, integrations with smaller external parties that would never make it onto IT’s priority list can be completed without the involvement of a developer.
If Dynamics 365 is not used as an ERP/CRM, dataflows can be used to extract and transform transactional data from the ERP and to load it into Dataverse, similarly to master data.
The AI Builder Power Platform add-in can also be used in conjunction with the aforementioned tools to process unstructured documents, such as PDF files. [11]
The previous use cases required relatively little user interpretation or input. The purpose of dataflows in those use cases was essentially to transform data into a standardized Common Data Model schema and load it into Dataverse. Dataflows were designed for this purpose, as an ETL (extract-transform-load) tool for Dataverse. However, as dataflows are built using Power Query, they can do much more than was originally intended by Microsoft.
Many business processes require users to interpret external data, compare it with multiple internal and external data sources and provide input based on this analysis. Power Query within Excel is a very powerful tool to automate portions of these business processes, but it has many drawbacks:
- The user must manually run the query in Excel. A degree of automation is possible using VBA macros, but VBA coding is beyond the skill set of the average business user.
- Queries are executed using the local computer’s processing power. Complex queries with large datasets can run for dozens of minutes, sometimes even hours. During this period, the user’s computer can be slow, and Excel may become unresponsive.
- Most often, both the author of the Power Query script and its end users have access to the raw data.
- The query script itself and the resulting data are stored in a portable Excel file. Unless the organization has implemented effective sensitivity label policies within its governance and security strategy, the Excel file can easily be leaked outside the organization, lost, or forgotten.
Dataflows within the Power Platform bring Power Query to the cloud. Dataflows can be refreshed manually, or they can run on a schedule. Moreover, a powerful new feature, presently in preview, brings things to another level. A new dataflow connector for Power Automate was announced last December. Now, dataflows can be triggered through a Power Automate action and can act as a trigger within Power Automate upon dataflow refresh completion. [12]
The dataflows run in the cloud and do not affect the performance of a user’s local computer. Users can trigger a dataflow without having direct access to the Power Query code, the source data, or even the output data. Indeed, the output of a dataflow is not directly accessible to the user but is rather loaded into Dataverse.
Once in Dataverse, users can interact with the data with Power Apps, but will only have access to the rows and columns that are relevant to that user’s role and security privileges. As the relevant master data and transactional data is readily available for users in Dataverse, the need for separate Excel files is greatly reduced. Indeed, this data provides the foundational basis to automate portions of value-added use cases, such as the ones described below.
- Quotes: In the B2B space, requests for quotes can often include thousands of parts. These are typically sent by potential clients in Excel format. The quality of the data differs widely from one organization to another, but such requests often include poorly structured data, with key information scattered unevenly across several columns. Part numbers may be embedded in free-text descriptions. Worse, different manufacturers may share the same part number (ABC123 may be a fitting at WidgetsCo and the same part number may be a screwdriver at ToolsCo). There are often discrepancies in units of measure. Power Query, and thus dataflows, can be used to efficiently match customer part numbers with distributor part numbers (see the first sketch after this list). They can also be used to perform advanced calculations and flag issues for review, beyond the scope of calculated fields and business rules in Dataverse. Once the data is matched by the dataflow, model-driven apps offer the ideal workspace for quotation specialists to complete the proposal in a structured and efficient manner. Excel and Word templates can also be used within Power Apps to prepare the documents that will be sent to the potential customer. Once the work is complete, the RFQ data can be securely archived and reused for further steps should the bid be successful.
- Backorder/KPI: Managing backorders is increasingly important, as clients expect just-in-time delivery. Supply contracts are often governed by delivery KPIs (similar to SLAs in the service industry) and carry penalties for non-compliance with “obligations of result”. “Time is of the essence”, contractually speaking (lawyers love/hate those clauses). Matching customer backorder lines with distributor backorder lines is not always as simple as it seems, as establishing a common key may require manipulating text and numerical values. Other factors must also be considered: defects, returns, discontinued parts, split shipments, drop shipments… Dataverse alone does not have the flexibility to address these issues, but dataflows can handle most of them. The calculated fields and business rules in Dataverse only consider calendar days, whereas backorder monitoring and KPIs are often based on business days. Again, dataflows can handle this (see the second sketch after this list). Dataflows can also be used to proactively identify risks and offer insights into possible solutions. They can be of assistance in determining the root causes of a late shipment.
- Accounts receivable: Most large customers respect their terms of payment, as per their contract. If an invoice remains unpaid, there is usually an underlying problem. Dataflows can be used to identify these problems. Perhaps an order is unpaid because of an unresolved non-conformance report. Perhaps there is a digit missing in the PO number. Perhaps the order was returned, and the credit was improperly applied. Perhaps the order was accidentally shipped to Timbuktu. By combining data from multiple sources and using advanced calculations not available in Dataverse calculated fields and business rules, dataflows can help identify why an order is unpaid and provide the information required to resolve the issue (see the third sketch after this list).
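First sketch, for the quotation scenario: an illustrative Power Query approach to matching customer part numbers against a distributor’s item master on a normalized manufacturer-plus-part key. The referenced queries (RFQ_Lines, Item_Master) and their column names are assumptions, not a prescribed design.

```
// Illustrative sketch: match RFQ lines to the item master on a composite key.
let
    Rfq = RFQ_Lines,
    Items = Item_Master,

    // Build a composite key so ABC123 from WidgetsCo never matches ABC123 from ToolsCo
    AddKey = (t as table, mfrColumn as text, partColumn as text) =>
        Table.AddColumn(
            t,
            "MatchKey",
            each Text.Upper(Text.Trim(Record.Field(_, mfrColumn))) & "|" &
                 Text.Upper(Text.Remove(Record.Field(_, partColumn), {"-", " ", "."})),
            type text
        ),

    RfqKeyed = AddKey(Rfq, "Manufacturer", "Customer Part Number"),
    ItemsKeyed = AddKey(Items, "Mfr", "Item No"),

    // A left join keeps every RFQ line, matched or not, so specialists can review the gaps
    Merged = Table.NestedJoin(RfqKeyed, {"MatchKey"}, ItemsKeyed, {"MatchKey"}, "Match", JoinKind.LeftOuter),
    Expanded = Table.ExpandTableColumn(Merged, "Match", {"Item No", "List Price"}),
    Flagged = Table.AddColumn(Expanded, "Needs Review", each [Item No] = null, type logical)
in
    Flagged
```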
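Second sketch, for the backorder scenario: a business-day calculation of the kind that Dataverse calculated fields cannot perform. It assumes a Backorder_Lines query with Promised Date and Ship Date columns, and it ignores statutory holidays for brevity.

```
// Illustrative sketch: count business days (Monday to Friday) between two dates.
let
    Source = Backorder_Lines,
    WithBusinessDays = Table.AddColumn(
        Source,
        "BusinessDaysLate",
        each
            let
                daysLate = Duration.Days([Ship Date] - [Promised Date])
            in
                if daysLate <= 0 then 0
                else List.Count(
                    List.Select(
                        List.Dates([Promised Date], daysLate, #duration(1, 0, 0, 0)),
                        each Date.DayOfWeek(_, Day.Monday) < 5
                    )
                ),
        Int64.Type
    )
in
    WithBusinessDays
```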
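Third sketch, for accounts receivable: cross-referencing open invoices with non-conformance reports from a separate quality system to suggest a probable cause for non-payment. The query names (Open_Invoices, NonConformance_Reports) and columns are hypothetical.

```
// Illustrative sketch: flag unpaid invoices that are tied to an unresolved NCR.
let
    Invoices = Open_Invoices,
    Ncrs = NonConformance_Reports,

    Merged = Table.NestedJoin(Invoices, {"Order No"}, Ncrs, {"Order No"}, "NCR", JoinKind.LeftOuter),
    Expanded = Table.ExpandTableColumn(Merged, "NCR", {"NCR No", "Status"}, {"NCR No", "NCR Status"}),

    // Suggest a probable cause based on the combined data
    Flagged = Table.AddColumn(
        Expanded,
        "Probable Cause",
        each if [NCR Status] = "Open" then "Unresolved non-conformance"
             else if [Days Overdue] > 60 then "Escalate to credit"
             else "Review manually",
        type text
    )
in
    Flagged
```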
Dataflows do not eliminate the need for manual analysis in complex scenarios, but in most use cases they can be used to automate parts of the process. Then, Power Apps offer a standardized platform for business users to complete the analysis of the semi-processed data, with relevant master and transactional data at their fingertips in Dataverse.
Circling back to the Red Hat survey, a combination of Dataverse and dataflows can address the most common data integration challenges:
- Security
  - Dataverse boasts rich field- and record-level security
  - The use of portable, unsecured Excel files can be reduced
  - Secure connections can be used to extract the data
- Conflicting data and schema
  - Power Query can be used to transform and standardize the data into the Common Data Model
- Making sure IT and the business are on the same page
  - Dataflows can be designed by business users, with no need for IT developers
  - In this scenario, the role of IT is to ensure proper governance and enforce security
- Connecting to legacy systems and being able to transform legacy data formats
  - As long as data can be exported from the legacy system in CSV or Excel format, it can be transformed by Power Query (special care should be taken when exchanging documents via e-mail, such as asking the sender to use an administrator-managed mailbox)
- A way to see data from different places in the same way
  - Model-driven apps are particularly well suited to this challenge: role-based forms can be designed to show users exactly the data that is relevant to them, in a way they can relate to
- Combining data from different sources so that it’s meaningful and reliable
  - This is exactly the challenge the Power Platform was built to address
The Power Platform empowers business users to automate case-specific workloads, but only if they are afforded the time to do so. Automation does not replace workers, but it does transform their role. To leverage dataflows, and indeed the whole Power Platform, business users must be allowed to learn and experiment. New full-time roles for citizen developers must be created within business units. New governance and security policies must be established to manage these solutions.
Ten years ago, distribution centers seldom had any electro-mechanics on the payroll. Today, electro-mechanics have a key role in any automated distribution center. Millions were invested in conveyors, cubing machines, shuttle systems, automatic guided vehicles, drones, and palletizers. Nobody would expect a warehouse worker, armed with a tape gun, to mend broken drones on his lunch break. Nobody would expect Knapp or Schaefer to supply warehouse automation systems for free.
Similarly, automating business processes with the Power Platform requires investment, both in people and technology. Simply providing tools to a team does not count as empowerment. The citizen developer is to the Power Platform what the electro-mechanic is to the Premier Tech palletizer. It is a function in and of itself.
Warehouse automation and e-commerce platforms changed the distribution landscape. These enterprise-scale projects are now paying dividends and leaving the competition that failed to adapt in the dust. The Power Platform and dataflows leverage data and low-code technology to bring automation to the next frontier: task-specific use cases that can be automated by anybody with the will and time to innovate.
Further reading:
The Case for Citizen Developer Micro-Products, From the Forest to the Trees
Citizen Developers: Out of the Shadows. Low-code strategies to minimize risk and maximize ROI.
About the author: Charles Séguin is passionate about automating the boring stuff so his team can focus on what matters most: the customer. Based just outside of Montreal, Canada, he works at Lumen, a division of Sonepar, the global market leader in B2B distribution of electrical products, solutions, and related services.
Disclaimer: The opinions expressed in this article are those of the author and are published on a personal basis.
Photography: Shot on location near Da Lat, Vietnam, and in Bali, Indonesia.
Infographics: Designed by the author with icons provided by contributors on www.flaticon.com
Endnotes
[1] https://www.mmh.com/article/theres_no_stopping_warehouse_automation
[2] https://www.fortunebusinessinsights.com/drone-package-delivery-market-104332
[3] https://www.websoptimization.com/blog/b2b-ecommerce-trends/
[4] https://b2bwoo.com/blog/b2b-ecommerce-trends/
[6] https://en.wikipedia.org/wiki/Electronic_data_interchange
[8] https://docs.microsoft.com/en-us/power-query/dataflows/create-use
[9] https://docs.microsoft.com/en-us/learn/modules/clean-data-power-bi/
[10] https://docs.microsoft.com/en-us/common-data-model/
[11] https://docs.microsoft.com/en-us/ai-builder/create-form-processing-model
[12] https://docs.microsoft.com/en-us/connectors/dataflows/