9-hour Dataflow to load 5 GB to an Entity?
I am attempting to load 3,868,594 rows (5 GB) into a Power Apps Entity from a SQL instance on a standard D12 v2 Azure VM (4 cores, 32 GB RAM). The gateway is currently on my local workstation (i9 Core X-series, 64 GB RAM); production will require a dedicated VM. I have a per-user Power Apps license and Common Data Service database capacity. My goal is to increase the entity capacity tenfold to 50 GB. The current 5 GB configuration described above has been loading for 9 hours.
Where can I read about the configurations needed for given load expectations, i.e. if you want to load n GB at x speed, you need such and such? All the computation and transformation happens on the SQL VM instance; the current strategy is to increase power there and then serve the data to the Power Platform. Azure SQL was quite slow until I dialed it up to a tier costing $600 a month, and that was just to read from the standard D12 v2 Azure VM and serve it as a live query in Power BI.
What kind of cost and strategy should I expect to access 50GB of data in Power Platform?
Hi @jarthda
The image in the reference below gives a good high-level picture of the key Power Platform components.
https://docs.microsoft.com/en-us/powerapps/maker/common-data-service/data-platform-intro
It shows the four key offerings (Power BI, Power Automate, Power Apps, Power Virtual Agents), along with key underpinning services such as CDS, Data Connectors, and AI Builder.
While not depicted above, these services all sit on Azure. You aren't required to use all of these discrete services, but they are natively integrated and can be combined to meet business needs across several key areas.
So it's tough to answer your question of which is cheaper or more performant as they serve different purposes.
If we go back to your original concern about data migration: Dataflows are well suited for citizen devs doing low-code ETL into CDS, but they are still in their infancy and not yet ready for high-throughput / enterprise data migrations.
If you're looking at migrating complex and/or large data volumes and are more technical, then a data integration tool like SSIS + Kingswaysoft or Scribe is designed precisely for this sort of use case and can get large data sets into CDS without much trouble.
Note that I am not trying to sell you on SSIS/Kingswaysoft; I just list it as one possible solution. I have experience with it on a few data migrations in the past and with current system integrations, and it has worked well.
---
Please click Accept as Solution if my post answered your question. This will help others find solutions to similar questions. If you like my post and/or find it helpful, please consider giving it a Thumbs Up.
Not quite. Power Apps (often used interchangeably with Power Platform) refers to business apps built as canvas, Teams, or model-driven apps. Power Platform encompasses Power Apps, Power Automate, CDS, Power Virtual Agents, Power BI, etc.
There is some overlap between Logic Apps and Power Automate, and they use the same base technology. Although most of their capabilities are similar, each offers some unique features and connectors.
@EricRegnier Hey, I really appreciate the nudge on this. Most of my client work is in SSIS on an Azure VM / Managed Instance, so there's an impedance mismatch with all the exciting new Microsoft Power Platform tooling.
Do you happen to know which of the two is faster or cheaper? Is Power Apps the Azure branding of Power Platform / Power Automate? I noted some overlap there.
Really appreciate both of your time with this. @ChrisPiasecki
Hi @jarthda,
To supplement @ChrisPiasecki's point: the issue is not CDS itself or actual CDS performance and load, as mentioned; it's within Dataflows. Although Dataflows is a great low-code/no-code tool in the Power Platform, it is fairly new and limited, and better suited to small datasets. Hopefully Microsoft will enhance its features, performance, and configuration in upcoming releases. Besides SSIS (optionally with Kingswaysoft), you can take a look at Logic Apps and Power Automate to load data; you will have more control and a better performance outcome.
Cheers
Hi @jarthda,
You can batch requests to get optimal throughput. Unfortunately, Dataflows doesn't offer this configurability.
You may want to explore Kingswaysoft and its connector for Dynamics 365 / CDS. It is built for large data migrations; you can configure it to batch requests and run them in parallel to get the most optimal throughput.
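Just to illustrate what "batch and parallelize" means at the API level (this is roughly what tools like Kingswaysoft do under the hood, not something you can switch on in Dataflows), here's a minimal Python sketch. The org URL, entity set name, and token are placeholders, not anything from your environment:

```python
# Rough sketch only: parallel single-record creates against the CDS/Dataverse Web API.
# ORG_URL, ENTITY_SET, and TOKEN are placeholders for illustration.
import concurrent.futures
import requests

ORG_URL = "https://yourorg.crm.dynamics.com"      # placeholder
ENTITY_SET = "new_sampleentities"                 # placeholder entity set name
TOKEN = "<azure-ad-access-token>"                 # placeholder OAuth token

HEADERS = {
    "Authorization": f"Bearer {TOKEN}",
    "OData-MaxVersion": "4.0",
    "OData-Version": "4.0",
    "Content-Type": "application/json",
}

def create_record(record: dict) -> int:
    """POST one record; returns the HTTP status code."""
    resp = requests.post(f"{ORG_URL}/api/data/v9.1/{ENTITY_SET}",
                         headers=HEADERS, json=record)
    return resp.status_code

rows = [{"new_name": f"row {i}"} for i in range(1000)]  # stand-in for your SQL rows

# A handful of parallel workers; dedicated tools also pack many creates into a
# single request via the OData $batch endpoint to cut down on round trips.
with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
    statuses = list(pool.map(create_record, rows))

print(statuses.count(204), "created")  # a Web API create returns 204 No Content by default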
OK @ChrisPiasecki, good to know. So my proof of concept for this particular client (a 5 GB dataset, meant to scale up to a 50 GB data pipe) is simply too much for CDS. I'll have 88 million rows. At the limit of 6,000 requests per 300 seconds, that takes about 50 days to load. CDS is a toy in such a context.
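Back-of-the-envelope math, assuming one row per API request (my assumption; batching would change this):

```python
rows = 88_000_000            # projected row count
requests_per_window = 6_000  # service protection limit
window_seconds = 300         # 5-minute sliding window

days = rows / requests_per_window * window_seconds / 86_400
print(round(days, 1))        # ~50.9 days at one row per request
```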
I think the answer to my question is: "You can't use CDS for such a dataset."
How many years, by the way, does it take to load 4 TB into CDS? lol
John
Hi @jarthda,
CDS is built to scale and currently allows 4TB database sizes. The issue is not a scale issue but a throttling issue.
You can ignore the other information in that link; it's not important for Dataflows but rather for those working with the APIs programmatically. All client apps go through the same Web API, whether it's Power Apps, Flow, Power BI, Dataflows, or a custom app. So yes, behind the scenes the Dataflows are issuing Web API calls to create entity records and are subject to the same limits.
You can vote for the idea to remove/change these limits for dataflows at the link below.
Hi @jarthda,
The technical details around those strategies apply to those programmatically interacting with the APIs, for Dataflows the only relevant part from the article is those limits. Whether it's PowerApps, Flow, Dataflows, or a custom application, they all go through the same API, so yes when you're loading records into CDS via a dataflow, it's making API calls to create the record.
CDS is built to scale, and you can have up to a 4 TB database per environment, so it's not a scale issue but an unfortunate limitation of Dataflows, which are subject to the same throttling as normal operations. Many would agree that there should be different limits for data-integration scenarios.
I would suggest voting for the idea to remove/adjust the throttling for Dataflows. The more votes it gets, the more likely it is to be considered by the product team and implemented.
In the meantime, you may have to look at an incremental refresh and limit how many records are imported at once.
Other options are 3rd-party products such as Kingswaysoft for SSIS, or Azure Data Factory, which are built with these CDS throttling limits in mind and are overall better suited for very large data migrations.
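For what it's worth, if you (or a partner) ever end up writing custom code instead, the pattern the docs describe boils down to honoring the Retry-After header whenever the service throttles you. A minimal sketch, with placeholder org URL, entity set, and token:

```python
# Minimal sketch of honoring the service protection Retry-After header when
# creating records programmatically. ORG_URL, ENTITY_SET, and the token are placeholders.
import time
import requests

ORG_URL = "https://yourorg.crm.dynamics.com"   # placeholder
ENTITY_SET = "new_sampleentities"              # placeholder
HEADERS = {
    "Authorization": "Bearer <azure-ad-access-token>",  # placeholder
    "OData-MaxVersion": "4.0",
    "OData-Version": "4.0",
    "Content-Type": "application/json",
}

def create_with_backoff(record: dict) -> None:
    while True:
        resp = requests.post(f"{ORG_URL}/api/data/v9.1/{ENTITY_SET}",
                             headers=HEADERS, json=record)
        if resp.status_code == 429:
            # Throttled: wait as long as the service asks, then retry.
            time.sleep(float(resp.headers.get("Retry-After", 60)))
            continue
        resp.raise_for_status()
        return
```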
Hope this helps.
Thanks @ChrisPiasecki, I appreciate the response and the link, and I read up on how to maximize throughput for ETL applications. Have you loaded a 50 GB entity? I got lost as soon as it mentioned Web API calls and to "Gradually increase the rate you send requests until you begin to hit limits and then depend on the service protection API Limit Retry-After value to tell you when to send more. This value will keep your total throughput at the highest possible level." I'm just pointing and clicking, trying to load an entity; I have no idea how to send limited requests. Also, is pointing and clicking to load an entity issuing Web API calls? My pointing and clicking failed after 9 hours. Is CDS just not made for datasets larger than a few GB?
I've heard the CDS branding: "It's not 'Why use CDS?'; the question is 'Why NOT use CDS?'" Well, I can't use it because I can't load 50 GB of data.
Hi @jarthda,
There are throttling limits in place for Common Data Service. There is a maximum of 6,000 API requests in a 5-minute sliding window. Any data integration, including Dataflows, is subject to these limits.
More information is below:
https://docs.microsoft.com/en-us/powerapps/developer/common-data-service/api-limits
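In practice that limit means a custom client should pace itself to no more than 6,000 requests per rolling 300 seconds. Dataflows doesn't expose any of this, so it only matters for coded integrations, but a crude, illustrative client-side throttle might look like this:

```python
# Crude client-side pacing to stay under ~6,000 requests per 300-second sliding window.
# Illustrative only; real integrations should also honor 429 Retry-After responses.
import collections
import time

LIMIT = 6000
WINDOW = 300.0  # seconds

sent = collections.deque()  # timestamps of recent requests

def wait_for_slot() -> None:
    now = time.monotonic()
    # Drop timestamps that have aged out of the sliding window.
    while sent and now - sent[0] > WINDOW:
        sent.popleft()
    if len(sent) >= LIMIT:
        time.sleep(WINDOW - (now - sent[0]))
    sent.append(time.monotonic())

# Usage: call wait_for_slot() before each Web API request.
```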