I'm delivering a Microsoft Fabric project for a customer, and we had some great discussions about which workspace strategy to follow. This blog post is those discussions boiled down to a couple of pages 😊
When deciding on a workspace strategy for Microsoft Fabric there are several, often conflicting, things to consider.
One is ease of maintenance. The fewer workspaces you have, the easier they are to maintain and manage.
Other considerations are segregation of duties, isolation of workloads and security isolation. These call for many workspaces.
Where an organization ends up on this scale depends on what is important to it and what kind of balance it wants to strike.
Things that impact the number of workspaces:
- Number of environments (dev, test, pre-prod, prod etc.)
- Number of stages (extract, staging, dw, mart etc.)
- Isolation of workload resources from each other (Data Factory doesn't consume resources needed by Spark, or vice versa)
- Security considerations (developers are not allowed to see production, those doing ingestion should not be able to modify Spark code etc.)
- Ways of working with DevOps (one branch per workspace limitation in Fabric)
What are the outer limits for the number of workspaces?
It's possible to have everything in one workspace, but realistically you will always end up with at least one workspace per environment (dev, test, prod), or at a minimum one for prod and one for the rest. So, the minimum number of workspaces is 2-3.
At the other end of the scale, each workload has one workspace per environment, and each feature has its own workspace. Imagine that we use Data Factory to ingest data into an extract lakehouse, then use Spark to clean and transform the data into a staging lakehouse, then use Spark to load the data into a DW lakehouse, and finally build a Power BI semantic model with reports on top of it. We then have 3 workspaces for each workload (assuming dev, test and prod): 3 for Data Factory, 3 for the extract lakehouse, 3 for the staging lakehouse, 3 for the DW lakehouse, 3 for the semantic models and 3 for reports. This gives us 6 x 3 = 18 workspaces. Add to that one workspace per developer for each of these workspaces if you decide to branch the workspace during development. These branch workspaces are temporary, exist only while the development happens, and are most likely accessed only by the individual developer.
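To make the arithmetic concrete, here is a minimal Python sketch that counts the workspaces at this end of the scale. The workload labels are just shorthand for the example above, and the number of developers and the assumption that every developer branches every workload at the same time are purely illustrative.

```python
from itertools import product

# Labels taken from the example above; adjust them to your own setup.
ENVIRONMENTS = ["dev", "test", "prod"]
WORKLOADS = [
    "data-factory",
    "extract-lakehouse",
    "staging-lakehouse",
    "dw-lakehouse",
    "semantic-models",
    "reports",
]


def permanent_workspaces(workloads, environments):
    """Return one (workload, environment) pair per permanent workspace."""
    return list(product(workloads, environments))


def peak_workspaces(workloads, environments, developers, branched_workloads):
    """Upper bound when every developer branches every listed workload at once."""
    permanent = len(permanent_workspaces(workloads, environments))
    temporary = developers * branched_workloads  # assumption: one branch workspace each
    return permanent + temporary


print(len(permanent_workspaces(WORKLOADS, ENVIRONMENTS)))  # 18
print(peak_workspaces(WORKLOADS, ENVIRONMENTS, developers=4, branched_workloads=6))  # 42
```

In practice the temporary number will be much lower, since developers rarely branch every workload at once, but it shows how quickly the count grows.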
Other things to consider
How your team is composed will be a deciding factor in which strategy you choose. If you have a small team of developers who build the whole pipeline, from ingestion to transformation to semantic models, you can use fewer workspaces than if you have dedicated developers for each workload. It's also about trust. If you trust your developers not to mess with each other's code, you can have fewer workspaces. If you feel you need to isolate workloads from each other, you will need more workspaces.
Therefore, there is no one rule for how many workspaces you should have. In my opinion you should be pragmatic about it and try to weigh the need for workload isolation and strict CI/CD protocols against ease of maintenance and development.
Impact of CI/CD on workspace strategy
If you decide to use CI/CD for your Fabric development, you need to decide how your developers are going to work. You can only have one branch per workspace. This means that if you have more than one feature you want to work on, you need to decide if you work on all of them in one branch (workspace) or if you want to have one branch (workspace) per feature.
My recommendation is to have one branch (workspace) per feature and then, when the feature is complete, merge it into the main branch, which is connected to your development workspace. It's important that you clean up the feature workspaces so you don't end up with hundreds of dormant workspaces with code in different stages.
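One way to keep the clean-up manageable is to keep a simple inventory of feature workspaces and flag the ones that can go. The sketch below is only an illustration: the `FeatureWorkspace` record, the `feat-` naming convention and the 30-day idle threshold are all assumptions of mine, and the inventory is hand-written rather than fetched from Fabric.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class FeatureWorkspace:
    name: str            # hypothetical naming convention: "feat-<feature>"
    branch_merged: bool  # True once the feature branch is merged into main
    last_activity: date  # last known change in the workspace


def cleanup_candidates(workspaces, today, max_idle_days=30):
    """Return names of feature workspaces that are merged or idle for too long."""
    cutoff = today - timedelta(days=max_idle_days)
    return [
        ws.name
        for ws in workspaces
        if ws.branch_merged or ws.last_activity < cutoff
    ]


# Hand-written sample inventory; in practice this comes from your own records.
inventory = [
    FeatureWorkspace("feat-new-source", branch_merged=True, last_activity=date(2024, 5, 1)),
    FeatureWorkspace("feat-dim-customer", branch_merged=False, last_activity=date(2024, 5, 20)),
    FeatureWorkspace("feat-old-experiment", branch_merged=False, last_activity=date(2024, 1, 10)),
]

print(cleanup_candidates(inventory, today=date(2024, 6, 1)))
# ['feat-new-source', 'feat-old-experiment']
```

The list is only a set of candidates. Whether an idle but unmerged workspace should really be deleted is still a decision for the developer who owns it.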
At the moment the only way to deploy is via Fabric Deployment Pipelines. Therefore, this is the recommended way to deploy. If and when there are APIs for deployment of Fabric items, you can consider building your own deployment pipeline.
Where did we end up with this particular customer?
We decided to go with one workspace per stage per environment: 3 environments (dev, test and prod) and 3 stages (extract, staging, dw) in the Lakehouse, which gives 9 workspaces. Semantic models will have one workspace per environment, which adds 3 more. Reports were kept out of scope as they are not developed frequently by the central team. We will therefore end up with 12 workspaces, plus one workspace per feature while it's being developed.
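As a quick check of that number, the layout can be enumerated in a few lines of Python. The `proj` prefix and the `<prefix>-<stage>-<environment>` naming pattern are hypothetical and only there to make the output readable; they are not the names we actually used.

```python
ENVIRONMENTS = ["dev", "test", "prod"]
STAGES = ["extract", "staging", "dw"]


def customer_workspaces(prefix="proj"):
    """List the permanent workspaces: one per stage per environment, plus semantic models."""
    names = [f"{prefix}-{stage}-{env}" for stage in STAGES for env in ENVIRONMENTS]
    names += [f"{prefix}-semantic-{env}" for env in ENVIRONMENTS]
    return names


workspaces = customer_workspaces()
print(len(workspaces))  # 12
for name in workspaces:
    print(name)
```

Feature workspaces are left out on purpose, since they come and go as development happens.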