Update: Python 3.7.7 is no longer available in conda. I’ve confirmed that this works with Python 3.7.12 (the earliest version of Python 3.7.x installable from conda). The code has been updated from 3.7.7 to just 3.7.
Important note: if you want to install geopandas, you will need to install it from the defaults channel. Conda, at least for me, had trouble resolving dependencies when I tried to install it from conda-forge. A modified scenario is provided at the end if you want geopandas installed. A link to my GitHub is also included if you just want the yml file.
In my current position, my company uses Power BI for its business intelligence needs. That works fine for me, as I have no problem diversifying my skill set. However, in reviewing Microsoft’s documentation, I couldn’t find instructions for using conda environments. I thought, oh well, maybe some blogger has more information elsewhere. Nope, couldn’t find it. So, several hours later, I finally hammered out how to do it and thought it might be helpful for someone else too.
While I do think it’s worthwhile for you to read Microsoft’s documentation, the major requirement you need to know now is that the currently supported runtime is Python 3.7.7 (I know. I know. I thought ESRI was bad. Microsoft has exponentially more resources than ESRI and they still haven’t updated the Python version. At least ArcPy for ArcGIS Pro 3.x supports Python 3.9). I definitely recommend creating an environment just for Power BI. In this case I’m going to call the new environment powerbi. After creating the environment, I’m going to activate it to make sure I’m working within that environment. Finally, I’m going to install pandas. When conda asks whether to proceed, type y.
conda create -n powerbi python=3.7
conda activate powerbi
conda install -c conda-forge pandas
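Before going further, it can help to confirm that the activated environment really runs Python 3.7, since that is the runtime this post targets for Power BI. The sketch below is a minimal check under that assumption; the function name `is_supported_runtime` and running it as a standalone script with the environment's `python` are my own illustrative choices, not part of Power BI or conda.

```python
import sys

# Major/minor version this post targets for Power BI (any Python 3.7.x).
REQUIRED = (3, 7)


def is_supported_runtime(version_info=sys.version_info):
    """Return True when the interpreter's major.minor version matches REQUIRED."""
    return tuple(version_info[:2]) == REQUIRED


if __name__ == "__main__":
    print("Running Python", sys.version.split()[0], "from", sys.executable)
    if is_supported_runtime():
        print("OK: this is a Python 3.7 environment.")
    else:
        print("Warning: expected Python 3.7; check that the powerbi environment is activated.")
```

If the printed executable path doesn't point inside the powerbi environment, `conda activate powerbi` probably didn't take effect in that terminal.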
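Here’s where things get interesting (and where I spent a lot of time trying to figure things out). For whatever reason, Power BI doesn’t like the conda installations of pandas, numpy, and matplotlib. So you’ll need to remove those three libraries while leaving everything else in the environment in place. Make sure you use the --force flag; without it, conda would also remove the packages that depend on the one you’re removing (pandas depends on numpy, for example). If matplotlib was never installed through conda (the commands above don’t install it explicitly), the last command will simply report that the package isn’t found.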
conda remove --force pandas
conda remove --force numpy
conda remove --force matplotlib
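Now you need to reinstall those libraries, but this time with pip instead of conda. Make sure to use the --no-deps flag as well, so pip installs only the named package and doesn’t pull in or replace its dependencies. The --force-reinstall flag makes pip reinstall the package even if it already considers it installed.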
pip install --no-deps --force-reinstall pandas
pip install --no-deps --force-reinstall numpy
pip install --no-deps --force-reinstall matplotlib
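To double-check that the swap worked, you can look at the INSTALLER file that pip records in each package’s metadata folder (pip writes `pip` there). This is a sketch, not an official Power BI or conda check: the helper name `installer_of` is hypothetical, packages installed some other way may contain a different value or have no INSTALLER file at all, and `importlib.metadata` only exists from Python 3.8, so on the Python 3.7 environment it assumes the importlib-metadata backport is installed (`pip install importlib-metadata`), which is not part of the steps above.

```python
try:
    from importlib import metadata  # standard library on Python 3.8+
except ImportError:  # Python 3.7: assumes the importlib-metadata backport is installed
    import importlib_metadata as metadata

PACKAGES = ["pandas", "numpy", "matplotlib"]


def installer_of(dist_name):
    """Return the text of the package's INSTALLER file, or a short note if unavailable."""
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        return "not installed"
    text = dist.read_text("INSTALLER")  # None when the file doesn't exist
    return text.strip() if text else "unknown (no INSTALLER file)"


if __name__ == "__main__":
    for name in PACKAGES:
        print(f"{name}: {installer_of(name)}")
```

After the pip step, all three packages should report `pip`.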
I have noticed that matplotlib can be a little finicky. If it doesn’t play nice (i.e., Power BI throws an error when you try to connect to Python), the following command should fix it.
pip install --upgrade --force-reinstall matplotlib
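It’s likely the issue has to do with one of matplotlib’s dependencies rather than matplotlib itself: the earlier --no-deps install skips them, while this command reinstalls matplotlib together with its dependencies. Regardless, this seems to fix the issue.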
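If you want to see whether missing dependencies are the culprit before reinstalling, a rough check is to compare matplotlib’s declared requirements against what is actually installed. This is a minimal sketch under my own assumptions: the helper names are hypothetical, requirements with environment markers (optional extras, python_version conditions) are simply skipped, and on Python 3.7 it needs the importlib-metadata backport (`pip install importlib-metadata`).

```python
import re

try:
    from importlib import metadata  # standard library on Python 3.8+
except ImportError:  # Python 3.7: assumes the importlib-metadata backport is installed
    import importlib_metadata as metadata

# Leading project name in a Requires-Dist string such as "cycler>=0.10".
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def requirement_names(requirements):
    """Extract package names from Requires-Dist strings, skipping conditional ones."""
    names = []
    for req in requirements:
        if ";" in req:
            continue  # has an environment marker; skipped to keep the sketch simple
        match = _NAME.match(req)
        if match:
            names.append(match.group(1))
    return names


def missing_requirements(dist_name):
    """List unconditional requirements of dist_name that are not installed."""
    missing = []
    for name in requirement_names(metadata.requires(dist_name) or []):
        try:
            metadata.distribution(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    try:
        print("Missing matplotlib dependencies:", missing_requirements("matplotlib") or "none")
    except metadata.PackageNotFoundError:
        print("matplotlib itself is not installed in this environment.")
```

A non-empty list after the --no-deps install would be consistent with the explanation above; the --upgrade --force-reinstall command should then fill those gaps.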
Alternative scenario where you want geopandas:
conda create -n powerbi python=3.7
conda activate powerbi
conda install geopandas
conda install matplotlib
conda remove --force pandas
conda remove --force numpy
conda remove --force matplotlib
pip install --no-deps --force-reinstall pandas
pip install --no-deps --force-reinstall numpy
pip install --no-deps --force-reinstall matplotlib
# and just in case
pip install --upgrade --force-reinstall matplotlib
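Because this variant mixes conda and pip installs, a quick import smoke test run with the environment’s python can catch a broken library before you open Power BI. This is an illustrative sketch: the helper name `check_imports` and the module list are my assumptions, and some modules may not expose a `__version__` attribute, in which case it just reports that the import succeeded.

```python
import importlib

# Libraries the geopandas variant of the environment is expected to provide.
LIBRARIES = ["pandas", "numpy", "matplotlib", "geopandas"]


def check_imports(names):
    """Try importing each module; map name -> version string or the error text."""
    results = {}
    for name in names:
        try:
            module = importlib.import_module(name)
            results[name] = getattr(module, "__version__", "imported (no __version__)")
        except Exception as exc:  # ImportError, or a failure inside a compiled dependency
            results[name] = f"FAILED: {type(exc).__name__}: {exc}"
    return results


if __name__ == "__main__":
    for name, status in check_imports(LIBRARIES).items():
        print(f"{name}: {status}")
```

Any line starting with FAILED points to the library to reinstall.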
To confirm that everything installed correctly, you’ll need to make sure that the Python interpreter in Power BI is pointed to the correct environment. My conda environment powerbi is located in the following directory:
To set your directory, go to File > Options and settings > Options. On the left-hand side, under GLOBAL, click Python scripting. If conda is your only installation of Python, you should see it under the Detected Python home directories dropdown. That points only to the base environment in conda, but we want the environment we specifically created for Power BI. To select it, choose Other from the Detected Python home directories dropdown menu. A new option called Set a Python home directory should appear. Point it to the environment directory shown above. Click OK.
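If you’re not sure which folder to use for Set a Python home directory, you can ask the environment itself. For a conda environment on Windows, `sys.prefix` is the environment folder, and that is where `python.exe` lives. The sketch below assumes you run it with the powerbi environment activated; the helper names are hypothetical, and the `bin/python` candidates only matter if you try it outside Windows.

```python
import os
import sys


def python_home():
    """Root directory of the running environment (for a conda env, the env folder)."""
    return sys.prefix


def has_interpreter(directory):
    """True if directory holds a Python executable (python.exe on Windows, bin/python elsewhere)."""
    candidates = [
        os.path.join(directory, "python.exe"),
        os.path.join(directory, "bin", "python"),
        os.path.join(directory, "bin", "python3"),
    ]
    return any(os.path.isfile(path) for path in candidates)


if __name__ == "__main__":
    home = python_home()
    print("Python home directory:", home)
    print("Interpreter found there:", has_interpreter(home))
```

The printed directory is the one to paste into Power BI.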
Now, click Get data > More. In the search bar, type python. The Python script option should appear on the right. Select it and click Connect. A bare-bones script window should pop up. Paste in the base code from the official documentation:
import pandas as pd

# Power BI imports the pandas DataFrames a script creates (here, df).
data = [['Alex', 10], ['Bob', 12], ['Clarke', 13]]
df = pd.DataFrame(data, columns=['Name', 'Age'], dtype=float)
print(df)
Click OK. If everything was set up correctly, you should see the Navigator window listing df.
Official Microsoft Documentation
The featured image is a cropped photo by Riku Lu on Unsplash