
Dynamics Corner
About Dynamics Corner Podcast "Unraveling the World of Microsoft Dynamics 365 and Beyond" Welcome to the Dynamics Corner Podcast, where we explore the fascinating world of Microsoft Dynamics 365 Business Central and related technologies. Co-hosted by industry veterans Kris Ruyeras and Brad Prendergast, this engaging podcast keeps you updated on the latest trends, innovations, and best practices in the Microsoft Dynamics 365 ecosystem. We dive deep into various topics in each episode, including Microsoft Dynamics 365 Business Central, Power Platform, Azure, and more. Our conversations aim to provide valuable insights, practical tips, and expert advice to help businesses of all sizes unlock their full potential through the power of technology. The podcast features in-depth discussions, interviews with thought leaders, real-world case studies, and helpful tips and tricks, providing a unique blend of perspectives and experiences. Join us on this exciting journey as we uncover the secrets to digital transformation, operational efficiency, and seamless system integration with Microsoft Dynamics 365 and beyond. Whether you're a business owner, IT professional, consultant, or just curious about the Microsoft Dynamics 365 world, the Dynamics Corner Podcast is the perfect platform to stay informed and inspired.
Episode 410: 🧠🧐 Local LLMs: Open-Source AI meets Business Central 🧐🧠
1️⃣ In this insightful episode, the conversation explores the growing world of open-source large language models (LLMs) and their transformative potential when deployed locally.
2️⃣ Guest Stefano Demiliani joins Kris Ruyeras and Brad Prendergast to break down the technical challenges and rewards of running models like DeepSeek on local hardware, from navigating hefty resource demands to leveraging techniques like quantization and distillation for efficiency.
3️⃣ The discussion dives into practical business applications—inventory management, autonomous AI agents, forecasting, and even image recognition—all powered by offline models to prioritize data security and cost control.
4️⃣ Listeners will discover how integrating these customizable, secure solutions with tools like Microsoft Dynamics 365 Business Central can streamline operations and unlock new efficiencies. From setup essentials to the collaborative future of AI agents, this episode offers a clear-eyed look at how local AI is reshaping business innovation with privacy, precision, and purpose.
#MSDyn365BC #BusinessCentral #BC #DynamicsCorner
Follow Kris and Brad for more content:
https://matalino.io/bio
https://bprendergast.bio.link/
Welcome everyone to another exciting episode of Dynamics Corner. What are local LLMs, DeepSeek, Phi-4? I'm your co-host, Chris.
Speaker 2:And this is Brad. This episode was recorded on February 20th, 2025. Chris, Chris, Chris. Local language models. Local large language models.
Speaker 1:Is that what that means?
Speaker 2:Yes. This was another mind-blowing conversation. In this conversation, we learned about large language models, running large language models locally, what all of these models are, and how we can communicate with these models with Business Central. With us today, we had the opportunity to learn about many things AI with Stefano Demiliani. Good morning, good afternoon.
Speaker 1:Good morning for me. Good morning for me Good night.
Speaker 4:Good morning for me, good night, good afternoon for you.
Speaker 1:It feels like nighttime here, but it's early morning, it always feels like nighttime here.
Speaker 4:I always forget the time zone. Yes, you are early morning.
Speaker 2:Well, you are six hours ahead of me, okay, and then nine hours ahead of Chris. Okay, so perfect. Yeah, it's perfect: it's perfect for me because it's not nighttime, it's perfect for you because it's late, and it's perfect for Chris because it's very early. So it's perfect for everybody. Is it normal that I have an uploading message on top? Yeah, yes, yes, yes. It collects the local audio and video so that we have some high-quality files to put together to make you sound amazing. But you already sound amazing.
Speaker 4:No, not too much. You are amazing with your podcast.
Speaker 2:Yeah, thank you. We're only amazing because of individuals like you. And what's the greeting in Italy? It's not ciao, it's... how do you say? You know, usually we'll say good morning.
Speaker 4:Hello, we say ciao. We usually use ciao, it's the standard. Buongiorno.
Speaker 2:Buongiorno.
Speaker 4:Buongiorno is another way, ciao is more informal.
Speaker 2:Ok, and then when you say bye, do you say arrivederci?
Speaker 4:Or ciao again? Arrivederci, exactly. You speak Italian perfectly.
Speaker 2:I'm ready to go to Italy. Still haven't made it over to Europe.
Speaker 4:It's a struggle, but hopefully this year I'll be able to make it to one of the conferences over there. It's always a challenge. One of the next European conferences, yes. Yes, there are several coming up.
Speaker 2:It's a matter of trying to find the one that works out best logistically. Yeah, I agree.
Speaker 4:It's not always easy to balance every event that there is out there. Balancing events, work, family and so on is not easy.
Speaker 2:No, it's not easy. In Europe, and we spoke about this before, I think, casually: the United States in itself is like Europe, where you have the United States as a large country, and it has many states. Now, probably one more, if you join Canada.
Speaker 2:Don't even get me started on that, I don't want Canada. They can keep Canada. Let's give Canada to somebody else. But we travel in the United States across states like Europeans travel across countries.
Speaker 2:So when there are European conferences it's a little bit easier for you to move around, I understand. Also, for you to come over to the United States it's a little difficult because, in essence, it's a day of travel somewhere, then you have to attend a conference or do something, then a day of travel back. So you don't usually do something like that without trying to get additional time.
Speaker 1:It's easier for you, though, because you're East Coast. If you're flying East Coast to Europe, it's a much shorter flight. Like for me, I have to cross the country and then go the other way.
Speaker 4:I remember when I was in the US some years ago, moving from Los Angeles to New York, it was about, if I remember, four or five hours of flight, something like that.
Speaker 2:Yeah, some of the flights, like you said, are about five to six hours, depending on where on the East Coast you go, and that is just going from one side to the other. It's a little challenging, Chris; it becomes: which airport do you go to? And Europe is fortunate that they have a great rail system, because you can go from country to country easily.
Speaker 4:I often forget that, so I see some of these events.
Speaker 2:So I see some of these events. I was talking with someone, and they were recommending, if I wanted to go to one of the events: fly to this airport, you could probably get a direct flight, then you can take a train easily for a few hours to get to the destination, which was much shorter, when I looked at it, than flying to an airport and having the connections. Oh yeah, for sure.
Speaker 1:You do have good transportation. Ours is like a Greyhound bus, but that takes forever to get around.
Speaker 2:I do wish we had a better transit system. Some of the cities have great transit systems. Boston has a subway and they have some rail, exactly. And then New York, it used to be a good system, but now, from my understanding, it's a disaster; you avoid it. There are ways that you can get around, but if you want to go from Boston to Florida, for example, you can take a train, but the train will take you a day. So it's challenging. But thank you for taking the time to speak with us.
Speaker 2:I've been looking forward to speaking with you about a topic that is interesting to most people these days, even more so, I think, from the development point of view. But before we jump into it, can you tell everyone a little bit about yourself?
Speaker 4:A little bit about myself. My name is Stefano. I work mainly in the Business Central area and in the Azure area, so these are the topics that I cover. In my company I am responsible for all the development teams inside my group. My group is called Lodestar and we are quite a large group in Italy, and I have the responsibility of managing the development part of the Business Central area and the Azure area, so serverless applications and so on. Recently, as you can imagine, we have also started working on the AI stuff, and so I'm currently also leading a team, at the moment small, but one I hope will grow, that is involved in providing AI solutions to customers.
Speaker 4:I have a long history in the Business Central area, previously NAV. I started in Navision version 2.1, when it was Navision 2.1. Then it was acquired by Microsoft and so on. So I followed the whole roadmap of this product, and now we are here, in the cloud. There has been a lot of evolution in the product, lots of steps.
Speaker 2:There really was. One day we'll have to sit down with a few people that have been working with it as long as you have and just talk about the evolution of the product from where it was, back with the classic client, with the native database that they had, then when they added SQL, then when they added the role-tailored client, and continue through the progression of the evolution of both the product and the language.
Speaker 2:And I said it before: originally they had three versions, if you recall. They had the financials version, the distribution version and the manufacturing version. So depending on which customer type you were, you would get a specific version of Navision.
Speaker 4:And that's right. Navision has had a lot of evolutions over the years. I remember we started with the classic client and the native database. It was extremely fast, so very, very great on that, but with a lot of limitations, probably, when going to big customers. And unfortunately my first Navision project in my life was with a very big customer, because we decided to move to Navision all the healthcare system that we have. Historically, my company has a healthcare-dedicated sector, and we had a solution, previously a handmade solution based on an Oracle database. Two or three years before the introduction of the euro, we decided to move this solution to Navision with the classic database, because that was the only possibility. This solution had, if I remember, four or five hundred users, and it was a very big solution.
Speaker 4:And then we moved to SQL Server. When we moved to SQL Server from classic, there were a lot of problems, conversion of data and things like that, but the solution is still live. And the curious part is that we are in 2025, with Business Central online and so on, but we still have live customers today, also big customers, that are using the old Navision, converted to NAV 2009, and are still on that platform. We are trying to convince them. Wow.
Speaker 2:Is that right? I know of a customer as well that's using NAV 2009, and I think they have close to 400 users, and they haven't decided to make a move.
Speaker 4:The curious part, what sometimes makes me crazy, is that in my everyday job at the office, maybe during the same day, I need to switch from VS Code, the AL language and so on, to the old classic client and NAV 2009 to fix something or to add something. So still today we need to switch between totally different worlds.
Speaker 2:Wow. It is interesting to see the difference and, as you had mentioned, you get used to working with AL and VS Code and all the tools that you have within VS Code, all the things that were added, and you go back to 2009 and you see what we really had to do to write code. Even when they added the separation for functions, it was a big deal for me that they had the gray bar where you could separate the functions, which was a nice visual reference. It was good. Also, I didn't get a chance to speak with you in person, I know we've communicated in text and writing, but congratulations on the book that you and Duilio put out. It's a great book. I did pick it up; I have it.
Speaker 4:Yeah, we have worked quite a lot on that. So we hope that...
Speaker 2:I can only imagine. I can only imagine.
Speaker 4:We received a lot of positive feedback from the community. Very useful.
Speaker 2:It's very useful. It is on my shelf. I have it right behind me.
Speaker 2:Yes, yes, I have it as well. So thank you for doing that and creating that, and congratulations on putting together something so informative for users. But now let's jump into this LLM stuff. Because you have been doing some things that, I don't know if I can say I understand or don't understand, but anytime I see something that you post, you're always doing something new with local large language models, and you're also doing a lot locally, exactly. I see, so you're installing and setting up AI or language models on your computer.
Speaker 1:Yes, your local machine.
Speaker 4:Exactly. But I think that everyone that is following technology information today on socials or on the internet reads about AI everywhere. AI is a topic that is absolutely exploding.
Speaker 2:I don't think you can go five minutes without hearing it. I really don't, except when you're sleeping, and maybe even then. If you're listening to the news, if you're having a conversation with someone at work, if you're reading something online, I think you can't go five minutes, unless, like you mentioned, Chris, you're sleeping or you're just sitting by yourself in the woods somewhere, without hearing AI. Exactly.
Speaker 4:And I totally agree. The history of this stuff that I'm doing today: I think the majority of us know the big AI vendors, like OpenAI, Microsoft, Google, and so on. There's now also X, sorry, not Twitter, X, with Grok; they recently released Grok 3, which is extremely powerful. The concept we embraced some years ago was to start providing AI solutions by using standard AI models. Azure OpenAI was our first choice, and this was absolutely easy to do: just go on Azure, set up a model, deploy a model, and then you can use your model in Business Central or in the different applications you want.
Speaker 4:We had some problems with that in some scenarios. The problem is that sometimes it is not easy to convince customers that an AI solution can be a winning choice for them, so you need to demonstrate something. Some customers are also not so willing to leave their data accessible to the internet, or they have particular devices. We have, for example, scenarios in manufacturing plants where they cannot access the internet, or don't want to for different reasons, or cannot use a browser. That was another limitation: no browser as the way to interact. So that was one of the reasons that turned on the light for me to start exploring something different. The second reason was that there are a lot of scenarios, at least in my experience, where AI can be useful but the full power of a giant LLM is absolutely not needed. For example, why do I need to pay for, I don't know, GPT-4o when I only need small stuff, or I only need to do function calling or something like that? Sometimes AI for a big company can be costly for practically nothing, and choosing the best-performing LLM does not always give an advantage to the final customer.
Speaker 4:So, with these reasons, I started exploring a new world: open-source LLMs. It's probably a world that is not so well known everywhere, but the AI world is also full of open-source LLMs, and these open-source LLMs are also provided by big vendors: Microsoft is providing open-source LLMs, Google is providing open-source LLMs, Meta with Llama, and more. DeepSeek is also provided as an open LLM. These LLMs are, in many scenarios, absolutely powerful, can be executed offline, and sometimes can give the customer the same result as using one of the full versions available in OpenAI or Azure OpenAI or X or something like that, giving absolutely the same results but without going to the internet, totally private, and so on. So that's why I started exploring this world.
Speaker 2:My mind is full of questions. So you're working with open-source LLMs to run AI, the language models, locally, versus running them online. I have several questions. One we'll get to later: how do you set all that up? But first: how do you determine the differences between the models that you choose to use? You mentioned some of the big names outside of the open-source ones, with Microsoft, with Google, with Meta and now xAI. How do you know which model to use, and what's the difference between the models? Because I see GPT-4o, Grok 3, Grok 2, Claude Sonnet 3.5. I see all these different language models, and how do you know what the difference is between them?
Speaker 2:Or is it just all the same, and it's a different name based upon who creates it? Are they created equal?
Speaker 4:No, if I can try to share a screen, if possible, so that we can.
Speaker 1:Yes, that would be wonderful.
Speaker 4:We can talk probably now.
Speaker 2:Very cool Excellent.
Speaker 1:I'm excited about this. I'm excited.
Speaker 2:There's some cool stuff on your screen with graphs moving.
Speaker 1:And you're a Mac user.
Speaker 4:But now it's working. Sorry for the problem, but I don't know why. No one will know. So, we can see your screen.
Speaker 2:You have a window open with some graphs and some things moving. Yes, what?
Speaker 4:I will start first showing is this, this window. So Hugging Face. Hugging Face is one of the main, probably probably one of the main portals and platforms where open source LLMs are distributed from all the different vendors, and so every vendor that wants to distribute an AI model today in the open source world release on AgingFace and on AgingFace you can see, if you click on models, you can see that here there are tons of models deployed here. Some are models completely open source, models like and not very known models like, as you can see, a lot of names that are not so famous. But there are models that instead are extremely famous and they have also their counterpart that is not open source and is released as a paid service, like, for example, probably one of the most famous today is DeepSeq. Deepseq is a very powerful model. Deepseq, as the full DeepSeq model, is a big model with 671 billions of parameters, so it's a very extreme large model that, in order to be executed locally, requires more than 400 gigabytes of RAM. Wow.
Speaker 2:So you need 400 gig of RAM to run this locally. Wow.
Speaker 4:That was one of my questions.
Speaker 2:The hardware requirements. Well, you have a large model that is run online, such as DeepSeek and the ones that we had mentioned. That was the first question I had is if you want to run these locally, what are the requirements that you have to run them locally, Because I don't know of many people that have a 400 gig of RAM computer?
Speaker 4:It's something that you cannot execute on a local machine. But here, for open-source models, there's an important concept to understand called quantization. Quantization, in simple terms, is a technique that an LLM vendor can use to reduce the computational and memory cost requirements of a model. To explain it simply: you start from a full-power LLM, an LLM that, as provided by the vendor, cannot be executed locally because it requires a data center in order to be executed.
Speaker 4:These models pass through a process that reduces the precision of the model, reducing the floating-point representation required for the model's weights. It's something like compressing the model, creating from it a smaller model with the same capabilities but less precision. That's the idea. You start from a giant, and you can detach smaller children of that giant with a bit less precision. But smaller precision doesn't mean worse precision in terms of responses or capacity; it's more like reducing the neural network inside the model.
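As a back-of-envelope illustration of what that compression buys in memory terms (a minimal sketch; the parameter count is the one quoted in this conversation, and real memory use also includes activations, the KV cache, and runtime overhead):

```python
# Rough memory footprint of an LLM's weights at different numeric precisions.
PARAMS = 671e9  # full DeepSeek: 671 billion parameters

for label, bits in [("FP16", 16), ("8-bit", 8), ("4-bit", 4)]:
    gib = PARAMS * bits / 8 / 1024**3  # bits -> bytes -> GiB
    print(f"{label:5s}: ~{gib:,.0f} GiB for the weights alone")

# FP16 : ~1,250 GiB  (data-center territory)
# 8-bit:   ~625 GiB
# 4-bit:   ~312 GiB  (roughly the ">400 GB" class of requirement quoted above)
```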
Speaker 4:If you can see here, without going into the mathematical concept, because quantization is honestly a mathematical concept, this is the full DeepSeek model: 671 billion parameters. This model cannot be executed locally unless you have a cluster of machines, because it requires no less than 400 gigabytes of RAM, plus GPUs, in order to be executed. So I cannot execute it offline, and probably you cannot execute it on your machines either, and probably no one here can. Unless you've got a data center there, Brad, somewhere.
Speaker 4:It's under my desk. This is why these models are provided as services from the cloud: you can activate a subscription to DeepSeek, or deploy DeepSeek today also on Azure; it's available in Azure AI Foundry. You can deploy the full DeepSeek and use it as a service. But here you can see that the distilled models are also available, and those distilled models are a reduced version of DeepSeek, in this case.
Speaker 4:So, models that passed through the process called quantization, and through a second process, in the case of DeepSeek, called distillation. Distillation, as you can see here, is another technique used in open-source AI. Distillation is a machine-learning technique that involves transferring knowledge from a large model to a smaller one, in order to create a model that has the same features and knowledge as the big one, but in a smaller size. In this case, DeepSeek transferred its knowledge to smaller models. You can see that DeepSeek is providing several distillations: these are the base models that were used, and DeepSeek trained them in order to have new models with these names.
Speaker 1:Ah, it's a Llama model.
Speaker 2:So with this process, just to take it back: in the cloud they have a model that has billions of parameters, as you had mentioned. They go through a distillation process and they reduce it so that it can run locally on a reasonable machine. Exactly. You said that the precision is lower: is there a difference in the results? What's the difference between the reduced one and running it in the cloud? Is it speed of response? Is it accuracy? I don't even want to use the word accuracy.
Speaker 4:The main difference that you can experience in some scenarios is probably accuracy, because the full model obviously has more parameters, so accuracy is sometimes, not always, but for some tasks, better.
Speaker 4:If you have followed some of the posts that I have done, I've done, for example, some tests on auto-generating complex JavaScript scripts for creating animations and things like that, and for those tasks the full model is probably more accurate. With the distilled model, the local model, it's a bit less accurate and you need to tune the prompt more in order to have the same result. But, for example, for interaction with Business Central, for creating agents or something like that, these models are absolutely comparable to the online model, with the advantage that you pay nothing and you can deploy them offline with a reasonable amount of RAM. It depends on the number of parameters that the model has. The number that appears here is the number of parameters the model has. So, for example, this is 70 billion parameters, this is 32 billion parameters.
Speaker 4:This, for example, is the model that I used and I'm still using for my tests with DeepSeek.
Speaker 2:Which model of DeepSeek are you using for your tests?
Speaker 4:32 billion parameters. It's a distillation of DeepSeek using 32 billion parameters, and it works absolutely great.
Speaker 1:But how do you tell which one? If you look at the 32 billion parameters, you're running it, clearly, on a MacBook.
Speaker 4:Yes.
Speaker 1:And how do you know if your MacBook will handle that?
Speaker 4:To know if the local machine can handle it, there's a calculation on the number of parameters that gives you a rough estimate of the gigabytes of RAM you need in order to run it. A very, very rough number: if you multiply this number by 1.5, for example, that is usually a generous estimate of the number of gigabytes that you need.
Speaker 2:So you multiply the number of parameters by 1.5, or which number exactly, and this gives you about the number of gigabytes that it needs?
Speaker 4:1.2 to 1.5; 1.5 if you want to stay on the large side. This is the number of gigabytes required to efficiently run the model locally. So, for example, this one requires at least 40 gigabytes of RAM to run locally.
Speaker 1:Okay, oh, wow Okay.
Speaker 4:If you have stricter requirements, or, I don't know, if you have, for example, a local machine with 16 GB of RAM, probably this is the model to use.
Speaker 2:So if you have 32 billion parameters, you multiply that by 1.5 roughly, again 1.2 to 1.5, and that's where you get the 40. So it's not 32 billion times 1.5, it's 32: it's the number of billions. Okay, to be clear.
Speaker 4:There's a more precise calculation that takes into account not only the parameter count but also other sets of parameters. But in my experience, when I have to quickly evaluate whether I can run a model locally or not, taking into consideration the resources that I have, I use this estimate. Multiplying by 1.2, something like that, or 1.5 if I want to stay on the large side, tells me whether a model is able to run on my machine or not. 16 gigs, 16, 17 gigs for that one.
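A minimal sketch of that rule of thumb in code (the 1.2 to 1.5 factor is the rough estimate quoted above, not an exact sizing formula):

```python
def estimate_ram_gb(params_billions: float, factor: float = 1.5) -> float:
    """Rule of thumb from the conversation: parameters (in billions)
    times 1.2 to 1.5 gives a rough number of gigabytes of RAM needed."""
    return params_billions * factor

print(estimate_ram_gb(32, 1.2))  # 38.4 -> the "at least 40 GB" ballpark
print(estimate_ram_gb(7))        # 10.5 -> a 7B model fits a 16 GB machine
```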
Speaker 4:It can also be run on iPhones. That takes me to a whole different world here.
Speaker 2:So you can run this on the phone. But I just want to take it back up a notch, because my mind has this whole list of questions. Amazing. So we have a large language model in the cloud that went through a distillation process to now run locally, where there are different models, mini models, I guess you could say, or distilled models, with different parameter counts, where, as you mentioned, in some cases what you may lose is some accuracy. In some cases, not always, not always. Now, I hear about language models being trained constantly with information from the internet, or trained by different sources. With this being run locally, does it have all of that information, and what happens if the model gets updated? Is that the whole point of having different model versions: each has a different set of data, or a different set of parameters?
Speaker 2:Let's just say we index the internet for a website. So let's just say we index Microsoft Learn today and have a model that's focused on Microsoft Learn. They constantly add documents. I now have a local copy of DeepSeek that used that Learn source. How do I get updated information?
Speaker 4:Exactly. The main limitation of local LLMs is that they are only periodically refreshed. When you have downloaded a local LLM, like here on my machine, where I have this set of local LLMs, some from Microsoft, some from Llama, and DeepSeek, let me show this. Local LLMs are downloaded with the knowledge from when the vendor released that model, their latest update date, and sometimes they respond telling you that.
Speaker 4:For example, Phi-4 does not have recent knowledge. It has public knowledge of internet facts until that date. I have not updated it yet; probably I could download a new update. It's something like Docker, the technology: you download the model and it creates a local model, then you can pull again to see if there are updates of that model. This model is, for example, one of the most powerful small language models, in my opinion, that can run locally: Phi-4 from Microsoft.
Speaker 2:Which model is that again? Did you say Phi-4?
Speaker 4:Microsoft Phi-4 is good. Microsoft Phi-4, yes, it's this model here.
Speaker 4:It's one of the best models from Microsoft, in my opinion, that can run fully offline. Probably the main limitation of open-source and local language models is if you intend to use them as a model that knows the internet. That is probably the scenario where they have their main limitation, because they are created and deployed in a particular way: they have knowledge up to a particular date, and then you can download updates. But honestly, that is absolutely not my scenario. My scenario is not having a ChatGPT offline that works perfectly; it fails only if I ask about internet facts.
Speaker 1:So if I ask who the USA president is, I don't know if it's able to answer. So you're saying that when you download these small LLMs locally, running locally, does it not have access to the internet at all, or can you tell it to have access to the internet?
Speaker 4:Yeah, usually a model that runs offline by default has no access to the internet. You can enable access to the internet, but by default it has none, because it's trained with the knowledge from when the vendor released it.
Speaker 1:At the time it was published. Got it, exactly.
Speaker 4:So, for example, if I ask DeepSeek who the USA president is, it tells me that, as of its last update, Joe Biden is the president, because it's not connected.
Speaker 1:Right, October 2023 is the last update.
Speaker 4:It's not an online model. So the point is: if you want to have a reliable ChatGPT, an offline model can sometimes fail, because you need to be sure that it was updated with the latest data coming from the internet.
Speaker 2:So that's a good point that you make. It is a matter of how you're going to use it, and what you need to use the locally running model for.
Speaker 2:Yes. So if I were an organization that, for security reasons, had policies for my employees, or had other documents that I wanted to put into the AI so that the members of our team could use it to find something simple. We may have an employee handbook that has the policies for taking time off for holidays, where an employee could just type to the model: what are the holidays we have? But can you train it with your own data as well, on a local model?
Speaker 4:Exactly. Here is exactly the point where these models are, in my opinion, interesting. I think these models are not extremely interesting if you want a ChatGPT offline, or at least that's not the main use; but there are scenarios where they are extremely interesting. For example, if I need to ask something about coding, they can give me an answer without going to the internet, so I can use them on an airplane or wherever I want, also from an iPhone, for example. But the second scenario is with company data, and here is where I've spent my last months, and we also have live projects using these models, because you can use these models fully locally, without paying anything and without internet access, for doing business stuff. At least in my case, I don't have customers asking me to provide an AI solution for going to the internet and asking anything they want, because there are Copilots or there's ChatGPT for that. All the customers asking us for AI solutions want AI solutions that work with their business: AI solutions that are able to talk with Business Central, able to talk with their documents, or to reason over data coming from their corporate systems, and things like that. Those are the AI solutions that are useful for that customer: business solutions, not a general chat. An offline model is great for that, because you can use function calling and every feature that you have in one of the online models like GPT-4o. For example, this model, which is very small and can be executed even on 16-gigabyte machines, has the same power as GPT-4o in terms of function calling, agent creation and manipulation, things like that. And this can work completely offline.
Speaker 4:And I can show you some examples, completely offline. A very, very simple example, just to show you something; let me move this here. I don't want to go into much detail in the code, so take this only as an example. Here I have a very simple piece of code that uses a local model, the version of DeepSeek that I previously mentioned, running in my local environment. In this example, imagine that I am a Business Central company and I want the possibility to pass my data to this model, in order to have an AI solution where I can ask something about my data.
Speaker 4:If you want to do that using online models, staying in the Microsoft family, for example, you need to deploy, I don't know, GPT-4o to have the LLM, and then you need a vector database and an embedding model like text-embedding-ada or something like that, because you need to convert data coming from Business Central into something the model can understand, and for that you also need a vector database. Microsoft has Azure AI Search for that, and it costs a lot: this solution can cost no less than $400 per month, minimum, to have a full RAG solution working with Business Central data and an online model. The same result can also be achieved totally offline, and this is a very quick, simple example. Here I have my model running locally, on my machine.
Speaker 4:The model is DeepSeek in this case, but you can use any of the available models. I use DeepSeek here in this example because DeepSeek is a reasoning model. One of the latest trends in AI is reasoning models: models that, before giving you the final response, perform a long reasoning process; they can explain all the steps that they use for reasoning, and then they give you the result. And here I also use embeddings, because I want to pass data, and this is one of the available open-source embedding models. I use this one because it's the smallest.
Speaker 2:So you have a local language model, DeepSeek, installed. You want to train it on your Business Central data, all local, so it doesn't go out to the internet. So you also now need to create or install another model. What was the model you mentioned, to process or to hold your data?
Speaker 4:The embedding model, it's this one. I use this, but you can use a different one. The embedding model is used to work with your data within the language model that you're using.
Speaker 4:Without going through all the steps: I'm personally a big fan of this tool, this SDK called Microsoft Semantic Kernel. Semantic Kernel is an SDK from Microsoft that permits you to create AI solutions that are independent of the model, plus many other features; one of the main features is that it abstracts away the creation of your AI solution from the model that you use. So with this tool, here I'm creating my service, and in this service I'm passing data. Here I put dummy data, but imagine that I pass the sales data coming from Business Central. I simply passed my data.
Speaker 4:Just to give you an example, as a list of data. The concept is that I need to pass to the memory of my AI model all the data it needs to know, and this data can be the content of Business Central tables, or summarizations of the Business Central tables, I don't know. Here, to provide a very easy example, I passed a set of data: for example, the sum of the sales amounts for a customer in a month, and then, for each customer, the same amount for each of the product categories I'm using. So the model now knows that it has a total amount, and a total amount for each item category, each data point for each customer. Imagine that this can be your raw Business Central table, or whatever you want.
Speaker 4:So you could pick the data that you want to load: the customer table, vendor table, customer ledger, whichever specific things you want your model to know. Any specific thing that I want my model to know, that's the idea. And then I can start asking DeepSeek. So here, for example, I run this. It will not be extremely quick, because here I use the biggest model I can use for that; a smaller distillation, also the 16-billion-parameter one, would be okay. But here my model has memorized all this data, and now DeepSeek is reasoning.
Speaker 4:I don't love to watch the reasoning part of DeepSeek because it's long; there's also a way to avoid DeepSeek's reasoning. But in the program I've asked the model to give me the sales amount for digital services in 2025. So the model needs to go to each customer, retrieve the sales amount for that particular category, and produce the output. And here you can see the reasoning. Sorry, first of all, I forgot to mention: I opened this window for a reason. When you run that, and I can rerun it, you can see that when you run a local model your GPU goes to the max, because a local model uses the GPU first.
Speaker 1:Oh, okay, then memory.
Speaker 4:So I will relaunch the process and you will see that my GPU goes to the top, because any LLM uses the GPU at max in order to perform reasoning, calculations and so on. Then, when the GPU is not available, it uses RAM and CPU, but it's the GPU that is used first. But now you can see that my model has responded. DeepSeek has done its reasoning: okay, I need to figure out the total sales amount, blah, blah, blah. It explains all this mental reasoning. First it looks at Contoso, and it retrieved that in these two months Contoso has done that amount for digital services, then Adatum, only one month, then Kronos and so on. Then it gives you the whole explanation.
Speaker 2:Okay, now I need to sum, blah, blah, blah, and the total result is this. So basically you can see what it's doing to come up with the number. When you loaded the data, you only have to load that data one time, correct? Yes, one time. So you don't have to do it for each query or each question or each prompt.
Speaker 3:You can. So if we had a Business Central database...
Speaker 2:We could, in essence, in your example, load the sales every day. We could export, or import, however you phrase it, the sales information into our language model. So now it has up-to-date sales data, and any time we run this it will have the most accurate information. Exactly. Oh, that's excellent.
Speaker 4:And as a data store for these embeddings, you can have different types. For example, Microsoft has now released support for embeddings also in SQL Server and Azure SQL, and Azure SQL is absolutely a good choice in terms of money if you want to use the online version, because having embeddings in Azure AI Search is very costly, while Azure SQL is absolutely cheaper than that. But here, just to show: I have asked a question of my LLM running locally about a set of data that I provided, and it has done its reasoning and given me a result. This can be useful if you want, for example, a service that is able to analyze your Business Central data and answer according to the user's question.
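For readers following along, here is a compact sketch of the same retrieval-augmented flow against Ollama's local REST API. This is not the Semantic Kernel demo from the episode; the model tags and the sample rows are assumptions:

```python
# Minimal local-RAG sketch against Ollama (http://localhost:11434).
import math
import requests

OLLAMA = "http://localhost:11434"

def embed(text: str) -> list[float]:
    # /api/embeddings returns {"embedding": [...]}
    r = requests.post(f"{OLLAMA}/api/embeddings",
                      json={"model": "nomic-embed-text", "prompt": text})
    return r.json()["embedding"]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# 1) Load "Business Central" facts into an in-memory store (done once,
#    e.g. refreshed daily from an export).
facts = [
    "Contoso, January 2025, Digital Services, sales amount 12500 EUR",
    "Contoso, February 2025, Digital Services, sales amount 9800 EUR",
    "Adatum, January 2025, Digital Services, sales amount 4300 EUR",
]
store = [(f, embed(f)) for f in facts]

# 2) Retrieve the rows most relevant to the question.
question = "What is the total sales amount for Digital Services in 2025?"
q = embed(question)
ranked = sorted(store, key=lambda fv: -cosine(q, fv[1]))
context = "\n".join(f for f, _ in ranked[:3])

# 3) Ask the local reasoning model, grounded on the retrieved rows.
r = requests.post(f"{OLLAMA}/api/generate", json={
    "model": "deepseek-r1:32b",
    "prompt": f"Answer using only these records:\n{context}\n\nQuestion: {question}",
    "stream": False,
})
print(r.json()["response"])
```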
Speaker 2:I can't wait to play with this. I'm calling you later and we're going to set this up on my machine. Just to show what I forgot to mention before...
Speaker 4:If I do that again, you will see: during the process of reasoning, the GPU usage of your local machine is increasing. So imagine a data center, what happens there. I read that US data centers, if I remember, consume 13% of the energy in the US: all the power that we have in the data centers.
Speaker 2:What is it that you're running that shows the graph of the GPU usage, the tool that you're running?
Speaker 4:It's called mactop. It's this tool; let me open it in the browser. I use this one; there are different ones. It's open source, a resource monitor for Mac. It's quite useful.
Speaker 2:So you're using an open source resource monitoring tool for a Mac.
Speaker 4:Yes, it's open source, absolutely, that's good.
Speaker 2:This is excellent.
Speaker 4:You can easily install with this.
Speaker 2:So we install our language model. We can, I use the word export, but we can send the data to the language model from our Business Central environment, or anything else, any other data that we want to send to it. The model will learn the data. You can ask, in this case DeepSeek, a prompt, and it will show you the reasoning. I like that, so you can see exactly what it's doing to come up with the calculation. And now we have the result. And we're doing this completely offline. So for those that have questions of security, of data being transmitted to the cloud somewhere, or of teaching a model where somebody else could potentially get the data, we eliminated that, because this doesn't go out to the internet.
Speaker 2:Now that we have that language model installed locally, can we use it with Business Central itself? Business Central, with the newer versions, has Copilot, where we can prompt or ask questions and it will do things. Is there a way that we could use our local model within Business Central to get that information?
Speaker 4:Local models, in my opinion, are suitable for certain types of scenarios. First of all, as you can see from here, every local model is available as a local service: it runs as a service on your local machine, or on a machine in your local network, and you can use it with the same APIs as the online model. So if I use DeepSeek offline, it's exactly like using DeepSeek online. If I use Phi-4, one of the Microsoft offline models, it's the same as using GPT-4o online. Obviously, a local model is local because it runs in your local network, so for Business Central online to call a local model directly, you would need to expose it to Business Central online. Honestly, you can do that, maybe not directly, but with a middle layer in between that is able to talk to both: from Business Central you call something like an Azure Function, and the Azure Function can call your local service. This is absolutely possible, and the Azure Function can be exposed in a virtual network in order to secure the messages. So Business Central can call a local model, but you need something to expose the local service to Business Central if you want to have something like a local copilot inside Business Central using a local model.
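A minimal sketch of that middle-layer idea, assuming an HTTP-triggered Azure Function (Python v2 programming model) that forwards prompts to the local Ollama service. The gateway URL is hypothetical, and in practice the function would need VNet or hybrid connectivity to reach a machine on the local network:

```python
# HTTP-triggered Azure Function acting as a relay between Business Central
# Online and a locally hosted model.
import json
import requests
import azure.functions as func

app = func.FunctionApp()

LOCAL_LLM = "http://my-onprem-gateway:11434/api/generate"  # hypothetical address

@app.route(route="ask", auth_level=func.AuthLevel.FUNCTION)
def ask(req: func.HttpRequest) -> func.HttpResponse:
    body = req.get_json()
    # Forward the prompt to the local Ollama service and return its answer.
    r = requests.post(LOCAL_LLM, json={
        "model": body.get("model", "phi4"),
        "prompt": body["prompt"],
        "stream": False,
    })
    answer = r.json()["response"]
    return func.HttpResponse(json.dumps({"answer": answer}),
                             mimetype="application/json")
```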
Speaker 4:Honestly, the scenarios that I use at the moment in real projects are the opposite: a local model that needs to interact with Business Central. The scenario is: I am a company, I have Business Central online, but for my AI solutions I want solutions that run offline. My AI solution is offline, but it needs to interact with Business Central in some way. For example, among the real projects that we have done with customers, there's a customer working in the manufacturing industry. In the production departments they cannot use a browser, for different reasons, and they want the possibility of a chat that is able to work with Business Central data. An example: I am at the production machine and I want to know where this item is used in my production orders. I can directly open my console and type "where is this item used in my production orders", and then the local model calls Business Central and can give the response. That's helpful.
Speaker 1:So they don't have to go to Business Central, right? They just ask local, exactly, or?
Speaker 4:Or something like: can you set the inventory of this item to five pieces? Can you move the starting date of this production order to tomorrow? And we have a solution for that, fully running locally, that permits you to interact with your production orders, manufacturing, inventory movements, things like that, fully offline.
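A hedged sketch of that chat-to-Business-Central pattern: ask the local model for a structured action, then execute it against Business Central's REST API. The entity name, company segment, and token handling here are illustrative assumptions, not the solution described in the episode:

```python
# Chat -> structured action -> Business Central API call.
import json
import requests

OLLAMA = "http://localhost:11434/api/generate"
# Standard BC API base URL shape; tenant/environment left as placeholders.
BC_API = "https://api.businesscentral.dynamics.com/v2.0/{tenant}/{env}/api/v2.0"

SYSTEM = (
    "You translate warehouse questions into actions. Reply ONLY with JSON "
    'like {"action": "get_prod_order_components", "itemNo": "..."} .'
)

def ask_model(user_text: str) -> dict:
    r = requests.post(OLLAMA, json={
        "model": "phi4",
        "prompt": f"{SYSTEM}\n\nUser: {user_text}",
        "stream": False,
        "format": "json",   # ask Ollama for well-formed JSON output
    })
    return json.loads(r.json()["response"])

def run(user_text: str, token: str):
    action = ask_model(user_text)
    if action["action"] == "get_prod_order_components":
        # Hypothetical custom API page exposing production order components.
        url = f"{BC_API}/companies(...)/prodOrderComponents"
        resp = requests.get(url,
                            headers={"Authorization": f"Bearer {token}"},
                            params={"$filter": f"itemNo eq '{action['itemNo']}'"})
        return resp.json()

# run("Where is item 1000 used in my production orders?", token="...")
```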
Speaker 2:So the language model that you are talking about, or what you have set up, is not only learning on the Business Central data but it's interacting with Business Central to where it's updating information in Business Central.
Speaker 4:Exactly, all locally. Another example that I have here on my machine, that I can maybe quickly show, is one that we have recently deployed in a solution. One second, I need to open the window. Okay, this is, for example...
Speaker 4:We have some scenarios where we need to do image recognition. For example, we have a customer that asked us for the possibility to recognize whether a warehouse looks something like this; yes, something like this. So: pictures of the warehouse taken from a camera, and they want to know if the warehouse bin is over a certain fill level, in order to block the possibility of doing put-aways to those locations. What happens in this scenario is that there are cameras in the warehouse that take a picture of the bins every X minutes. They store the camera images in a folder, in this case, and then here we have a local model: Llama Vision, in this case. Llama Vision is a powerful open-source local model that is able to do image recognition, OCR, things like that. Offline too, right? Offline.
Speaker 2:Oh, I want to set something up to analyze all of mine. I have like 60,000 photos that I've taken over the course of my life. I wonder if I could use the language model to organize them for me. Yes, absolutely. Oh.
Speaker 4:It's possible.
Speaker 2:Yes, I'm emailing you.
Speaker 4:We're setting up a date; I'll send you some wine. For example, if I launch this application, it starts analyzing. I'm hoping I have not changed the parameters... so it starts analyzing the photos. Can you see here that the GPU is going to the max, because images are being processed? And what happens here is that it's analyzing my warehouse images. The prompt that I have under that is: analyze this image and try to recognize the fill level of this warehouse; give me a JSON response. This is the JSON response with a fill level, and we store that fill level on the bins in Business Central, so the location in Business Central. The model first analyzes the image locally and then calls Business Central in order to update a field that we have on the location card.
Speaker 1:Wow, so it makes recommendations for you.
Speaker 4:Exactly. This is a local service, a local agent, that is able to periodically analyze the images coming from the warehouse and store the data in Business Central. In Business Central we have an extension that blocks the possibility of doing put-aways to certain locations that are filled over a certain level. So this is handled by an agent, running automatically, that checks the camera images every time, analyzes the images, and blocks. Another example related to that, one we have not yet deployed, while this one is deployed in a live environment, is something we are testing at the moment: object counting. So we have customers that do that.
Speaker 1:Oh, and it counts that.
Speaker 4:So we have customers that sell apples, and each apple must be placed into a box.
Speaker 1:And how many apples can you fit in the box Exactly?
Speaker 4:And this box contains apples, and we are testing a local agent that scans every box of apples and returns the content. So it takes an image, a picture...
Speaker 4:Exactly, it takes the pictures here. Right now it's just text, but if I have this agent, and now you can see this working, it starts analyzing each image and gives me the count of the number of apples that there are in the image, in a JSON format that I can use in order to do actions.
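A minimal sketch of what such an image-analysis agent could look like with a local vision model served by Ollama. The model tag, file path, and JSON schema are assumptions:

```python
# Send a camera image to a local vision model and ask for strict JSON.
import base64
import json
import requests

def analyze_image(path: str) -> dict:
    with open(path, "rb") as f:
        img_b64 = base64.b64encode(f.read()).decode()
    r = requests.post("http://localhost:11434/api/generate", json={
        "model": "llama3.2-vision",
        "prompt": ('Analyze this warehouse bin photo. Reply ONLY with JSON: '
                   '{"fillLevelPercent": <0-100>, "itemCount": <integer>}'),
        "images": [img_b64],   # base64-encoded image payload
        "stream": False,
        "format": "json",
    })
    return json.loads(r.json()["response"])

result = analyze_image("warehouse/cam01/latest.jpg")
print(result)  # e.g. {"fillLevelPercent": 85, "itemCount": 42}
# An agent could then write this value onto the bin/location record in
# Business Central and let an extension block put-aways above a threshold.
```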
Speaker 1:Wow, that's so cool. This is amazing. So your own local.
Speaker 2:So, as you can see, here I have a description and a count. Well, the impressive thing here, besides it being local and not having to use a cloud service, which may have costs, or matter if you're working with sensitive data, is that these are just additional practical uses of AI within a business, or even a Business Central implementation. You can easily see, in your scenario where you're counting apples, where you may have had an individual count those before, now you can use AI to count them, or even manage your warehouse without sending someone out to see it. Now
Speaker 2:AI can analyze your warehouse and tell you.
Speaker 4:It's an autonomous agent that can work whenever you want.
Speaker 2:I'm sold. This, to me, has opened up a lot of thought. Geez, I could even use this in my house to do stuff.
Speaker 1:Quick question on the inventory: can you use these mini LLMs to do maybe even forecasting?
Speaker 4:Yes, you absolutely can. There are LLMs that are good at that. Wow. DeepSeek, for example, is good at that.
Speaker 1:So you can have your own local LLMs.
Speaker 4:Obviously, as in the previous example, the LLM needs to have the knowledge. So you pass it, for example, your, I don't know, purchase orders, sales orders, or item ledger entries.
Speaker 1:Item ledger entries, something like that.
Speaker 4:If you pass that to the model, the model is able to reason over it; it can analyze your trends and give you a response. That's amazing, because you know how many times people want to do that. It absolutely works.
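A small sketch of that forecasting idea under the same assumptions (the history rows and the JSON schema are invented for illustration):

```python
# Pass item ledger history into the prompt and ask for a structured forecast.
import json
import requests

history = """item,month,quantity_sold
1000,2024-10,120
1000,2024-11,135
1000,2024-12,160
1000,2025-01,150"""

r = requests.post("http://localhost:11434/api/generate", json={
    "model": "deepseek-r1:32b",
    "prompt": ("Given these item ledger entries:\n" + history +
               '\nForecast next month. Reply ONLY with JSON: '
               '{"item": "...", "month": "...", "forecastQty": <number>, '
               '"trend": "up|down|flat"}'),
    "stream": False,
    "format": "json",
})
print(json.loads(r.json()["response"]))
```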
Speaker 2:There are so many practical uses of this with the different models. I'm speechless, in a sense. I can see so many different uses of it, because now we can interact with Business Central data bi-directionally. You're getting information in a JSON format that you can send back to update Business Central, but you can also teach it on the data. Yeah, and it's all local, so it's secure.
Speaker 4:It's fully local.
Speaker 1:Say that, Chris. So it's more conversational now, versus just looking at numbers and trying to figure out, okay, this is what it's recommending. I'm thinking ahead a little bit here: you can use this tool to make the recommendation and forecast, and perhaps you can send that information back to Business Central based on the results. Exactly. That's crazy.
Speaker 4:Yeah, the power of structured answers: forcing the model to not just answer with text but to answer in a structured format. In the prompt you can tell the model: I always want the response in this particular format. That is powerful, because you can then extract data from the response and do actions, like in this example where we update the content, we update the location card, and things like that. I have here, for example, another example that I'm currently testing. In our company, and I think for you it's the same, we have a lot of Business Central online customers deployed on different tenants, and sometimes when we update an app, one of the apps that we have on AppSource (we have quite a large set of apps on AppSource), we would like to update those apps for the online customers immediately, because maybe we have a fix or something like that. Sometimes this requires, at minimum, the standard way: going into each of the tenants in the admin center and updating the app. Otherwise, you can use the APIs to spread the apps everywhere, but APIs are not at hand for everyone; our consultants, for example, are not used to using the automation APIs or something like that in order to update the apps.
Speaker 4:So here we are testing an agent for that, an AI agent. There's a set of AI agents that are able to talk with our consultants, asking what they want to do and performing actions. Just very quickly to show, because it's a prototype at the moment: we have different agents, a team of agents working together. In the team of agents there's what I call here a customer support agent, the agent responsible for talking with my consultant. There's a manager, responsible for deciding if an action can be done. And there's what I call a PowerShell developer, the agent responsible for doing the actions. So, just to show you something here, if I run this agent, okay, I have here a customer support agent that is talking to me.
Speaker 4:Okay: "Hello... may I kindly ask if you have a Business Central app that you would like to update today? If so, please provide the app ID and the tenant ID. If you would like to update all apps in a given tenant, please provide only the tenant ID." Okay, yeah, I can write. Let me.
Speaker 2:So you designed this agent, and you told it to create the prompt, the question for the consultant to answer? Exactly.
Speaker 4:Here is the agent. I made my prompt, and later I will show you; it simply says: politely ask the consultant what they want to do. I've given the instruction that if the consultant wants to update an app, they need to provide the app ID and the tenant ID. If they want to update all the apps in the tenant, they need to provide the tenant ID and not the app ID. And you have different agents within the agent, working together.
Speaker 1:Yeah, wow. That goes back to the conversation we were having, Brad. Remember how it's different agents doing specific tasks? This is a perfect example, where it's calling all the different agents and saying: you need to work together to do this specific task. And you have an agent manager, right? That is so amazing.
Speaker 2:So you have agents that have specific functions, and then you have an agent that manages the agents and delegates to them.
Speaker 4:Yeah, it's exactly like this. So if I ask, for example, "update app this" and I forget to insert the tenant ID, the manager asks the customer support agent to tell the customer that the tenant ID must be provided. And then the customer support agent asks me: "Thank you for providing the app ID. In order to proceed with updates, could you please provide the tenant ID?" Let me copy another GUID, for example.
Speaker 1:I'm so excited about this. This is a perfect showing of how agents work together.
Speaker 4:Okay, now I've provided it. The manager analyzes it: okay, everything is provided. Now the PowerShell executor is called, and a third agent updates the app. Here a call to the admin center APIs is done via function calling, passing the tenant ID and the app ID. So there are three agents that work together to complete a task: the customer support agent is responsible for asking what I need to do, the manager is responsible for involving each agent according to the task, and the PowerShell executor is the agent that performs the action.
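As a rough sketch of the three-role pattern described here (not the actual prototype, which uses an agent framework and the admin center APIs), the same flow can be mimicked with plain calls to a local Ollama endpoint; the prompts and the update_app stub are hypothetical:

```python
# Customer support collects the request, the manager validates it,
# and an executor stub stands in for the agent that does the work.
import json
import urllib.request

def ask(system_prompt: str, user_text: str, model: str = "llama3.3") -> str:
    """One question to a local model, with a role-defining system prompt."""
    payload = json.dumps({
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_text},
        ],
        "stream": False,
    }).encode("utf-8")
    req = urllib.request.Request(
        "http://localhost:11434/api/chat",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]

SUPPORT = ("You collect requests from a consultant who wants to update a "
           "Business Central app. Politely ask for an app ID and tenant ID.")
MANAGER = ("You validate a request to update an app. Reply APPROVED only if "
           "it contains both an app GUID and a tenant GUID; otherwise state "
           "exactly what is missing.")

def update_app(request_text: str) -> None:
    # Stand-in for the executor agent, which in the real solution calls
    # the admin center APIs (e.g. via PowerShell) with the provided IDs.
    print("Executor would now update:", request_text)

print(ask(SUPPORT, "Greet the consultant and ask what to update."))
request_text = input("> ")                 # the consultant replies here
verdict = ask(MANAGER, request_text)       # the manager checks the request
if verdict.strip().startswith("APPROVED"):
    update_app(request_text)
else:
    print(verdict)                         # e.g. asks for the missing GUID
```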
Speaker 4:This is a perfect illustration.
Speaker 1:A perfect illustration of what the future is going to be, with different agents doing specific tasks.
Speaker 2:This is amazing. I mean, look at everything we've gone through.
Speaker 4:Without a valid GUID, the manager (that is the agent, the model, here in this case) is able to recognize that this is not a valid GUID. So the manager says: customer support, tell the customer that the GUID is not correct. And here the customer support agent says to me: "Please ensure that both are valid GUIDs." So this is an example of interaction between agents, and it can be useful, for example, to provide a user interface for consultants to update apps on tenants.
Speaker 2:This is mind-numbing to me. I can see so many different practical uses of this. So let's take it back: if somebody wanted to work with this, let's just take the sequence of steps. Which I keep telling you, I'm calling you later and we're going to set this up on one of my machines. I use a Mac. I use Parallels, so I'll create a VM and we'll set all this stuff up. What are the steps that someone has to go through? The first thing is they have to determine which model they want to use, correct?
Speaker 4:Exactly. The first step is to determine which model you want to use, based on your scenario.
Speaker 4:And a starting point. Well, first of all, let me go a step back and give my opinion.
Speaker 4:If you want to run a local model, first of all select the platform to host it. There are different platforms for hosting local models, some more complex, some less complex. I honestly suggest using Ollama. Ollama is a great platform for hosting local models. You simply download Ollama for Windows, Linux, or macOS, and once you have it installed, Ollama has a set of models available, the same models that I previously showed, grouped by vendor. If a model is unique, like Microsoft's Phi-4, there's only that model to download: you simply write "ollama pull phi4" and it downloads the model locally. If you have a more complex model, like DeepSeek, you can download one of the available distillations of the model, from the biggest DeepSeek available in Ollama that can still run locally down to the smallest. I've previously used this, so with DeepSeek you simply run the command, and your model is ready to be executed on your local machine and available as a local service.
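A minimal sketch of that workflow, assuming Ollama's default local endpoint on port 11434 and the phi4 model name as published in the Ollama library:

```python
# First, from a terminal, download and try the model:
#
#   ollama pull phi4     # download Microsoft's Phi-4
#   ollama run phi4      # chat with it from the command line
#
# Once pulled, the model is also exposed as a local REST service on
# port 11434, which code can call directly:
import json
import urllib.request

def generate(prompt: str, model: str = "phi4") -> str:
    """Send one prompt to the local Ollama service and return the reply."""
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,  # one complete response instead of a token stream
    }).encode("utf-8")
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

print(generate("In one sentence, what is quantization in an LLM?"))
```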
Speaker 4:If you don't want to use Ollama, there's LM Studio, which is another tool for running local models. LM Studio is much more user-friendly, because Ollama has no user interface; Ollama runs as a service, like I have here. LM Studio instead has a user interface where you can chat with the model, so it's a bit more user-friendly.
Speaker 4:Then there's llama.cpp, another tool available for running local models. I don't remember where the repo is... yes, here it is. You can download this tool and easily run a model with a simple command, passing the name of your model, or the URL of a model that you can download, with the -m option. Or you can launch it as a server. Honestly, all the samples I use run on Ollama; it's easy and it's powerful. When you have the platform, you can then decide which model to use. Obviously, it depends on your needs. Sometimes you need a lot of power; DeepSeek, for example, is able to do reasoning. So if you have something that needs advanced reasoning, like, for example, building a forecasting application, DeepSeek is probably better, because it can do complex reasoning.
Speaker 4:Exactly, more parameters. The size of the model, in parameters, obviously depends on your local resources, so download according to your local resources. If I have, for example, I don't know, 60 GB of memory on my machine, that's probably my limit: I cannot download the biggest one, because otherwise it will be too slow to get a response. But these are absolutely tests that you can do: download the model, try it, and if it's too slow, go to a smaller version.
Speaker 4:In my personal experience, DeepSeek is a great model for advanced reasoning. So if you require advanced reasoning, general text questions, or code generation, DeepSeek is good. In the open-source family, my absolute favorite is Llama 3.3; for me it's one of the models able to satisfy every need I have today, especially when working with Business Central. It's able to perform function calling and, honestly, to do quite everything. It's not a reasoning model, so if you require complex reasoning, DeepSeek is better, but for every other task Llama 3.3 is great. Otherwise, my other recommended choice is Phi-4 from Microsoft, which is another great open-source model, honestly quite comparable to the results you get from GPT-4o in many fields. And these are also listed here in this order because they are the most downloaded open-source models.
Speaker 2:Okay, so we pick a platform, we pick a model, we install it, and we're up and running, basically.
Speaker 4:You are up and running. Your model is up and running, and you can use it like a local chat, like I've done here. Here I have all my local models, and I can select one and start using it as a chat.
Speaker 2:What are you using to connect to? Which application are you using to connect to your models?
Speaker 4:This is another open-source application called...
Speaker 1:Msty?
Speaker 4:If you want to have a user interface. Otherwise, via command line, you can: every model offers a command line interface to interact with. When you download the model, the model starts, and then from the command line you can start typing and the model answers. If you want something more user-friendly, a local user interface is required. I use this one because it's an open-source user interface that is able to work with local models.
Speaker 2:What is the name of it again? Msty?
Speaker 1:M-S-T-Y.
Speaker 4:M-S-T-Y, exactly. You can download it for the platform you want, and it automatically discovers whether you have downloaded local models; all your local models are available here. You can also add online providers, so if you have, I don't know, an account with OpenAI, an account with DeepSeek online, and so on, you can also use those models from here.
Speaker 1:So then you have a local desktop application.
Speaker 4:Exactly. I always use this because it's useful for testing, for example if you want to test a prompt or something like that. It's nice for testing. For creating applications, instead, applications are created in code, in my case.
Speaker 2:So we now have an interface to the model, via command prompt or via a tool. And as far as sending our data to it, does that vary from model to model?
Speaker 4:Sending data to the model is not really specific to the model. For sending data to a model, you have essentially two ways. First of all, you can use the REST APIs exposed by the model itself: when you download the model, it is available, as I previously showed, as a local service, so you can use REST APIs to talk with the model, and that covers a lot of scenarios. But in this case you need to know the REST API format of each model. Usually they're quite similar, but you need to know the format. It's always explained if you go on Hugging Face (Hugging Face is the main portal for open-source models): each model has the explanation of its APIs.
Speaker 4:Honestly, I never do that. That's why, in the examples I showed you, I always use abstraction tools; for example, here I'm using Semantic Kernel. Semantic Kernel is an abstraction layer, so I don't need to take care of knowing the REST API I would need to use with GPT-4o, with DeepSeek, with OpenAI, or something like that, because it does that for me.
Speaker 2:So you downloaded Semantic Kernel, and you installed Semantic Kernel, and that interfaces with your local model.
Speaker 4:Exactly. When creating advanced solutions and you don't want to rely on raw REST APIs, it's the recommended approach, because the solution can easily be swapped between different providers. And honestly, when I create an AI solution or an AI agent or something like that, I like to be able to use different providers. In the previous example I showed, where three agents work together (the manager, the customer support, and the PowerShell executor), I can say in that solution that the PowerShell executor uses GPT-4o, while the customer support only uses GPT-3.5 because it costs less. So I can also spread models across agents. Creating AI solutions that are platform-agnostic is great, because the same solution can be executed with any platform, and if a month later I want to change the platform, I can change it easily.
Speaker 4:One example was DeepSeek online. DeepSeek online, when it was released, was the cheapest model in history; DeepSeek costs really almost nothing and it's very powerful. Compared, for example, to Microsoft's GPT-4o: GPT-4o costs 10,000 times more for each call compared to DeepSeek. We have some solutions deployed many months ago, when DeepSeek was not available, that, simply by changing the parameter to DeepSeek, work without changing anything else.
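This is not Semantic Kernel itself, just a sketch of the abstraction idea it provides: one chat function pointed at any OpenAI-compatible endpoint, so swapping providers becomes a parameter change. It assumes Ollama's OpenAI-compatible local API; the hosted endpoint and model name in the comment are examples:

```python
# One provider-agnostic chat call; the provider is chosen by base URL,
# model name, and (optionally) an API key.
import json
import urllib.request

def chat(base_url: str, model: str, api_key: str | None, user_text: str) -> str:
    """One chat completion against any OpenAI-compatible endpoint."""
    payload = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    }).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    req = urllib.request.Request(f"{base_url}/chat/completions",
                                 data=payload, headers=headers)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]

# Local model via Ollama's OpenAI-compatible endpoint (no API key needed):
print(chat("http://localhost:11434/v1", "llama3.3", None, "Hello!"))

# Switching to a hosted provider is just different parameters, e.g.:
# print(chat("https://api.deepseek.com/v1", "deepseek-chat", "YOUR_KEY", "Hello!"))
```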
Speaker 2:So that's the key then: choose the platform, install the model, add Semantic Kernel, and then you're on your way. And then, as you had just mentioned, you can use totally cross-platform applications.
Speaker 2:So you're not tied to a model at that point. Semantic Kernel will communicate with the model; you just tell it which model to use. So in your case, as you had mentioned, you had started with one model, a new model was released, and, simply by changing which model to use, your application was still functional using the new model.
Speaker 4:Exactly, this is great. It's platform-agnostic.
Speaker 2:And my favorite part is that this is all running on a Mac.
Speaker 4:And this runs on a Mac. Obviously I love Windows, but honestly, for AI stuff, the Macs have something more.
Speaker 2:Listen, I like Windows too, don't get me wrong, but the Macs always have something more. And I'm thankful that we can communicate with Business Central through VS Code.
Speaker 4:Especially for AI stuff, the Mac has a lot more power compared to Windows.
Speaker 1:I'm glad you said that. Yeah, can you repeat that again? We're best friends.
Speaker 2:Well, Stefano, you have blown my mind. This is amazing.
Speaker 1:I just downloaded Msty, by the way, just so that I can interact with this stuff.
Speaker 2:I'm just telling you, I hope you're not going to bed soon, because I'm going to send you a text with a question asking for all these links, and a meeting. So just give me a few minutes. This is amazing. You've covered so much, and you've inspired me, I know, and I'm sure anyone listening, to see how you could utilize a local large language model, or AI in that sense. There are lots of scenarios where this fits; it fits everywhere. Just your scenarios: the warehouse, the apples, the agents.
Speaker 1:The vision.
Speaker 2:The examples that you've given us cross many areas and show how you could use AI to gain efficiency within an implementation, and I think it's wonderful. It's amazing, and I'm sort of speechless, because my mind is thinking of all the applications of this.
Speaker 4:Yes, I think it fits especially when talking about the term that is a big topic today: agentic AI. Especially in the Business Central world, the most common features are user-centric: I click an action in Business Central, and this action calls an LLM and does something. Here we are moving a step beyond that. Local LLMs, in my opinion, are extremely powerful when you want to create agents that work with Business Central. I am a company, and inside my company I want to have offline agents that perform actions or take decisions, also with Business Central data, or that do actions inside Business Central, like in this example; a simple example, but I think it gives the idea. So these are local applications running autonomously, and maybe also in teams; not Teams the application, but teams in the sense of groups. Organization teams, yeah, exactly: you can have multiple agents that work together in order to achieve a task.
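As a hypothetical sketch of that kind of offline agent, the following reads items through Business Central's standard API and asks a local model for a judgment; the tenant, environment, company, and token values are placeholders, and the low-stock prompt is invented for illustration:

```python
# Read Business Central data, then reason over it with a local model,
# so the reasoning step never sends data to an online AI service.
import json
import urllib.request

BC_BASE = ("https://api.businesscentral.dynamics.com/v2.0/"
           "YOUR_TENANT_ID/production/api/v2.0")
TOKEN = "YOUR_OAUTH_TOKEN"  # obtained via Microsoft Entra ID in practice

def get_items(company_id: str) -> list:
    """Fetch items from Business Central's standard API v2.0."""
    req = urllib.request.Request(
        f"{BC_BASE}/companies({company_id})/items",
        headers={"Authorization": f"Bearer {TOKEN}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["value"]

def ask_local(prompt: str, model: str = "llama3.3") -> str:
    """Ask the local Ollama model; the data stays on this machine."""
    payload = json.dumps({"model": model, "prompt": prompt,
                          "stream": False}).encode("utf-8")
    req = urllib.request.Request("http://localhost:11434/api/generate",
                                 data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

items = get_items("YOUR_COMPANY_ID")
summary = json.dumps([{"no": i["number"], "inventory": i.get("inventory")}
                      for i in items[:50]])
print(ask_local("Which of these items look low on stock? " + summary))
```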
Speaker 1:I think it eases the minds of organizations or businesses that may be afraid of using LLMs online and want to keep their data within their organization. This, right here, is a game changer: a good example of the use of local LLMs.
Speaker 2:It's not only the security concern of sharing sensitive data, and I use the word sensitive to mean anything someone doesn't want to share with someone else; it doesn't have to be sensitive in the sense of identification, maybe I just don't want to share my sales, for example. It's also a way to help control your cost. You have a fixed cost, in a sense, because you have the machine or the hardware to run it.
Speaker 2:Yes, if you have the right hardware, but that's a fixed cost in a sense, outside of the electricity to power that hardware, whereas with some of these other models, depending on how much you use them, your cost could fluctuate or vary. This gives you a fixed cost, and you have control of the data. I don't even know what to say anymore; my mind is full of all of this, and now I have a greater appreciation for all the things you've been sharing and posting about running large language models locally.
Speaker 1:And it's not just locally on your machine. I mean, you could technically have this on Azure; it just means it's offline, right? You could put it in a virtual machine. It just means that you don't need to give it access to the online world.
Speaker 2:Yes, absolutely. Well, Mr. Stefano, thank you. I was sold.
Speaker 4:You had me at hello, as they say. I know that for you it's evening, so thank you for having me. No, this is great.
Speaker 2:Thank you very much for taking the time to speak with us. This was wonderful. You shared so much information to help break down what running a large language model locally entails, and also extremely valuable scenarios for how it can be applied. If anyone would like to contact you, has additional questions, or may want to learn more about large language models, or see some of the other great things that you've done, what is the best way to contact you?
Speaker 4:I'm always available on LinkedIn, on X, or on Bluesky. You can reach me directly on those socials (it's probably the best way) or directly through the contact form on my website. As many of you know, I'm always available there, so feel free to contact me if you have follow-ups.
Speaker 2:Excellent, excellent. And I definitely would like to have you back on again to follow up on this, because, seeing all the great things that you've been doing, I can only imagine where you'll be in a few months. So we'll have to see if we can get you on later in the year to see where you have taken some of this. Are you going to Directions North America?
Speaker 4:Unfortunately, I will skip Directions North America this year. I will be at DynamicsMinds, and we are organizing the Business Central Day event in Italy with Directions. We have a lot of work to do to be able to run this event, and it is extremely near to Directions NA. My initial plan was to go with Duilio to repeat the session we had done about large customers; it was a very appreciated session, and we would like to bring it to Directions NA. But when we started the organization of Directions Italy, unfortunately, we were forced to accept a fixed date from Microsoft Italy, because they are giving us the headquarters for the event, and it's extremely near to Directions NA, so for me it's not possible to be away from my company for so long.
Speaker 2:We understand that. We ourselves run into the challenge of choosing which conferences and which events to attend, because there are many and, as we talked about, there are some travel considerations as well.
Speaker 4:The problem is that sometimes these events are really near each other, so you have to make a choice. My company is flexible enough to permit me to go away for events for about a week, maybe two weeks.
Speaker 2:When you're doing all this great stuff, I can see that. That's okay, we'll have pizza with Duilio again.
Speaker 4:We will be in the US for sure this year; it's a promise I made to Duilio. If not Directions NA, maybe another event in the US, or something like that, but we will do it.
Speaker 2:Well, there's Days of Knowledge, and then Summit is in October, and the call for speakers has opened up for that.
Speaker 4:We are planning to go to one of those.
Speaker 1:I'll be looking forward to seeing you in person.
Speaker 2:Yeah, yeah, that would be excellent. Well, sir, thank you very much. We appreciate you taking the time with us, and I look forward to speaking with you soon.
Speaker 4:Thank you, as always, for your great podcast and the great initiative you are doing. Thank you, thank you very much.
Speaker 2:We appreciate you. Thank you very much, sir.
Speaker 1:Thank you, Stefano. All right, ciao ciao, bye-bye.
Speaker 2:Ciao, bye-bye. Thank you, chris, for your time for another episode of In the Dynamics Corner Chair, and thank you to our guests for participating.
Speaker 1:Thank you, brad, for your time. It is a wonderful episode of Dynamics Corner Chair. I would also like to thank our guests for joining us. Thank you for all of our listeners tuning in as well. You can find Brad at developerlifecom, that is D-V-L-P-R-L-I-F-E dot com, and you can interact with them via Twitter D-V-L-P-R-L-I-F-E. You can also find me at matalinoio, m-a-t-a-l-i-n-o dot I-O, and my Twitter handle is Mattelino16. And you can see those links down below in the show notes. Again, thank you everyone. Thank you and take care.