Jan AI Review: Local-First LLMs for Regulated Businesses
Posted: December 31, 1969 to AI.

Petronella Technology Group spends a lot of time sitting across the table from law firms, medical practices, defense contractors, engineering shops, and financial advisors who want the same productivity gains everyone else is bragging about with ChatGPT, but who legitimately cannot paste client files into a consumer chatbot. The client-intake memo, the scanned lab result, the NIST 800-171 remediation plan, the case strategy note, the tax return, the code with hard-coded secrets. None of that belongs in a third-party cloud prompt.
For the past two years, that has been the single hardest question to answer cleanly. If the answer is "just pay for ChatGPT Enterprise and trust the zero-data-retention toggle," some clients hear you out and some clients walk. If the answer is "stand up a private AI cluster with a 4U GPU chassis in your server closet," some clients are ready and most are not.
There is a middle step that more people should be using, and this post is a long, honest look at it. The product is called Jan AI. It is open source, it runs the model on the same laptop or server that opens it, and when you configure it correctly it never touches the public internet. We have shipped it on staff devices and we have replaced it with private clusters when the workload outgrew the laptop. Both outcomes are legitimate. Neither one fits every business.
If you would rather skip the reading and talk it through with someone who has done this migration about a dozen times now, call Petronella at (919) 348-4912 or drop a note through /contact-us/. Otherwise, keep going.
What Jan AI Actually Is
Jan is a desktop application that downloads and runs large language models locally on your own hardware. You click install, pick a model from a catalog of open-weight options (Llama, Qwen, Gemma, GPT-oss, DeepSeek, Mistral, and others), and the model lives on your laptop or desktop the same way Photoshop or Excel lives on your laptop or desktop. The chat window looks like ChatGPT. The backend is llama.cpp with a Tauri-based wrapper in TypeScript, and the UI is straightforward enough that non-technical staff stop asking questions after about ten minutes.
The project is built and maintained by Menlo Research. At the time this post was written the current release is v0.7.9 (published 2026-03-23), the codebase sits at 41,899 GitHub stars and 2,778 forks, and the license is Apache 2.0 which matters a lot for business use because it permits commercial deployment with attribution. Source and release data are at the janhq/jan repository on GitHub and the marketing site is jan.ai.
The headline features, straight from the project README:
- Local AI models pulled directly from HuggingFace (Llama, Gemma, Qwen, GPT-oss, and anything else in GGUF format).
- Optional cloud bridges to OpenAI, Anthropic, Mistral, Groq, and MiniMax when you want a hybrid workflow.
- Custom assistants with their own system prompts and tool configurations.
- OpenAI-compatible API on
localhost:1337so any tool built for the OpenAI SDK talks to Jan with one base-URL change. - Model Context Protocol (MCP) support so Jan can call tools, read files, and act as an agent.
- Full offline operation when configured that way. No telemetry required.
That last point is the one regulated businesses care about. Jan does not need the internet to work once the model weights are on disk. You can pull the Ethernet cable and it keeps answering questions.
Minimum hardware per the official README: macOS 13.6+ with 8 GB RAM for 3B-parameter models, 16 GB for 7B, 32 GB for 13B. Windows 10+ with NVIDIA, AMD, or Intel Arc GPU support. Most Linux distributions work, with GPU acceleration available. Intel Macs are explicitly supported alongside Apple Silicon, and there is a Microsoft Store build, a Flathub build, a .deb, and an AppImage. Raw source at https://github.com/janhq/jan and Microsoft Store at https://apps.microsoft.com/detail/xpdcnfn5cpzlqb.
Why Local-First Matters for Compliance
Every regulated business we work with runs into the same wall when they first try to use a cloud LLM. The compliance framework says: data of this type cannot leave a controlled boundary without a signed agreement, an audit trail, and usually an accredited third party. OpenAI, Anthropic, and Google all offer enterprise-grade contracts that satisfy a lot of that on paper. But paper is not the only hurdle.
HIPAA requires a Business Associate Agreement before you can use a covered service with Protected Health Information. OpenAI offers BAAs on the ChatGPT Enterprise tier and the API with a specific zero-data-retention configuration. Anthropic offers BAAs on Claude for Enterprise. Getting them in place for a 12-person dental practice is possible but takes time and money, and the BAA only covers specific products, not the free tier the receptionist has been using from her phone.
CMMC is where most defense contractors run into a real block. Controlled Unclassified Information (CUI) cannot move through a service that is not FedRAMP Moderate or equivalent, and the assessor is going to audit every outbound API call your staff has made in the past ninety days. Most consumer-grade LLM traffic is simply out of scope for use with CUI. Microsoft offers GCC High tenants with Copilot that meet CMMC Level 2 requirements at a substantial premium, and that is a valid path. Local inference is the other valid path.
Attorney-client privilege is not a regulation so much as an evidentiary doctrine, and it is more fragile than a lot of attorneys assume. Privilege can be waived by voluntary disclosure to a third party. A court could plausibly decide that pasting client communications into a third-party LLM is a voluntary disclosure, and the hosting provider is the third party. Whether that holds depends on jurisdiction, terms of service, and the specific configuration. The safest answer, the one a well-run firm has been telling partners since 2023, is to keep privileged material out of cloud LLMs until case law settles.
State privacy laws (Colorado, California, Virginia, Connecticut, and the rest) layer on breach-notification duties, processing-agreement requirements, and data-minimization expectations that are not satisfied by a click-through checkbox on a consumer LLM.
Local inference dodges all of this in one move. The data does not leave the device. There is no third party. There is no API call to audit. The model is just a large file on the hard drive, in exactly the same compliance posture as Microsoft Word.
That simplicity is worth paying attention to.
Installing Jan and Running Your First Local Model

The install flow takes about five minutes on reasonable hardware. Download the installer from jan.ai, run it, let it finish. On first launch you get the Hub tab where you pick a model. A 7B model in Q4_K_M quantization runs about 4 GB of disk, an 8B runs about 4.7 GB, and a 70B quant can run 40 GB or more.
Our general first-model recommendation for regulated business use is Qwen 2.5 7B Instruct or Llama 3.1 8B Instruct. Both are well-behaved, both are small enough to run on a 16 GB MacBook or a mid-range Windows laptop with an RTX 3060 or better, and both handle English business writing, summarization, and basic code at a quality close to GPT-3.5. If you need something heavier and you have a machine with 32 GB of RAM or a 24 GB GPU, Qwen 2.5 32B or Llama 3.3 70B in a good quant are realistic choices. If you need speed over smarts, Gemma 2 9B is fast and small.
Once the model finishes downloading you click "Start" and you are chatting. There is a Settings panel where you disable telemetry if you want to be thorough. There is a Privacy section in the docs that spells out what is and is not collected, and by default the application does not send prompts to any server unless you explicitly enable a cloud bridge.
Using Jan as an OpenAI-Compatible API
The feature most developers will care about is the OpenAI-compatible endpoint on localhost:1337. Any tool that speaks the OpenAI API can be pointed at Jan with a single line change. Here is a minimal Python example that runs a completion against a local model:
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:1337/v1",
api_key="jan-local-key" # Jan ignores the value, but the SDK requires one
)
response = client.chat.completions.create(
model="llama-3.1-8b-instruct", # must match the model ID shown in Jan's UI
messages=[
{"role": "system", "content": "You are a paralegal assistant. Answer only from the text provided."},
{"role": "user", "content": "Summarize the key deadlines in this engagement letter: ..."}
],
temperature=0.2
)
print(response.choices[0].message.content)
That is the same three lines of code you would run against OpenAI's cloud API. The only thing that changed is the base URL. If you have an existing internal tool, a Slack bot, a CRM plugin, or a custom chatbot built for the OpenAI SDK, you can flip it to Jan without rewriting business logic.
The equivalent curl call, which is useful for testing and for teaching ops teams how to read the protocol:
curl http://localhost:1337/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "qwen2.5-7b-instruct",
"messages": [
{"role": "user", "content": "Rewrite this paragraph in plain English."}
],
"temperature": 0.3
}'
If you want to verify that no traffic is leaving the box, run a packet capture while you do this. On a Mac: sudo tcpdump -i en0 host not 127.0.0.1 and not ::1 and port 443. You will see silence. That is the whole point.
Locking Jan Down for Office Use
Out of the box Jan ships with sensible defaults, but for regulated deployments we configure a few additional things on every install:
- Disable the cloud-provider bridges in Settings. There is no reason to have the Anthropic or OpenAI toggle even visible on a staff machine that is supposed to be air-gapped from third-party LLMs. The setting lives under Settings and is one click.
- Disable analytics and crash reporting. Jan does not send prompts, but it does ship a basic analytics ping by default. For HIPAA and CMMC machines we turn it off to simplify the privacy assertion.
- Lock the OpenAI-compatible server to
127.0.0.1only. By default it binds to localhost anyway, but it is worth confirming in the settings. We have seen folks expose it on a LAN for shared use, and that is a separate conversation that needs its own authentication layer. - Pin the model version. Jan will happily pull a newer quant when you upgrade, and for a regulated environment you want the weights pinned to a known hash. Document the version, document the source, and save the GGUF file somewhere you can cite in an audit.
- Document the install as an approved business application in your asset inventory. This is the single step most businesses skip and the single step most auditors want to see.
Done properly, Jan passes a CMMC Level 1 self-assessment and most of the technical controls for HIPAA's technical safeguards without drama. It is not a magic box that makes you compliant, but it does not create new compliance problems the way pasting PHI into ChatGPT does.
Comparing Jan with LM Studio, Ollama, GPT4All, and Open WebUI
Jan is not the only local-first LLM runner. The other names that come up constantly are LM Studio, Ollama, GPT4All, and Open WebUI (paired with a backend like Ollama or llama.cpp). All of them work. They make slightly different tradeoffs.
LM Studio is probably the closest direct competitor. It is a polished desktop app, it has a strong model browser, and it runs on the same underlying stack. The catch is that LM Studio is not open source. The binary is free for personal and commercial use, but the source is closed, the license is custom, and the telemetry behavior is not auditable. For a regulated client who wants the vendor neutrality story clean, "open source, Apache 2.0, source on GitHub" is a stronger audit answer than "free binary from a private company."
Ollama is the choice for developers who want a command-line server rather than a desktop app. It runs as a daemon, it has a clean REST API, and it is very popular for backend workloads on Linux servers. It is a great fit for a private server, not a great fit for a receptionist who wants to summarize intake forms. Ollama's license is MIT, and its UX assumes terminal comfort.
GPT4All was one of the earliest entrants in this space and is still actively developed by Nomic AI. It is open source (MIT license) with a polished desktop UI. It leans more heavily on RAG features for local documents, which is genuinely useful for law firms and medical practices that want to query their own files. GPT4All and Jan are the closest siblings in this space, and the choice between them often comes down to which UI your staff finds less confusing.
Open WebUI is a web-based chat interface that you point at an Ollama or llama.cpp backend. It is powerful for power users, supports multi-user access, has built-in RAG, and has growing MCP support. It requires more setup than Jan or LM Studio, and it is a better fit for a central server with several users than for a single-laptop deployment.
A few honest observations after running all of these in client deployments:
Jan's single biggest advantage in a professional-services context is the combination of a polished desktop UI and a clean OpenAI-compatible endpoint in the same product. You get the friendly chat window for the receptionist and the API for the developer, from one install. With Ollama plus Open WebUI you can get the same outcome, but it is two products, two configurations, and two update cycles.
Jan's biggest weakness is that multi-user deployments are awkward. It is a desktop app. If you want ten staff to use the same model, you either install Jan on each machine (which fragments governance and doubles storage) or you host the model on a central server running Ollama behind Open WebUI and tell Jan to point at that. Which is fine, but it is not what Jan is optimized for.
When Jan AI Is the Right Fit
Based on a couple dozen client deployments, here is when Jan is the answer:
Small practices with one or two knowledge workers who need AI help on confidential material. A solo attorney summarizing depositions. A two-partner CPA firm drafting client memos. A dental practice manager rewriting patient communications. These folks need a tool that runs on the MacBook they already own, that never touches the cloud, and that gets out of their way. Jan installs in five minutes and they never have to think about it again.
Defense contractors piloting AI for non-CUI work before committing to GCC High. We have seen a pattern where a small DoD-adjacent shop wants to figure out whether AI actually helps their business before signing a three-year Microsoft 365 GCC High contract. Jan on a handful of laptops, pointed at public-data use cases only (marketing, internal training material, code that does not touch controlled information), is a clean way to pilot. When the value is proven you upgrade the CUI workflows to a GCC High tenant or a private cluster.
Law firms handling privileged material. Privilege is too valuable to hand to a cloud vendor on terms of service that change every quarter. Jan on the attorneys' own laptops keeps privileged material inside the firm's existing perimeter.
Medical practices without an IT staff large enough to deploy GCC High. A three-physician group is not going to stand up a dedicated Azure tenant. They are going to run Jan on the office manager's workstation for rewriting patient letters and drafting appeals.
Financial advisors and wealth managers under state privacy laws. The compliance cost of cloud LLMs for client-data work is often higher than the labor savings. Local inference on the advisor's own workstation sidesteps the issue.
Individual developers who want to prototype without a cloud bill. The $20-$200 per month you would spend on OpenAI API tokens during prototyping is zero on a laptop. If you are building a tool that will eventually live in the cloud, fine, switch later. If you are building a tool that will eventually live on-premise anyway, you might as well build on the target.
In short, Jan fits when the number of concurrent users is small, the hardware budget per user is reasonable, and the workloads are mostly interactive (chat, drafting, summarization) rather than high-volume batch processing.
When You Have Outgrown Jan and Need a Private Cluster
Jan stops being the right answer when any of the following start happening.
You have more than about ten concurrent users. Each Jan install is a separate model on a separate machine. If you have a 30-person firm, you are running 30 copies of the model, 30 update cycles, 30 places to audit, and 30 RAM bottlenecks. At that point it is cheaper, faster, and more manageable to stand up one or two GPUs in a central chassis and point everyone at a shared inference endpoint.
The models you need are too large for laptops. A laptop with 32 GB of RAM can run a 13B model comfortably and a 30B model slowly. A 70B model at reasonable quant needs a serious GPU and is painful on CPU. A 400B mixture-of-experts model is simply not going to happen on a MacBook. If your use case requires larger models (complex reasoning, long-context document work, code generation at Claude or GPT-4 quality levels), you need a GPU server. Petronella's private AI cluster builds start at configurations that handle 70B models comfortably and scale up to 400B-class models for clients who need them.
You need shared fine-tuning, shared RAG, or shared audit trails. Jan's per-machine architecture means each user has their own chat history, their own uploaded documents, their own custom assistants. If you want the whole firm to share a single RAG index of case files, or a single fine-tune on internal vocabulary, you need a central system. Open WebUI plus Ollama plus a vector database on a shared server is a reasonable self-hosted path. A managed private cluster is a more hands-off path.
You need integration with other systems at scale. A single user poking at Jan's localhost:1337 endpoint is one thing. An entire firm running agents that hit an LLM 50,000 times per day from Microsoft Power Automate, an internal ticketing system, a CRM, and a document-management system is another thing. That workload needs a dedicated GPU server, proper rate limiting, queueing, monitoring, and an incident response plan. Your laptops are not the right place for it.
You need performance guarantees. Jan on a laptop is subject to everything else the laptop is doing. Open a few Chrome tabs and a video call and the model slows down. For professional workflows where the LLM is a critical path (customer support, intake triage, real-time document review), you want a dedicated piece of hardware with its own thermal headroom and its own GPU. That is a private cluster.
You need certified handling for CUI at scale. For small pilot use, Jan plus a well-documented endpoint controller on an air-gapped laptop is a defensible position. For handling CUI across a larger team, you want a purpose-built enclave with documented controls, and that is what Petronella's private cluster offering provides.
Our rule of thumb is that if you have five or fewer knowledge workers, Jan on their individual machines is probably the right starting point. From five to ten users, it depends on the workload and how tightly coupled the collaboration is. Above ten users, or any time CUI is involved at volume, a private cluster becomes the cleaner answer.
Hardware Sizing Reality Check
One of the most common mistakes we see is clients buying a new MacBook Pro for every staff member and believing they have solved the AI problem. They often have, but at a higher unit cost than a shared GPU server and with governance headaches. Here is roughly how the math shakes out.
A MacBook Pro 16-inch with M4 Pro, 48 GB unified memory, and 1 TB SSD is around $3,200. It runs 13B-class models comfortably and struggles with anything much bigger. For a knowledge worker who needs an LLM occasionally through the day, this is a genuinely good setup.
A dedicated inference box with a single NVIDIA RTX 6000 Ada (48 GB VRAM) is around $12,000 to $15,000 all-in for the chassis, CPU, RAM, and disk. It runs a 70B model at fast interactive speeds, serves 10 to 20 concurrent users depending on context length and token rate, and draws maybe 450 watts peak. Over three years that is the cost of four MacBooks Pro, and it handles 10+ users.
A multi-GPU box with four H100 or L40S cards starts around $80,000 and runs 400B-class models for small teams. This is the "we are an actual AI-first firm" budget. You are not buying this as an afterthought.
The inflection point is almost always in the 8-to-15-user range. Below that, laptops with Jan are fine and often cheaper. Above that, a shared inference server pays for itself in hardware cost alone within 18 to 24 months, before you even count the governance benefits.
If you are in that 8-to-15-user middle zone, you do not actually have to make the call today. Start with Jan on laptops, instrument the usage (Jan logs tokens per day per user by default and you can pull those metrics), and review at the six-month mark. If staff are using it daily and the usage is growing, a shared cluster is the next move. If usage is sporadic, laptops are fine indefinitely.
Model Selection for Regulated Work
Which local model you run matters almost as much as the infrastructure it runs on. A short, opinionated guide based on what we have tested:
Llama 3.1 8B Instruct and Llama 3.3 70B Instruct from Meta are the default answers for most English-language business writing. Meta's community license permits commercial use below 700 million monthly active users, which covers essentially every business Petronella works with. Source: https://www.llama.com/llama3_3/license/.
Qwen 2.5 7B / 14B / 32B / 72B Instruct from Alibaba is the other default. Apache 2.0 on the smaller sizes, which is very permissive for commercial use. Qwen is often stronger than Llama at multilingual work and code. Source: https://huggingface.co/Qwen.
DeepSeek V3 and DeepSeek R1 are reasoning-focused models with licenses that are broadly permissive for commercial use. R1 in particular is strong at step-by-step analytical work and is a good fit for compliance reasoning, code review, and long-form problem solving. The trade-off is that R1 is a very large model and typically needs a GPU server, not a laptop. Source: https://github.com/deepseek-ai.
Mistral 7B and Mistral Small remain solid Apache 2.0 licensed choices. Mistral Large is not open weight anymore.
Gemma 2 9B and Gemma 3 from Google are well-behaved, fast, and use a custom license that permits commercial use with restrictions. Read the license before deploying in a commercial context. Source: https://ai.google.dev/gemma/terms.
GPT-oss from OpenAI is a relatively new open-weight family (20B and 120B sizes) with an Apache 2.0 license, making it one of the more attractive options for businesses that want OpenAI-style behavior without the cloud dependency. It is supported natively in Jan's catalog.
The short version: for most regulated business use, Llama 3.1 8B or Qwen 2.5 7B/14B on a laptop is a perfectly reasonable starting point. Upgrade to 70B-class models (Llama 3.3 70B, Qwen 2.5 72B, DeepSeek R1) on a GPU server when the smaller models hit their ceiling for your workflow.
Always double-check the specific license for the specific quant you download. HuggingFace shows the license on every model card, and a quant redistributed by a third party sometimes has different terms than the original weights. This is the one part of model selection that is most often missed in compliance reviews.
Maintenance Overhead: What Ongoing Work Looks Like
The one thing that catches people off guard with local LLMs is that the models keep improving. Llama 3 was released. Then Llama 3.1. Then Llama 3.3. Qwen 2 to Qwen 2.5. GPT-oss arrived. DeepSeek V3 arrived. Each one is meaningfully better than its predecessor. If you install Jan with Llama 3.1 and never touch it for two years, you are going to be running last-generation AI while everyone else is several generations ahead.
Practical maintenance work for a Jan deployment, based on what we do for managed clients:
Monthly: model refresh review. Is there a materially better model in the same size class? If yes, download it, test it, and swap. Jan makes this click-and-go. Document the change.
Monthly: Jan version update. New Jan releases come about every two to four weeks based on v0.7.x cadence. Most are bug fixes and performance improvements. Read the changelog, update, and retest the core workflows.
Quarterly: prompt and assistant review. The system prompts and custom assistants you configured three months ago may not match current workflows. Audit them.
Quarterly: usage review. How much is staff actually using this? Is it driving value? Is the hardware still right-sized? This is where the decision to stay on laptops versus move to a cluster gets made.
Annually: compliance review. Has the regulatory landscape changed? Has your use case expanded to cover new data types? Does the current configuration still satisfy the compliance framework you started from?
For a small shop, the whole thing amounts to maybe two hours of work per month. For a business without IT staff, that is the kind of ongoing work a managed-services relationship handles quietly in the background. Petronella's managed IT services wrap Jan deployments alongside the rest of the endpoint-management stack so it is not a separate process.
Security Considerations That Are Actually Worth Thinking About
A local LLM does not automatically make your data secure. It removes one large class of risk (third-party cloud exposure) and it adds two smaller ones that need attention.
The model weights are code. A compromised model (weights modified by an attacker before you download them) can produce prompt-specific misbehavior or data exfiltration through downstream tools. Always download from the primary source (HuggingFace official repos, not randomly uploaded mirrors). Note the SHA256 of the GGUF file and save it. This is a very low-probability risk but it is worth doing once.
The laptop running Jan is the new perimeter. If the laptop is compromised, so is everything the user pasted into Jan. Full-disk encryption, EDR, a sane patching schedule, and multi-factor auth on local login are not optional for a machine running a local LLM on sensitive data. This is standard cybersecurity hygiene, and Petronella handles it for managed clients as part of the baseline.
Model capability misuse. A local LLM will happily write phishing emails or harmful code if asked, because you are not subject to the safety filters of a hosted service. For regulated business use this is usually a non-issue (staff are not going to ask the office laptop to draft malware), but it is worth documenting that the responsibility for content policy shifts from the vendor to you.
Prompt injection through pasted content. If a user pastes an attacker-controlled document into Jan and asks for a summary, the document can contain instructions that the model will try to follow. For most internal summarization work this is not a big deal, but for any workflow where Jan is chained to tools (MCP, agentic flows, API calls to other systems), prompt injection is a real attack surface and deserves a threat model.
None of this is a reason to avoid local LLMs. It is a reason to treat the laptop that runs them the same way you treat a laptop that handles any other sensitive data. Which you already should be.
Where Petronella Fits In
Petronella Technology Group has been doing compliance-sensitive IT for Raleigh-area businesses since 2002, BBB A+ since 2003, and we have the credentials that matter for this kind of work: the team is CMMC-RP certified, we are a registered CMMC-AB RPO #1449, Craig Petronella holds DFE #604180, CCNA, and CWNE, and the firm's bench of professionals brings a decade-plus of hands-on work in AI infrastructure, digital forensics, HIPAA/CMMC/GLBA compliance, and managed IT.
We also run more than ten production AI agents of our own, built on a mix of Claude and open-source models, powering inbound voice reception, calendar booking, sales follow-up, content generation, and a CMMC compliance assistant. The same engineering muscle that built those agents is what evaluates tools like Jan AI for clients. We run the software, we break the software, and we decide whether it holds up under real regulated workloads before we put a single client on it.
The reason we end up writing posts like this one is that the same three questions come up over and over on first calls:
- Can we use AI without breaking our compliance posture? (Yes, if you do it carefully.)
- Is Jan on a laptop good enough or do we need a real server? (Depends on your user count, model size, and workload pattern. This post covered the trade-offs.)
- Who is actually going to keep this running once you leave? (Either your internal IT staff, with documentation and a runbook, or us, on a managed-services basis.)
We deploy both ends of this spectrum. A pilot on a handful of staff laptops is a one-day engagement: scoping call, licensing review, install on each device, staff training, documentation for your asset inventory. A private AI cluster deployment is a two-to-six-week project depending on the hardware configuration, the compliance framework in scope, and whether you are integrating with existing systems. Most mid-sized regulated clients end up with a mix: laptops for everyone (easy, private, cheap), plus one shared GPU server for the heavier workloads that the laptops cannot hit.
Our full AI services overview covers the whole arc from initial AI readiness assessment through private cluster deployment to ongoing managed operations. If you are on the "is Jan right for us" side of the question, we have a one-hour assessment that looks at your use cases, your compliance obligations, your existing hardware, and your user count, and gives you a written recommendation. If the answer is "Jan is fine, here is the deployment plan," we hand you the plan and you run it. If the answer is "you will outgrow Jan in six months, here is the cluster spec," we scope the cluster.
No hard-sell, no AI mysticism, no consulting time wasted on the stuff you could Google.
The Honest Summary
Jan AI is a well-built open-source local LLM runner with an Apache 2.0 license, a polished desktop UI, an OpenAI-compatible API, MCP support, and an active development pace. At v0.7.9 it is mature enough for production use in small-to-medium regulated deployments and has passed our internal review for CMMC Level 1, HIPAA technical safeguards, and attorney-client privilege preservation when deployed with the configuration guidance above.
It is the right answer for:
- One to ten users in a regulated business
- Laptops with 16 GB of RAM or more running 7B-to-13B models
- Workloads that are mostly interactive (chat, drafting, summarization, light code)
- Pilots and proofs of concept before committing to a private cluster
It is not the right answer for:
- More than about ten concurrent users on a single workload
- Models above 70B parameters
- Enterprise-scale batch inference
- CUI handling at volume (use a private cluster or GCC High instead)
- Any deployment that skips the hardening checklist earlier in this post
If any of this sounds like your situation, you already know what to do: call Petronella at (919) 348-4912, or reach us through /contact-us/. Tell us what you are trying to do, what framework governs the data, and how many people need the tool. We will tell you, honestly, whether Jan is the right move or whether you need to start thinking about a private cluster. If the honest answer is "install Jan yourself, follow this guide, you do not need us," we will tell you that too.
Sources
- Jan official site: https://jan.ai/
- Jan GitHub repository: https://github.com/janhq/jan
- Jan v0.7.9 release notes: https://github.com/janhq/jan/releases/latest
- Jan Apache 2.0 license: https://github.com/janhq/jan/blob/dev/LICENSE
- Jan Microsoft Store: https://apps.microsoft.com/detail/xpdcnfn5cpzlqb
- Jan Flathub build: https://flathub.org/apps/ai.jan.Jan
- Meta Llama 3.3 license: https://www.llama.com/llama3_3/license/
- Qwen models: https://huggingface.co/Qwen
- DeepSeek AI: https://github.com/deepseek-ai
- Gemma license terms: https://ai.google.dev/gemma/terms
- llama.cpp (Jan's inference backend): https://github.com/ggerganov/llama.cpp
- CMMC-AB RPO #1449 Petronella listing: https://cyberab.org/Member/RPO-1449-Petronella-Cybersecurity-And-Digital-Forensics