Article
· Apr. 1, 2024 · 2 min read

Overview of Generative AI - Part 1


Generative artificial intelligence is artificial intelligence capable of generating text, images or other data using generative models, often in response to prompts. Generative AI models learn the patterns and structure of their input training data and then generate new data that has similar characteristics.

What makes generative AI such a compelling technology is that it democratizes AI: anyone can use it with as little as a text prompt, a sentence written in natural language.

How large language models work

  • Tokenizer, text to numbers: Large language models receive text as input and generate text as output. However, being statistical models, they work much better with numbers than with text sequences. That is why every input is processed by a tokenizer before it reaches the core model. A token is a chunk of text consisting of a variable number of characters, so the tokenizer's main task is to split the input into an array of tokens. Each token is then mapped to a token index, which is the integer encoding of the original text chunk.
  • Predicting output tokens: Given n tokens as input (the maximum n varies from one model to another), the model predicts one token as output. That token is then appended to the input for the next iteration, in an expanding-window pattern, so the user receives one or more complete sentences as an answer rather than a single token. This explains why, if you have ever played with ChatGPT, you may have noticed that it sometimes seems to stop in the middle of a sentence.
  • Selection process, probability distribution: The model chooses the output token according to its probability of occurring after the current text sequence. It does this by predicting a probability distribution over all possible "next tokens", calculated from its training. However, the token with the highest probability is not always the one chosen. A degree of randomness is added to the choice, so the model behaves non-deterministically: the same input does not always produce exactly the same output. This randomness is meant to simulate creative thinking, and it can be tuned with a model parameter called temperature.
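
The three steps above can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the six-word vocabulary, the whitespace tokenizer, and the raw scores are all made up for the example, whereas a real model uses a learned sub-word tokenizer and computes the scores itself.

```python
import math
import random

# Toy vocabulary (an assumption): real tokenizers learn sub-word chunks from data.
VOCAB = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4, "<unk>": 5}


def tokenize(text):
    """Split on whitespace and map each chunk to its integer token index."""
    return [VOCAB.get(chunk, VOCAB["<unk>"]) for chunk in text.lower().split()]


def softmax(scores, temperature=1.0):
    """Turn raw scores into probabilities; a lower temperature sharpens them."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(scores, temperature, rng):
    """Draw one token index from the temperature-adjusted distribution."""
    probs = softmax(scores, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


token_ids = tokenize("The cat sat on the mat")

# Hypothetical scores a model might assign to each vocabulary entry.
scores = [2.0, 1.0, 0.5, 0.2, 3.0, -1.0]
cold = softmax(scores, temperature=0.2)  # almost always picks the top token
hot = softmax(scores, temperature=2.0)   # spreads probability more evenly
next_token = sample_next_token(scores, temperature=1.0, rng=random.Random(0))
```

With a low temperature nearly all of the probability mass lands on the highest-scoring token, while a high temperature flattens the distribution and makes less likely tokens more competitive. The temperature must be greater than zero in this sketch, since it is used as a divisor.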


The next article will walk through practical demonstrations.

Thanks

Article
· Mar. 29, 2024 · 2 min read

.NET Client-Side Development on IRIS Data Platform

InterSystems IRIS provides a complete application development environment for building sophisticated data- and analytics-intensive applications that connect data and application silos. It is designed to work with all of the common development technologies in an open, standards-based fashion and supports both server-side and client-side programming.

InterSystems IRIS supports server-side application development with both Python and InterSystems ObjectScript. InterSystems IRIS also supports client-side development using many popular development technologies, including Java, C#/.NET, Node.js, Python, and ObjectScript.

This article focuses on client-side development in one popular environment: .NET.

The ADO.NET Managed Provider, .NET Native SDK, XEP API, and Entity Framework Provider are a set of powerful APIs that together cover client-side development on the InterSystems IRIS data platform with the .NET Framework.

ADO.NET Managed Provider

The ADO.NET Managed Provider is the InterSystems implementation of the ADO.NET data access interface. It connects your .NET application to IRIS and lets you access data with SQL queries. The other three APIs use this underlying connection protocol.

.NET Native SDK

The .NET Native SDK provides direct access to InterSystems IRIS objects, globals, and ObjectScript functionality, such as running classes and routines. Directly accessing globals, the fundamental storage structure for data in IRIS, can speed up data retrieval for your .NET application.

XEP API

The XEP API provides high-speed access to InterSystems objects. It is most useful when working with high-throughput objects of low to medium complexity.

The Entity Framework Provider and Object Relational Mapping (ORM)

The Entity Framework Provider is the InterSystems implementation of Entity Framework, the object-relational mapper for ADO.NET.

  • What is Object-Relational Mapping or ORM?
    • A technique that lets you query and manipulate data in a database using an object-oriented paradigm. It is often implemented as a library, such as the SQLAlchemy library for Python.
    • If you're drawing a blank thinking of the equivalent library that implements ORM in IRIS, you should be. IRIS can be treated as a relational database (you can query data stored in IRIS with SQL), so no separate ORM library is needed: ORM is built into the platform itself.
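
To make the ORM idea concrete, here is a minimal hand-rolled sketch in Python using only the standard library's sqlite3 module. It is not how IRIS, Entity Framework, or SQLAlchemy implement the mapping; the `Person` class and the `create_table`, `save`, and `find` helpers are hypothetical names invented for this illustration.

```python
import sqlite3
from dataclasses import dataclass, fields


@dataclass
class Person:
    """Hypothetical class mapped to a table of the same name."""
    id: int
    name: str
    age: int


def create_table(conn, cls):
    # Derive the column names from the dataclass fields (types omitted for brevity).
    columns = ", ".join(f.name for f in fields(cls))
    conn.execute(f"CREATE TABLE {cls.__name__} ({columns})")


def save(conn, obj):
    # Turn an object into an INSERT statement with bound parameters.
    names = [f.name for f in fields(obj)]
    placeholders = ", ".join("?" for _ in names)
    values = [getattr(obj, n) for n in names]
    conn.execute(
        f"INSERT INTO {type(obj).__name__} ({', '.join(names)}) VALUES ({placeholders})",
        values,
    )


def find(conn, cls, **criteria):
    # Turn keyword criteria into a WHERE clause and map rows back to objects.
    names = [f.name for f in fields(cls)]
    where = " AND ".join(f"{k} = ?" for k in criteria) or "1 = 1"
    rows = conn.execute(
        f"SELECT {', '.join(names)} FROM {cls.__name__} WHERE {where}",
        list(criteria.values()),
    )
    return [cls(*row) for row in rows]


conn = sqlite3.connect(":memory:")
create_table(conn, Person)
save(conn, Person(1, "Ada", 36))
save(conn, Person(2, "Linus", 28))
matches = find(conn, Person, name="Ada")
```

The calling code works only with `Person` objects; the SQL is generated behind the helpers, which is the convenience an ORM layer provides.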

.NET developers can use any of these APIs alone or in combination, provided that their .NET project references the InterSystems.Data.IRISClient.dll assembly. Each API has its pros and cons, but a measured use of each one's capabilities provides a balanced approach to developing on the InterSystems IRIS data platform with the .NET Framework.

Article
· Mar. 28, 2024 · 1 min read

How to register and reference task schedules programmatically

InterSystems FAQ rubric

This article shows sample code for registering and referencing task schedules.

① Sample of task schedule registration

* Create a task that runs do ^TEST every day at 1:00 AM.

 set task=##class(%SYS.Task).%New()
 set task.Name="MyTask1"
 set task.Description="Execute ^TEST every day at 1:00 AM" // Optional
 set task.NameSpace="USER"
 set task.TimePeriod=0
 set task.DailyFrequency=0
 set task.DailyFrequencyTime=""
 set task.DailyIncrement=""
 set task.DailyStartTime=$ZTimeh("01:00:00")
 set task.DailyEndTime=""
 set task.TaskClass="%SYS.Task.RunLegacyTask"
 set task.Settings=$LB("ExecuteCode","do ^TEST") // Set ExecuteCode for RunLegacyTask
 write task.%Save()

② Sample of task schedule reference

* The settings registered for a task are retrieved programmatically.

USER>set task=##class(%SYS.Task).%OpenId(1)
USER>zwrite task
task=<OBJECT REFERENCE>[14@%SYS.Task]

+----------------- attribute values ------------------
| %Concurrency = 1
| DailyEndTime = 0
| DailyFrequency = 0
| DailyFrequencyTime = ""
| DailyIncrement = ""
| DailyStartTime = 0
| DayNextScheduled = 63877
| DeleteAfterRun = 0
| Description = "Journal files are switched at midnight every day."

// If you want to refer to individual items, do the following:
USER>write $ZDT(task.DayNextScheduled)
11/21/2015
USER>write task.Name
Journal switching
USER>
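
The two internal formats that appear in these samples can be checked with a short Python sketch. It assumes the standard $HOROLOG conventions: dates are stored as a count of days in which December 31, 1840 is day 0, and times are stored as seconds since midnight. The function names are hypothetical and only mirror what $ZDT and $ZTimeh do for these particular values.

```python
from datetime import date, timedelta

HOROLOG_EPOCH = date(1840, 12, 31)  # day 0 of the $HOROLOG day count


def horolog_day_to_date(day_count):
    """Convert a $HOROLOG day count (e.g. DayNextScheduled) to a calendar date."""
    return HOROLOG_EPOCH + timedelta(days=day_count)


def time_to_seconds(hms):
    """Convert "HH:MM:SS" to seconds since midnight, like $ZTimeh does."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


next_run = horolog_day_to_date(63877)      # the DayNextScheduled value shown above
start_time = time_to_seconds("01:00:00")   # the DailyStartTime set in sample ①
```

Here `next_run` is November 21, 2015, matching the 11/21/2015 printed by $ZDT, and `start_time` is 3600, which is the value stored in DailyStartTime for a 1:00 AM task.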

Article
· Mar. 28, 2024 · 3 min read

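InterSystems Extends the InterSystems IRIS Data Platform with Vector Search to Support Next-Generation AI Applications

On March 26, 2024, Scott Gnau, Global Head of Data Platforms at InterSystems, published a post announcing that the InterSystems IRIS data platform has added vector search capabilities.

This article was written by Scott Gnau, Global Head of Data Platforms at InterSystems.

Artificial intelligence has transformative potential to extract value and insight from data. We are moving toward a world in which nearly every application will be driven by AI, and the developers building those applications will need the right tools to create experiences from them. That is why InterSystems is excited to announce that the IRIS data platform has added vector search capabilities.

When working with large language models, tools like vector search are essential for retrieving relevant information efficiently and accurately from massive datasets. By converting text and images into high-dimensional vectors, these techniques support fast comparison and search, even across millions of files spread over different datasets throughout an organization.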
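
The comparison step behind vector search can be sketched in plain Python. This is only an illustration of the general idea, not how IRIS implements it: the three-dimensional vectors and file names below are made up, real embeddings come from an embedding model and have hundreds or thousands of dimensions, and a production system would use an index rather than scanning every vector.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, documents, k=2):
    """Return the names of the k documents most similar to the query vector."""
    ranked = sorted(
        documents.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Hypothetical embeddings for three files.
DOCUMENTS = {
    "invoice.pdf": [0.9, 0.1, 0.0],
    "contract.docx": [0.7, 0.3, 0.1],
    "holiday.jpg": [0.0, 0.2, 0.9],
}

query_vector = [1.0, 0.0, 0.0]
best_matches = top_k(query_vector, DOCUMENTS, k=2)
```

Because text and images are both reduced to vectors, the same similarity function ranks either kind of content against a query.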

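The InterSystems IRIS data platform provides a unified foundation for next-generation applications

At InterSystems, we are always looking for ways to bring next-generation data processing as close as possible to customer data, without having to move the data to a specialized system. With vector search added to the InterSystems IRIS data platform, we can search the data platform by vector embeddings, enhancing the software's capabilities in tasks related to natural language processing (NLP), text, and image analysis. This integration will make it easier for developers to create applications that use generative AI to complete complex tasks across a wide range of use cases, and to deliver immediate responses based on proprietary data processed by InterSystems. It also means they can do this work with sophisticated vectorized indexes while remaining confident that their internal proprietary intelligence stays secure.

This capability enables the InterSystems IRIS data platform to manage and query content together with its associated dense vector embeddings, and in particular to integrate with retrieval-augmented generation (RAG) for developing generative AI applications. As the available toolset evolves rapidly, seamless RAG integration supports agile adoption of new models and use cases.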

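How does this technology benefit customers?

BioStrand is an AI-driven drug discovery company and part of the InterSystems Innovation Program (which helps startups build applications on our IRIS platform). BioStrand's core product is the Lensai platform, a versatile solution that supports a range of applications, including antibody drug discovery and design. Using advanced algorithms, Lensai can rapidly identify and design novel drug compounds, significantly shortening the R&D time from development to commercialization. The model uniquely combines the strengths of large language models (LLMs), applied with advanced stacking techniques, with BioStrand's patented HYFT technology.

HYFTs are a type of embedding that act as unique "fingerprints" in biological sequences, enabling BioStrand to assign embeddings from different LLMs with high precision. This foundation model represents a vast and continually expanding knowledge graph that maps an impressive 25 billion relationships across 660 million data objects. This comprehensive graph interconnects sequence, structure, function, and bibliographic information across the entire biosphere. It also incorporates cutting-edge techniques such as retrieval-augmented generation and SQL vector search, along with the generative capabilities of LLMs and the semantic expressiveness of knowledge graphs.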

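Vector search will fundamentally change how developers interact with IRIS

We are only just getting started with implementing this technology. As vector search changes how customers interact with their data, and as new AI applications are developed with it, we will share more customer stories. In the meantime, I recommend visiting our vector search page to learn more.

We accelerate innovation, ensure customer success, and demonstrate a commitment to excellence, while maintaining the highest standards of privacy, security, and accountability. This guides us to approach AI in a thoughtful and fair way that builds trust. We believe transparency, accountability, and explainability are key to building trust in AI systems and driving their innovation.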

Article
· Mar. 27, 2024 · 2 min read

A better data import experience for LOAD DATA

Recent versions of IRIS introduced a powerful new SQL command for loading data: LOAD DATA. This feature has been highly optimized to import data into IRIS extremely fast, allowing hundreds of gigabytes of data to be inserted in seconds instead of hours or days.

This is a very exciting improvement. However, one big problem remains in the data loading experience: the time and hassle it takes to:

  1. Define the schema for the table in which you want to load data.
  2. Figure out the syntax for the LOAD DATA command.

I've developed a user interface that invisibly handles the CREATE TABLE step and then generates the syntax for LOAD DATA, all in a handy wizard!

At least in my case -- although I've been in the database business for decades -- I only import data a few times a year. Therefore, my CREATE TABLE skills get rusty, and it's really nice to have a tool take care of that for me. And this tool doesn't just handle syntax. It also inspects the input CSV file using a utility from the SQL Utilities library from @Benjamin De Boe to figure out the data types and their length. Then it asks a few more questions to define the syntax of the required LOAD DATA command. The app can run it, or you can just copy the syntax and tweak it yourself. 
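
As a rough illustration of the idea (not the app's actual implementation, and not the SQL Utilities library it uses), the Python sketch below inspects a CSV sample, guesses a type for each column, and builds the CREATE TABLE and LOAD DATA statements as strings. The type-guessing rules, the sample data, the table name `Demo.People`, and the file path are all assumptions made for the example; the `USING` clause shows only the `header` option, so check the LOAD DATA documentation for the options your file needs.

```python
import csv
import io
import json


def infer_type(values):
    """Pick a SQL type that fits every non-empty value in one column."""
    present = [v for v in values if v != ""]
    if not present:
        return "VARCHAR(1)"  # empty column: arbitrary fallback chosen for this sketch
    try:
        for value in present:
            int(value)
        return "INTEGER"
    except ValueError:
        pass
    try:
        for value in present:
            float(value)
        return "DOUBLE"
    except ValueError:
        pass
    return f"VARCHAR({max(len(v) for v in present)})"


def build_statements(csv_text, table, file_path):
    """Return (CREATE TABLE, LOAD DATA) statements for a CSV with a header row."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    columns = [
        f"{name} {infer_type([row[i] for row in data])}"
        for i, name in enumerate(header)
    ]
    create = f"CREATE TABLE {table} ({', '.join(columns)})"
    options = json.dumps({"from": {"file": {"header": True}}})
    load = f"LOAD DATA FROM FILE '{file_path}' INTO {table} USING {options}"
    return create, load


# Hypothetical sample with numeric, text, and empty columns.
SAMPLE = "id,name,score,notes\n1,Ada,9.5,\n2,Linus,7,\n"
create_sql, load_sql = build_statements(SAMPLE, "Demo.People", "/tmp/people.csv")
```

The generated strings can be reviewed and adjusted before being run, which mirrors the "copy the syntax and tweak it yourself" option described above.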

Here's a walkthrough.

Step 1: Install the app and review the CSV file

After following the instructions to install the solution, you will have an Angular app published as a CSP application and a backend ObjectScript application that serves as the API to interface with the database.

Take a look at my sample data set (using the Rainbow CSV extension in VS Code). It has a mix of numeric, text and empty columns.

Step 2: Go to the app

You will probably find the app at http://localhost:52773/csp/dataloadapp/index.html if you use the default IRIS port and web application name.

Step 3: Specify the CSV file location

Step 4: Specify the CSV file's format

LOAD DATA needs to know some things like the column delimiter character and where to start in the file. 

Step 5: Define a destination table name, with the schema name as well

Step 6: Fine tune the field names and data types

Most of this will be filled in for you, and should be pretty accurate, but you will probably want to adjust some names or field lengths.

And that's it! Press "Load CSV" and the client-side app calls the server to run a CREATE TABLE SQL command, then runs LOAD DATA with the syntax shown in the black box on the right. In the Management Portal (or any other SQL client), you can see that the CSV file is now loaded into IRIS.

I must apologize in advance that there isn't much error checking yet, but this is open source so if you find this tool useful, join me in improving it on GitHub.
