Nova postagem

Pesquisar

Pergunta
· Mar. 25

How to tokenize a text using SentenceTransformer?

Hi all.

I'm trying to create an indexed table with an vector field so I can search by the vector value.
I've been investigating and found that to get the vector value based on the text (token), use a Python method like the following:

ClassMethod TokenizeData(desc As %String) As %String [ Language = python ]
{
    import iris
    # Step 2: Generate Document Embeddings
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer('/opt/irisbuild/all-MiniLM-L6-v2')

    # Generate embeddings for each document
    document_embeddings = model.encode(desc)

    return document_embeddings.tolist()
}

The model all-MiniLM-L6-v2 is downloaded from https://ollama.com/library/all-minilm and installed into my Docker instance.

When I've tryed to test this métod (from Visual Studio), it throws the following error:

 

<THROW>DebugStub+40^%Debugger.System.1 *%Exception.PythonException <PYTHON EXCEPTION> 246 <class 'OSError'>: It looks like the config file at '/opt/irisbuild/all-MiniLM-L6-v2/config.json' is not a valid JSON file.  

Then I changed the config.json file to create a valid JSon file (I only wrote the curly braces) and repeated the test, but there is a new error.

 

<THROW>DebugStub+40^%Debugger.System.1 *%Exception.PythonException <PYTHON EXCEPTION> 246 <class 'safetensors_rust.SafetensorError'>: Error while deserializing header: HeaderTooSmall  

Does anyone know how to fix this problem?
Is there any other way to create the vector value so I can index it?

Best regards.

2 Comments
Discussão (2)2
Entre ou crie uma conta para continuar
Pergunta
· Mar. 25

Walking a virtual document's structure

Is there a generic process for "walking" the structure of a virtual document - eg an HL7 message (EnsLib.HL7.Message) or an XML document (EnsLib.EDI.XML.Document).

At least we'd want to be able to visit all "nodes" (HL7 fields or sub-fields, XML nodes) in the virtual document and be able to work out/generate the Property Path (so we could call "GetValueAt"). 

We can just about come up with something generic for HL7, since it only nests down to 4 levels within each segment, though we're using numeric Property Path's at that point rather than symbolic ones (MSH:1.3 etc).

Reading the documentation has not so far cast light! Any pointers welcome. Thanks.

7 Comments
Discussão (7)3
Entre ou crie uma conta para continuar
Discussão (1)1
Entre ou crie uma conta para continuar
Pergunta
· Mar. 25

Distinct strings in MDX

Hi!

I have question about MDX functionality in context of IRIS Analytics.

How does IRIS MDX distinct selection works? Is there any restruqtion when analyzing strings? Like special symbols or length?

Here is an example

I have this data:

 6 rows and 2 of them unique

Then we create data cube based on this model and examine it with Analyzer

 
Detailed listing
 

6 rows an ONE unique string. Which is obviously not true. 
This happed only with string with symbols in it

My task is to get the right amount of unique strings

2 Comments
Discussão (2)2
Entre ou crie uma conta para continuar
Artigo
· Mar. 25 1min de leitura

ORDER BY で並べ替える際の照合順を調整する

これは、InterSystems FAQサイトの記事です。
 

SQLでクエリ実行時、ORDER BYで並べ替えをする場合、各RDBMSによって、照合順が異なることがあります。

たとえば、NULLと空文字の混じった文字列のカラムを並べ替える場合、

IRIS SQLでは、既定の照合順は下記のようになりますが、 

NULL, 空文字, 0, 00, 01, 1, 10, 100, 11, A, a, B, b

Oracleでは、下記のような照合順になります。

空文字, 0, 00, 01, 1, 10, 100, 11, A, B, a, b, NULL


複数のDB由来のデータを取り扱う際には、このような照合順の違いを合わせたい場合があります。

IRISの場合、既定の文字列照合はSQLUPPERですが、照合タイプを変更することによって、照合順を変えることが出来ます。

ドキュメント:照合

上記のIRIS SQLとOracleの違いを合わせるためには、照合タイプを「SQLSTRING」に設定し、インデックスを作成します。

Property TestColumn As %String(COLLATION = "SQLSTRING"); 
Index IdxTest On TestColumn;


但し、Nullについてはこの方法では対応できないため、Nullの照合順を合わせたい場合には、ORDER BY句を下記のようにします。

ORDER BY IFNULL(TestColumn, 1, 0), TestColumn
Discussão (0)1
Entre ou crie uma conta para continuar