Article
· Mar. 21, 2024 · 2 min read

VECTOR inside IRIS

This is an attempt to run a vector search demo completely in IRIS.
There are no external tools; all you need is a Terminal / Console and the Management Portal.
Special thanks to Alvin Ryanputra, whose package iris-vector-search was the inspiration and the source of the test data.
My package is based on the IRIS 2024.1 release and requires attention to your processor capabilities.

I attempted to write the demo in pure ObjectScript.
Only the calculation of the description_vector is done in embedded Python.

Calculation of a vector with 384 dimensions over 2247 records takes time.
In my Docker container, it ran for 01:53:14 to generate them all.

You have been warned!
So I made this step reentrant to allow pausing the vector calculation.
Every 50 records you are offered the chance to stop.
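The pause/resume idea can be sketched in plain Python. This is an illustrative sketch only, not the package's actual code: `embed()` is a stand-in for the real 384-dimension embedding call, and the checkpoint is a plain dict.

```python
# Illustrative sketch, not the package's actual code. embed() is a stand-in
# for the real 384-dimension model call; checkpoint is a plain dict.
def embed(text):
    # placeholder for the real embedding model call
    return [float(ord(c) % 7) for c in text[:384]]

def build_vectors(records, checkpoint, batch=50, ask=lambda: False):
    """Resume from checkpoint["done"]; offer to stop every `batch` records."""
    start = checkpoint.get("done", 0)
    for i in range(start, len(records)):
        checkpoint[i] = embed(records[i])
        checkpoint["done"] = i + 1
        if (i + 1) % batch == 0 and ask():
            return False  # paused; call again later to resume where we left off
    return True           # all vectors generated
```

Because progress is saved after every record, a stopped run simply picks up at `checkpoint["done"]` on the next call.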

The demo looks like this:

USER>do ^A.DemoV

      Test Vector Search
=============================
     1 - Initialize Tables
     2 - Generate Data
     3 - VECTOR_COSINE
     4 - VECTOR_DOT_PRODUCT
     5 - Create Scotch
     6 - Load Scotch.csv
     7 - generate VECTORs
     8 - VECTOR search
Select Function or * to exit : 8

      Default search:
Let's look for TOP 3 scotch that costs less than $100,
 and has an earthy and creamy taste
     change price limit [100]: 50
     change phrase [earthy and creamy taste]: earthy 

 calculating search vector
  
     Total below $50: 222 

ID      price   name
1990    40      Wemyss Vintage Malts 'The Peat Chimney,' 8 year old, 40%
1785    39      The Famous Jubilee, 40%
1868    40      Tomatin, 15 year old, 43%
2038    45      Glen Grant, 10 year old, 43%
1733    29      Isle of Skye, 8 year old, 43%
5 Rows(s) Affected
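Options 3 and 4 exercise the two built-in similarity functions. As a plain-Python reference (not the IRIS implementation), VECTOR_DOT_PRODUCT and VECTOR_COSINE correspond to these formulas:

```python
import math

# Reference formulas for the two similarity measures used in steps 3 and 4.
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    # cosine similarity = dot(a, b) / (|a| * |b|)
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))
```

Cosine similarity is 1.0 for vectors pointing the same way and 0.0 for orthogonal ones, which is why it is a natural ranking score for the search in step 8.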


- You see the basic functionality of Vectors in steps 1..4
- Steps 5..8 relate to the search example I borrowed from Alvin
- Step 6 (import of test data) is plain ObjectScript;
  SQL LOAD DATA was far too sensitive to irregularities in the input CSV
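The tolerant-import idea can be sketched like this (Python rather than the demo's ObjectScript; the column count and sample rows are invented): malformed rows are counted and skipped instead of aborting the whole load, which is what a strict loader would do.

```python
import csv

# Sketch of a tolerant CSV import: keep rows with the expected column
# count, count and skip the rest instead of failing the whole load.
def load_rows(lines, expected_cols):
    good, skipped = [], 0
    for row in csv.reader(lines):
        if len(row) == expected_cols:
            good.append(row)
        else:
            skipped += 1
    return good, skipped
```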

I suggest also following the examples in the Management Portal to watch how Vectors operate.

GitHub
 

Article
· Mar. 18, 2024 · 5 min read

What to do when the instance cannot start and "There exists a MISMATCH.WIJ file" is recorded in the console log

This article is from the InterSystems FAQ site.
 

If instance startup fails and "There exists a MISMATCH.WIJ file" is recorded in the console log, it means that some system failure has raised a database integrity concern.

To get the instance started again in this situation, carry out the following steps.

(1) a. Start the instance in NOSTU mode
(2) b. Check database integrity

◆ If the integrity check in (2) detected no errors, perform:
 (3) d. Rename the MISMATCH.WIJ file
 (4) e. Restart the instance

◆ If the integrity check in (2) detected errors, perform:
 (3) c. Apply the MISMATCH.WIJ file
 (4) b. Check database integrity
 (5) d. Rename the MISMATCH.WIJ file
 (6) e. Restart the instance

Each step is described in detail below.

a. Start the instance in NOSTU mode.

Starting in NOSTU mode (Caché)

Starting in NOSTU mode (IRIS)

b. Check database integrity

Run the STURECOV routine to check database integrity.

%SYS>do ^STURECOV
Logins are not disabled.
This routine is designed to run when Cache' is in single user mode due to a problem running the STU startup routine.
Do you want to continue ? No => yes
Warning: Misuse of this utility can harm your system
There is no record of any errors during the prior startup
This could be because there was a problem writing the data to disk or because the system failed to start for some other reason.
Do you want to continue ? No => yes
Enter error type (? for list) [^] => MISMATCHWIJ

1) List Affected Databases and View Blocks
2) Apply mismatched blocks from WIJ to databases 
3) Rename MISMATCH.WIJ         
4) Dismount a database
5) Mount a database
6) Database Repair Utility
7) Check Database Integrity
8) Bring down the system prior to a normal startup
--------------------------------------------------------------
H) Display Help
E) Exit this utility
--------------------------------------------------------------
Enter choice (1-8) or [Q]uit/[H]elp? 7
This utility is used to check the integrity of a database and the pointer structure of one or more globals.
Output results on
Device: Integrity_STURECOV.log        (writes the integrity check results to a log file)
Parameters? "WNS" =>
Stop after any error?  No=>
Do you want to check all databases?  No=> Yes
Checking c:\intersystems\irishealth\mgr\ at 23:09:20
Checking c:\intersystems\irishealth\mgr\enslib\ at 23:09:21
Checking c:\intersystems\irishealth\mgr\hscustom\ at 23:09:22
Checking c:\intersystems\irishealth\mgr\hslib\ at 23:09:23
Checking c:\intersystems\irishealth\mgr\hssys\ at 23:09:31
Checking c:\intersystems\irishealth\mgr\irisaudit\ at 23:09:31
Checking c:\intersystems\irishealth\mgr\irislib\ at 23:09:31
Checking c:\intersystems\irishealth\mgr\irislocaldata\ at 23:09:33
Checking c:\intersystems\irishealth\mgr\iristemp\ at 23:09:33
Checking c:\intersystems\irishealth\mgr\test\ at 23:09:33
Checking c:\intersystems\irishealth\mgr\user\ at 23:09:34
--------------------------------------------------------------
1) List Affected Databases and View Blocks
2) Apply mismatched blocks from WIJ to databases
3) Rename MISMATCH.WIJ
4) Dismount a database
5) Mount a database
6) Database Repair Utility
7) Check Database Integrity
8) Bring down the system prior to a normal startup
--------------------------------------------------------------
H) Display Help
E) Exit this utility
--------------------------------------------------------------
Enter choice (1-8) or [Q]uit/[H]elp?

 

If the end of the output log file reads "No Errors were found.", the integrity check detected no errors.
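As a small sketch (Python; not part of the official procedure), that final verdict in Integrity_STURECOV.log can also be checked from a script. The helper name is hypothetical:

```python
# Hypothetical helper: check the integrity log's verdict programmatically
# by inspecting the last non-empty line of the log text.
def integrity_ok(log_text):
    lines = [ln for ln in log_text.splitlines() if ln.strip()]
    return bool(lines) and "No Errors were found." in lines[-1]
```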

 

c. Apply the MISMATCH.WIJ file

Use the STURECOV routine to apply the MISMATCH.WIJ file.

--------------------------------------------------------------
1) List Affected Databases and View Blocks
2) Apply mismatched blocks from WIJ to databases 
3) Rename MISMATCH.WIJ         
4) Dismount a database
5) Mount a database
6) Database Repair Utility
7) Check Database Integrity
8) Bring down the system prior to a normal startup
--------------------------------------------------------------
H) Display Help
E) Exit this utility
--------------------------------------------------------------
Enter choice (1-8) or [Q]uit/[H]elp? 2

d. Rename the MISMATCH.WIJ file

Use the STURECOV routine to rename the MISMATCH.WIJ file.

 

--------------------------------------------------------------
1) List Affected Databases and View Blocks
2) Apply mismatched blocks from WIJ to databases 
3) Rename MISMATCH.WIJ         
4) Dismount a database
5) Mount a database
6) Database Repair Utility
7) Check Database Integrity
8) Bring down the system prior to a normal startup
--------------------------------------------------------------
H) Display Help
E) Exit this utility
--------------------------------------------------------------
Enter choice (1-8) or [Q]uit/[H]elp? 3

e. Restart the instance

Stop the instance from the STURECOV routine, then start the instance normally (from the launcher, etc.).

--------------------------------------------------------------
1) List Affected Databases and View Blocks
2) Apply mismatched blocks from WIJ to databases
3) Rename MISMATCH.WIJ
4) Dismount a database
5) Mount a database
6) Database Repair Utility
7) Check Database Integrity
8) Bring down the system prior to a normal startup
--------------------------------------------------------------
H) Display Help
E) Exit this utility
--------------------------------------------------------------
Enter choice (1-8) or [Q]uit/[H]elp? 8

If applying MISMATCH.WIJ does not complete successfully, or if the integrity check after restarting the instance still reports errors, please contact the Customer Support Center.

WIJ block comparison
 

Article
· Mar. 18, 2024 · 2 min read

Getting data from InterSystems IRIS CloudSQL using xDBC

Recently, while discussing with my students how to access data stored in IRIS from different languages, the question came up whether it was possible to initiate the connection from Microsoft Excel and pull data from a cloud solution (InterSystems IRIS CloudSQL), rather than the other way around. Considering the many ways one can get data into Excel (importing data from external sources, connecting to databases using ODBC drivers, using power queries and web queries, etc.), the obvious choice was the ODBC driver. The only task left was to connect to the database in the cloud using it.

Quite unsurprisingly, it worked!

If you're new to CloudSQL, I would highly suggest you read the documentation. From it you will be able to get access to the InterSystems Cloud Services Portal (or you can go directly to AWS and subscribe with your AWS account) and download necessary drivers. BTW, if you encounter problems with CloudSQL, you may try to look for answers in this document first.

Now that these preliminaries are out of the way, let's try the simplest approach - setting up the ODBC data source with just a password and getting data from it in MS Excel.

Step 0. Set up your InterSystems IRIS CloudSQL. You will see the necessary settings on the Overview page of your Deployment:

Step 00. Download and install ODBC driver for your OS.

Step 1. Open your ODBC Data Sources and switch to System DSN tab. In it, click on Add...

and fill in the settings from your Overview page (first screenshot of the article).

Step 2. Connect Excel to IRIS. Open Excel, navigate to the Data tab, and select "Get Data" or "From Other Sources," depending on your Excel version. Choose "From ODBC" as the data source

select the ODBC data source you configured earlier

enter authentication credentials if asked

and choose the table you wish to load into Excel

Step 3. You're good to go and can do whatever you like with your data.
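The same ODBC route works from code as well as from Excel. Here is a hedged sketch using the pyodbc package: the host, port, namespace, and credentials are placeholders to be taken from your deployment's Overview page, and the default driver name is an assumption that varies by platform and installed driver version.

```python
# Sketch: build a DSN-less ODBC connection string for an IRIS CloudSQL
# deployment. All values are placeholders from the Overview page; the
# driver name below is an assumption and differs per OS/driver install.
def iris_conn_str(host, port, namespace, user, pwd,
                  driver="InterSystems IRIS ODBC35"):
    return (f"DRIVER={{{driver}}};SERVER={host};PORT={port};"
            f"DATABASE={namespace};UID={user};PWD={pwd}")

# Usage (requires the pyodbc package and a reachable deployment):
# import pyodbc
# with pyodbc.connect(iris_conn_str("<hostname>", 1972,
#                                   "USER", "SQLAdmin", "<password>")) as cn:
#     rows = cn.cursor().execute("SELECT 1").fetchall()
```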

PS. Some other interesting articles/videos that I would suggest you read/watch regarding ODBC and CloudSQL:

  • a whole series of videos that show how to connect to CloudSQL from solutions written in different languages
  • an interesting article on how to set up roles and give permissions to xDBC connections
  • another article on how to use SSL/TLS to establish secure connections over JDBC and other driver technologies
  • and the last one about switching between ODBC drivers in an environment with multiple versions installed
Announcement
· Mar. 18, 2024

Developer Meetup in Boston, March 27

Hi Developers,
Let's meet in person at our next Developer Meetup in Boston on March 27 at 5:30 pm!
>> RSVP here <<

 

Tech Talks:
"Update on GenAI Use Cases in Healthcare. Back office automation and chart review using GenAI" by Qi Li, Physician Executive, Product Innovation, InterSystems
"IntegratedML: Plugging ML into SQL" by Akshat Vora, Senior Systems Developer, InterSystems

⏱ Day and Time: March 27, 5:30 p.m. to 7:30 p.m.
📍CIC Venture Café in Cambridge, Massachusetts

Looking forward to seeing you there!

Article
· Mar. 18, 2024 · 10 min read

Pandas for KPIs in InterSystems IRIS BI

Pandas is not just a popular software library. It is a cornerstone in the Python data analysis landscape. Renowned for its simplicity and power, it offers a variety of data structures and functions that are instrumental in transforming the complexity of data preparation and analysis into a more manageable form. It is particularly relevant in such specialized environments as ObjectScript for Key Performance Indicators (KPIs) and reporting, especially within the framework of the InterSystems IRIS platform, a leading data management and analysis solution.
 
In the realm of data handling and analysis, Pandas stands out for several reasons. In this article, we aim to explore those aspects in depth:
 

  • Key Benefits of Pandas for Data Analysis:

In this part, we will delve into the various advantages of using Pandas. These include its intuitive syntax, efficient handling of large datasets, and the ability to work seamlessly with different data formats. The ease with which Pandas integrates into existing data analysis workflows is also a significant factor that enhances productivity and efficiency.

  • Solutions to Typical Data Analysis Tasks with Pandas:

Pandas is versatile enough to tackle routine data analysis tasks, ranging from simple data aggregation to complex transformations. We will explore how Pandas can be used to solve these typical challenges, demonstrating its capabilities in data cleaning, transformation, and exploratory data analysis. This section will provide practical insights into how Pandas simplifies these tasks.

  • Using Pandas Directly in ObjectScript KPIs in IRIS:

The integration of Pandas with ObjectScript for the development of KPIs in the IRIS platform is simply a game-changer. This part will cover how Pandas can be utilized directly within ObjectScript, enhancing the KPI development process. We will also explore practical examples of how Pandas can be employed to analyze and visualize data, thereby contributing to more robust and insightful KPIs.

  •  Recommendations for Implementing Pandas in IRIS Analytic Processes:

Implementing a new tool in an existing analytics process can be challenging. For that reason, this section aims to provide best practices and recommendations for integrating Pandas into the IRIS analytics ecosystem as smoothly as possible. From setup and configuration to optimization and best practices, we will cover essential guidelines to ensure a successful integration of Pandas into your data analysis workflow.
 
Pandas is a powerful data analytics library in the Python programming language. Below, you can find a few benefits of Pandas for data analytics:
 

  1. Ease of use: Pandas provides a simple and intuitive interface for working with data. It is built on top of the NumPy library and provides such high-level data structures as DataFrames, which makes it easy to work with tabular data.  
  2. Data Structures: The principal data structures in Pandas are Series and DataFrame. Series is a one-dimensional array with labels, whereas DataFrame is a two-dimensional table representing a set of Series. These data structures combined allow convenient storage and manipulation of data.  
  3. Handling missing data: Pandas provides convenient methods for detecting and handling missing data (NaN or None). It includes some methods for deleting, filling, or replacing missing values, simplifying your work with real data.  
  4. Data grouping and aggregation: With Pandas it is easy to group data by features and apply aggregation functions (sum, mean, median, etc.) to each data group.  
  5. Powerful indexing capabilities: Pandas provides flexible tools for indexing data. You can use labels, numeric indexes, or multiple levels of indexing. It allows you to filter, select, and manipulate data efficiently.  
  6. Reading and writing data: Pandas supports multiple data formats, including CSV, Excel, SQL, JSON, HTML, etc. It facilitates the process of reading and writing data from/to various sources.  
  7. Extensive visualization capabilities: Pandas is integrated with such visualization libraries as Matplotlib and Seaborn, making it simple to create graphs and visualize data, especially with the help of DeepSeeWeb through integration via embedded Python.  
  8. Time series support: Pandas provides multiple features for working with time series, including powerful tools for working with timestamps and periods.  
  9. Extensive data manipulation capabilities: The library provides various functions for filtering, sorting, and reshaping data, as well as joining and merging tables, which makes it a powerful tool for data manipulation.  
  10. Excellent performance: Pandas is purposefully optimized to handle large amounts of data. It provides high performance by using Cython and enhanced data structures.
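A tiny illustration of points 3 and 4 above, on a made-up member table: fill a missing value, then group and aggregate.

```python
import pandas as pd

# Made-up data: fill missing values (point 3), then group and sum (point 4).
df = pd.DataFrame({
    "member": ["ann", "bob", "cat", "dan"],
    "group":  ["A", "A", "B", "B"],
    "posts":  [3.0, None, 5.0, 7.0],
})
df["posts"] = df["posts"].fillna(0)             # point 3: handle missing data
per_group = df.groupby("group")["posts"].sum()  # point 4: aggregate per group
```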

 
Let's look at an example of Pandas' implementation in an ObjectScript environment. We will employ VSCode as our development environment. The choice of IDE in this case was determined by the availability of InterSystems ObjectScript Extension Pack, which provides a debugger and editor for ObjectScript.
First of all, let's create a KPI class:

  Class BI.KPI.pandasKpi Extends %DeepSee.KPI
{
}


 
Then, we should make an XML document defining the type, name, and number of columns and filters of our KPI:
 

XData KPI [ XMLNamespace = "http://www.intersystems.com/deepsee/kpi" ]
{
<!-- 'manual' KPI type will tell DeepSee that data will be gathered from the class method defined by us-->
 
<kpi name="MembersPandasDemo" sourceType="manual">
 
 <!-- we are going to need only one column for our KPI query -->
 
<property columnNo="1" name="Members" displayName="Community Members"/>
 
<!-- and lastly we should define a filter for our members -->
 
<filter name="InterSystemsMember"
displayName="InterSystems member"
sql="SELECT DISTINCT ISCMember from Community.Member"/>
 
 </kpi>
}


 
The next step is to define the python function, write the import, and create the necessary variables:
 

ClassMethod MembersDF(sqlstring) As %Library.DynamicArray [ Language = python ]
{
            # First of all, we import the most important library in our script: IRIS.
            # IRIS library provides syntax for calling ObjectScript classes.
            # It simplifies Python-ObjectScript integration.
            # With the help of the library we can call any class and class method, and
            # it returns whatever data type we like, and ObjectScript understands it.
            import iris
            # Then, of course, import the pandas itself.
            import pandas as pd
            # Create three empty arrays:
            Id_list = []
            time_list = []
            ics_member = []


 
Next step: define a query against the database:
 

# Define SQL query for fetching data.
# The query can be as simple as possible.
# All the work will be done by pandas:
            query = """
            SELECT
            id as ID, CAST(TO_CHAR(Created, 'YYYYMM') as Int) as MonthYear, ISCMember as ISC
            FROM Community.Member
            order by Created DESC
            """


Then, we need to save the resulting data into an array group:

 

# Call the class specified for executing SQL statements.
# We use embedded Python library to call the class:
sql_class = iris.sql.prepare(query)
 
# We use it again to call dedicated class methods:
rs = sql_class.execute()
 
# Then we use pandas directly on the result set to make dataFrame:
data = rs.dataframe()


            
We also can pass an argument to filter our data frame.

# Filter example
# We take an argument sqlstring which, in this case, contains boolean data.
# With a handy function .loc filtering all the data
if sqlstring is not False:
        data = data.loc[data["ISC"] == int(sqlstring)]


 
Now, we should group the data and define x-axis for it:
 
 

# Group data by date displayed like MonthYear:
grouped_data = data.groupby(["MonthYear"]).count()

 

Unfortunately, we cannot take the date column directly from the grouped DataFrame, so instead we take the date column from the original DataFrame and process it.


 

# Filter out duplicate dates and append them to a list.
# After grouping by MonthYear, pandas automatically filters off duplicate dates.
# We should do the same to match our arrays:
sorted_filtered_dates = [item for item in set(data["MonthYear"])]
# Reverse the dates from left to right:
date = sorted(sorted_filtered_dates, reverse=True)
# Convert the Series of counts to a plain list:
id = grouped_data["ID"].tolist()
# Reverse values according to the date array:
id.reverse()
 
# In order to return the appropriate object to ObjectScript so that it understands it,
# we call '%Library.DynamicArray' (it is the closest one to python and an easy-to-use type of array).
# Again, we use IRIS library inside python code:
OBJIDList = iris.cls('%Library.DynamicArray')._New()
OBJtimeList = iris.cls('%Library.DynamicArray')._New()
 
# Append all data to DynamicArray class methods Push()
for i in date:
        OBJtimeList._Push(i)
for i in id:
        OBJIDList._Push(i)
return OBJIDList, OBJtimeList
}
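As a side note in plain pandas (outside the KPI class): after `groupby`, the group keys live on the result's index, so the dates can also be recovered directly from the grouped frame without re-scanning the original one. A small sketch with invented data:

```python
import pandas as pd

# Invented data: after groupby, the MonthYear keys sit on the index of the
# grouped result, sorted ascending, so they can be read back directly.
data = pd.DataFrame({"MonthYear": [202402, 202401, 202402],
                     "ID": [10, 11, 12]})
grouped = data.groupby("MonthYear").count()
dates = grouped.index.tolist()    # group keys, ascending
counts = grouped["ID"].tolist()   # one count per group, same order
```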


 
Next step is to define KPI specific method for DeepSee to understand what data to take:
 

// Define method. The method must always be %OnLoadKPI(). Otherwise, the system will not recognise it.
 
Method %OnLoadKPI() As %Status
{
 
 //Define string for the filter. Set the default to zero
 set sqlstring = 0
    //Call %filterValues method to fetch any filter data from the widget.
    if $IsObject(..%filterValues) {
        if (..%filterValues.InterSystemsMember'="")
        {
            set sqlstring=..%filterValues.%data("InterSystemsMember")
        }
    }
    //Call pandas function, pass filter value if any, and receive dynamic arrays with data.
    set sqlValue = ..MembersDF(sqlstring)
 
    //Assign each tuple to a variable.
    set idList = sqlValue.GetAt(1)
    set timeList = sqlValue.GetAt(2)
  
    //Calculate size of x-axis. It will be rows for our widget:
    set rowCount = timeList.%Size()
 
    //Since we need only one column, we assign variable to 1:
    set colCount = 1
    set ..%seriesCount=rowCount
  
    //Now, for each row, assign time value and ID value of our members:
    for rows = 1:1:..%seriesCount
    {
        set ..%seriesNames(rows)=timeList.%Get(rows-1)
       
        for col = 1:1:colCount
        {
            set ..%data(rows,"Members")=idList.%Get(rows-1)
        }
    }
    quit $$$OK
}


At this point, compile the KPI and create a widget on a dashboard using KPI data source.

 

That's it! We have successfully navigated through the process of integrating and utilizing Pandas in our ObjectScript applications on InterSystems IRIS. This journey has taken us from fetching and formatting data to filtering and displaying it, all within a single, streamlined function. This demonstration highlights the efficiency and power of Pandas in data analysis. Now, let's explore some practical recommendations for implementing Pandas within the IRIS environment and conclude with insights on its transformative impact.
Recommendations for Practical Application of Pandas in IRIS

  • Start with Prototyping:

Begin your journey with Pandas by using example datasets and utilities. This approach helps you understand the basics and nuances of Pandas in a controlled and familiar environment. Prototyping allows us to experiment with different Pandas functions and methods without the risks associated with live data.

  • Gradual Implementation:

Introduce Pandas incrementally into your existing data processes. Instead of a complete overhaul, identify the areas where Pandas can enhance or simplify data handling and analysis. It could be simple tasks like data cleaning and aggregation, or a more complex analysis where Pandas capabilities can be fully leveraged.

  •  Optimize Pandas Use:

Prior to working with large datasets, it is crucial to optimize your Pandas code. Efficient code can significantly reduce processing time and resource consumption, which is especially important in large-scale data analysis. Techniques such as vectorized operations, using appropriate data types, and avoiding loops in data manipulation can significantly enhance performance.


Conclusion

The integration of Pandas into ObjectScript applications on the InterSystems IRIS platform marks a significant advancement in the field of data analysis. Pandas brings us an array of powerful tools for data processing, analysis, and visualization, which are now at the disposal of IRIS users. This integration not only accelerates and simplifies the development of KPIs and analytics but also paves the way for more sophisticated and advanced data analytical capabilities within the IRIS ecosystem.
 
With Pandas, analysts and developers can explore new horizons in data analytics, leveraging its extensive functionalities to gain deeper insights from their data. The ability to process and analyze large datasets efficiently, coupled with the ease of creating compelling visualizations, empowers users to make more informed decisions and uncover trends and patterns that were previously difficult to detect.
 
In summary, Pandas integration into the InterSystems IRIS environment is a transformative step, enhancing the capabilities of the platform and offering users an expanded toolkit for tackling the ever-growing challenges and complexities of modern data analysis.
