Since Embedded Python was introduced, there have always been doubts about its performance compared to ObjectScript, and I have discussed this with @Guillaume Rongier on more than one occasion. While building a small application to capture data from public tenders in Spain and search it using the capabilities of VectorSearch, I saw the opportunity to run a small test.
Data for the test
Public tender information is published monthly as XML files at this URL, and a typical tender entry has the following format:
As you can see, each tender is quite large, and each file contains about 450 tenders. This size makes it impractical to map the whole structure to an ObjectScript class (it could be done... but I'm not in the mood).
Code for testing
My idea is to capture only the fields relevant for later searches. For this, I created the following class to store the captured information:
Class Inquisidor.Object.Licitacion Extends (%Persistent, %XML.Adaptor) [ DdlAllowed ]
{
Property IdLicitacion As %String(MAXLEN = 200);
Property Titulo As %String(MAXLEN = 2000);
Property URL As %String(MAXLEN = 1000);
Property Resumen As %String(MAXLEN = 2000);
Property TituloVectorizado As %Vector(DATATYPE = "DECIMAL", LEN = 384);
Property Contratante As %String(MAXLEN = 2000);
Property URLContratante As %String(MAXLEN = 2000);
Property ValorEstimado As %Numeric(STORAGEDEFAULT = "columnar");
Property ImporteTotal As %Numeric(STORAGEDEFAULT = "columnar");
Property ImporteTotalSinImpuestos As %Numeric(STORAGEDEFAULT = "columnar");
Property FechaAdjudicacion As %Date;
Property Estado As %String;
Property Ganador As %String(MAXLEN = 200);
Property ImporteGanador As %Numeric(STORAGEDEFAULT = "columnar");
Property ImporteGanadorSinImpuestos As %Numeric(STORAGEDEFAULT = "columnar");
Property Clasificacion As %String(MAXLEN = 10);
Property Localizacion As %String(MAXLEN = 200);
Index IndexContratante On Contratante;
Index IndexGanador On Ganador;
Index IndexClasificacion On Clasificacion;
Index IndexLocalizacion On Localizacion;
Index IndexIdLicitation On IdLicitacion [ PrimaryKey ];
}
To capture the data with Embedded Python I used the xml.etree.ElementTree library, which lets us walk the XML tree and extract the values node by node. Here is the Python method I used to map the XML:
Method ReadXML(xmlPath As %String) As %String [ Language = python ]
{
    import xml.etree.ElementTree as ET
    import iris
    import pandas as pd
    try:
        tree = ET.parse(xmlPath)
        root = tree.getroot()
        for entry in root.iter("{http://www.w3.org/2005/Atom}entry"):
            licitacion = {"titulo": "", "resumen": "", "idlicitacion": "", "url": "", "contratante": "", "urlcontratante": "", "estado": "", "valorestimado": "", "importetotal": "", "importetotalsinimpuestos": "", "clasificacion": "", "localizacion": "", "fechaadjudicacion": "", "ganador": "", "importeganadorsinimpuestos": "", "importeganador": ""}
            for tags in entry:
                if tags.tag == "{http://www.w3.org/2005/Atom}title":
                    licitacion["titulo"] = tags.text
                if tags.tag == "{http://www.w3.org/2005/Atom}summary":
                    licitacion["resumen"] = tags.text
                if tags.tag == "{http://www.w3.org/2005/Atom}id":
                    licitacion["idlicitacion"] = tags.text
                if tags.tag == "{http://www.w3.org/2005/Atom}link":
                    licitacion["url"] = tags.attrib["href"]
                if tags.tag == "{urn:dgpe:names:draft:codice-place-ext:schema:xsd:CommonAggregateComponents-2}ContractFolderStatus":
                    for detailTags in tags:
                        if detailTags.tag == "{urn:dgpe:names:draft:codice-place-ext:schema:xsd:CommonAggregateComponents-2}LocatedContractingParty":
                            for infoContractor in detailTags:
                                if infoContractor.tag == "{urn:dgpe:names:draft:codice:schema:xsd:CommonAggregateComponents-2}Party":
                                    for contractorDetails in infoContractor:
                                        if contractorDetails.tag == "{urn:dgpe:names:draft:codice:schema:xsd:CommonAggregateComponents-2}PartyName":
                                            for name in contractorDetails:
                                                licitacion["contratante"] = name.text
                                        elif contractorDetails.tag == "{urn:dgpe:names:draft:codice:schema:xsd:CommonBasicComponents-2}WebsiteURI":
                                            licitacion["urlcontratante"] = contractorDetails.text
                        elif detailTags.tag == "{urn:dgpe:names:draft:codice-place-ext:schema:xsd:CommonBasicComponents-2}ContractFolderStatusCode":
                            # basic-components (cbc-place-ext) namespace, as in the ObjectScript path
                            licitacion["estado"] = detailTags.text
                        elif detailTags.tag == "{urn:dgpe:names:draft:codice:schema:xsd:CommonAggregateComponents-2}ProcurementProject":
                            for infoProcurement in detailTags:
                                if infoProcurement.tag == "{urn:dgpe:names:draft:codice:schema:xsd:CommonAggregateComponents-2}BudgetAmount":
                                    for detailBudget in infoProcurement:
                                        if detailBudget.tag == "{urn:dgpe:names:draft:codice:schema:xsd:CommonBasicComponents-2}EstimatedOverallContractAmount":
                                            licitacion["valorestimado"] = detailBudget.text
                                        elif detailBudget.tag == "{urn:dgpe:names:draft:codice:schema:xsd:CommonBasicComponents-2}TotalAmount":
                                            licitacion["importetotal"] = detailBudget.text
                                        elif detailBudget.tag == "{urn:dgpe:names:draft:codice:schema:xsd:CommonBasicComponents-2}TaxExclusiveAmount":
                                            licitacion["importetotalsinimpuestos"] = detailBudget.text
                                elif infoProcurement.tag == "{urn:dgpe:names:draft:codice:schema:xsd:CommonAggregateComponents-2}RequiredCommodityClassification":
                                    for detailClassification in infoProcurement:
                                        if detailClassification.tag == "{urn:dgpe:names:draft:codice:schema:xsd:CommonBasicComponents-2}ItemClassificationCode":
                                            licitacion["clasificacion"] = detailClassification.text
                                elif infoProcurement.tag == "{urn:dgpe:names:draft:codice:schema:xsd:CommonAggregateComponents-2}RealizedLocation":
                                    for detailLocalization in infoProcurement:
                                        if detailLocalization.tag == "{urn:dgpe:names:draft:codice:schema:xsd:CommonBasicComponents-2}CountrySubentity":
                                            licitacion["localizacion"] = detailLocalization.text
                        elif detailTags.tag == "{urn:dgpe:names:draft:codice:schema:xsd:CommonAggregateComponents-2}TenderResult":
                            for infoResult in detailTags:
                                if infoResult.tag == "{urn:dgpe:names:draft:codice:schema:xsd:CommonBasicComponents-2}AwardDate":
                                    licitacion["fechaadjudicacion"] = infoResult.text
                                elif infoResult.tag == "{urn:dgpe:names:draft:codice:schema:xsd:CommonAggregateComponents-2}WinningParty":
                                    for detailWinner in infoResult:
                                        if detailWinner.tag == "{urn:dgpe:names:draft:codice:schema:xsd:CommonAggregateComponents-2}PartyName":
                                            for detailName in detailWinner:
                                                if detailName.tag == "{urn:dgpe:names:draft:codice:schema:xsd:CommonBasicComponents-2}Name":
                                                    licitacion["ganador"] = detailName.text
                                elif infoResult.tag == "{urn:dgpe:names:draft:codice:schema:xsd:CommonAggregateComponents-2}AwardedTenderedProject":
                                    for detailTender in infoResult:
                                        if detailTender.tag == "{urn:dgpe:names:draft:codice:schema:xsd:CommonAggregateComponents-2}LegalMonetaryTotal":
                                            for detailWinnerAmount in detailTender:
                                                if detailWinnerAmount.tag == "{urn:dgpe:names:draft:codice:schema:xsd:CommonBasicComponents-2}TaxExclusiveAmount":
                                                    licitacion["importeganadorsinimpuestos"] = detailWinnerAmount.text
                                                elif detailWinnerAmount.tag == "{urn:dgpe:names:draft:codice:schema:xsd:CommonBasicComponents-2}PayableAmount":
                                                    licitacion["importeganador"] = detailWinnerAmount.text
            iris.cls("Ens.Util.Log").LogInfo("Inquisidor.BP.XMLToLicitacion", "VectorizePatient", "Terminado mapeo "+licitacion["titulo"])
            # only awarded tenders are stored: None and "" are both falsy
            if licitacion.get("importeganador"):
                iris.cls("Ens.Util.Log").LogInfo("Inquisidor.BP.XMLToLicitacion", "VectorizePatient", "Lanzando insert "+licitacion["titulo"])
                stmt = iris.sql.prepare("INSERT INTO INQUISIDOR_Object.Licitacion (Titulo, Resumen, IdLicitacion, URL, Contratante, URLContratante, Estado, ValorEstimado, ImporteTotal, ImporteTotalSinImpuestos, Clasificacion, Localizacion, FechaAdjudicacion, Ganador, ImporteGanadorSinImpuestos, ImporteGanador) VALUES (?,?,?,?,?,?,?,?,?,?,?,?,TO_DATE(?,'YYYY-MM-DD'),?,?,?)")
                try:
                    rs = stmt.execute(licitacion["titulo"], licitacion["resumen"], licitacion["idlicitacion"], licitacion["url"], licitacion["contratante"], licitacion["urlcontratante"], licitacion["estado"], licitacion["valorestimado"], licitacion["importetotal"], licitacion["importetotalsinimpuestos"], licitacion["clasificacion"], licitacion["localizacion"], licitacion["fechaadjudicacion"], licitacion["ganador"], licitacion["importeganadorsinimpuestos"], licitacion["importeganador"])
                except Exception as err:
                    iris.cls("Ens.Util.Log").LogInfo("Inquisidor.BP.XMLToLicitacion", "VectorizePatient", repr(err))
        return "Success"
    except Exception as err:
        iris.cls("Ens.Util.Log").LogInfo("Inquisidor.BP.XMLToLicitacion", "VectorizePatient", repr(err))
        return "Error"
}
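The nested loops above can also be written more compactly. ElementPath expressions plus a prefix-to-namespace map let `findtext` walk the nested elements. The following is only a sketch, not the code that was benchmarked, and it covers a subset of the fields. `NS`, `PATHS`, `read_entries` and `SAMPLE` are illustrative names. The sample is a hypothetical, heavily trimmed entry built from the element names ReadXML looks for, with made-up values.

```python
import io
import xml.etree.ElementTree as ET

# Prefix -> namespace URI, using the same URIs ReadXML compares against.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "cac": "urn:dgpe:names:draft:codice:schema:xsd:CommonAggregateComponents-2",
    "cbc": "urn:dgpe:names:draft:codice:schema:xsd:CommonBasicComponents-2",
    "cac-place-ext": "urn:dgpe:names:draft:codice-place-ext:schema:xsd:CommonAggregateComponents-2",
    "cbc-place-ext": "urn:dgpe:names:draft:codice-place-ext:schema:xsd:CommonBasicComponents-2",
}

FOLDER = "cac-place-ext:ContractFolderStatus"
RESULT = FOLDER + "/cac:TenderResult"

# Dictionary key -> ElementPath (relative to <entry>) of the element holding its text.
PATHS = {
    "titulo": "atom:title",
    "resumen": "atom:summary",
    "idlicitacion": "atom:id",
    "estado": FOLDER + "/cbc-place-ext:ContractFolderStatusCode",
    "contratante": FOLDER + "/cac-place-ext:LocatedContractingParty/cac:Party/cac:PartyName/cbc:Name",
    "ganador": RESULT + "/cac:WinningParty/cac:PartyName/cbc:Name",
    "importeganador": RESULT + "/cac:AwardedTenderedProject/cac:LegalMonetaryTotal/cbc:PayableAmount",
}


def read_entries(source):
    """Yield one dict per Atom <entry>; missing elements become ""."""
    root = ET.parse(source).getroot()
    for entry in root.iterfind("atom:entry", NS):
        licitacion = {key: entry.findtext(path, "", NS) for key, path in PATHS.items()}
        link = entry.find("atom:link", NS)
        licitacion["url"] = link.get("href", "") if link is not None else ""
        yield licitacion


# Hypothetical, trimmed feed with a single entry (values are made up).
SAMPLE = b"""<feed xmlns="http://www.w3.org/2005/Atom"
  xmlns:cac="urn:dgpe:names:draft:codice:schema:xsd:CommonAggregateComponents-2"
  xmlns:cbc="urn:dgpe:names:draft:codice:schema:xsd:CommonBasicComponents-2"
  xmlns:cac-place-ext="urn:dgpe:names:draft:codice-place-ext:schema:xsd:CommonAggregateComponents-2"
  xmlns:cbc-place-ext="urn:dgpe:names:draft:codice-place-ext:schema:xsd:CommonBasicComponents-2">
 <entry>
  <id>https://example.org/licitacion/1</id>
  <link href="https://example.org/detalle/1"/>
  <title>Suministro de material</title>
  <cac-place-ext:ContractFolderStatus>
   <cbc-place-ext:ContractFolderStatusCode>RES</cbc-place-ext:ContractFolderStatusCode>
   <cac-place-ext:LocatedContractingParty><cac:Party>
    <cac:PartyName><cbc:Name>Ayuntamiento de Ejemplo</cbc:Name></cac:PartyName>
   </cac:Party></cac-place-ext:LocatedContractingParty>
   <cac:TenderResult>
    <cac:WinningParty><cac:PartyName><cbc:Name>Empresa Ejemplo SL</cbc:Name></cac:PartyName></cac:WinningParty>
    <cac:AwardedTenderedProject><cac:LegalMonetaryTotal>
     <cbc:PayableAmount>1210.00</cbc:PayableAmount>
    </cac:LegalMonetaryTotal></cac:AwardedTenderedProject>
   </cac:TenderResult>
  </cac-place-ext:ContractFolderStatus>
 </entry>
</feed>"""

entries = list(read_entries(io.BytesIO(SAMPLE)))
```

Whether this is faster or slower than the explicit loops was not measured. Its advantage is that each field is described by a single path string.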
Once each entry has been mapped, the record is stored with a simple SQL INSERT.
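In that INSERT, the 16 column names, the placeholders and the arguments passed to `execute` must stay in the same order. A small helper can derive all three from one list. The following is a minimal sketch. `COLUMNS`, `PLACEHOLDERS`, `build_insert`, `params` and `is_awarded` are hypothetical names, not part of the original code. It assumes the same table and the same `TO_DATE` conversion that ReadXML uses.

```python
# (SQL column, dictionary key) in the order used by the INSERT in ReadXML.
COLUMNS = [
    ("Titulo", "titulo"),
    ("Resumen", "resumen"),
    ("IdLicitacion", "idlicitacion"),
    ("URL", "url"),
    ("Contratante", "contratante"),
    ("URLContratante", "urlcontratante"),
    ("Estado", "estado"),
    ("ValorEstimado", "valorestimado"),
    ("ImporteTotal", "importetotal"),
    ("ImporteTotalSinImpuestos", "importetotalsinimpuestos"),
    ("Clasificacion", "clasificacion"),
    ("Localizacion", "localizacion"),
    ("FechaAdjudicacion", "fechaadjudicacion"),
    ("Ganador", "ganador"),
    ("ImporteGanadorSinImpuestos", "importeganadorsinimpuestos"),
    ("ImporteGanador", "importeganador"),
]

# Columns whose value needs a conversion expression instead of a bare "?".
PLACEHOLDERS = {"FechaAdjudicacion": "TO_DATE(?,'YYYY-MM-DD')"}


def build_insert(table):
    """Build the INSERT text so that columns and placeholders always line up."""
    columns = ", ".join(column for column, _ in COLUMNS)
    marks = ",".join(PLACEHOLDERS.get(column, "?") for column, _ in COLUMNS)
    return f"INSERT INTO {table} ({columns}) VALUES ({marks})"


def params(licitacion):
    """Positional arguments for execute(), in COLUMNS order ("" when a key is missing)."""
    return tuple(licitacion.get(key, "") for _, key in COLUMNS)


def is_awarded(licitacion):
    """True when the winning amount is present and non-empty (None and "" are both falsy)."""
    return bool(licitacion.get("importeganador"))


sql = build_insert("INQUISIDOR_Object.Licitacion")
```

Inside ReadXML this would become `stmt = iris.sql.prepare(sql)` followed by `stmt.execute(*params(licitacion))` for each entry where `is_awarded(licitacion)` holds. This assumes `execute` takes the values positionally, as it does in the method above.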
For the ObjectScript mapping I used %XML.TextReader. Let's look at the method:
Method OnRequest(pRequest As Ens.StreamContainer, Output pResponse As Ens.Response) As %Status
{
    set filename = pRequest.OriginalFilename
    set status = ##class(%XML.TextReader).ParseFile(filename, .textreader)
    // check status
    if $$$ISERR(status) {do $System.Status.DisplayError(status) quit status}
    set tStatement = ##class(%SQL.Statement).%New()
    set myquery = "INSERT INTO INQUISIDOR_Object.LicitacionOS (Titulo, Resumen, IdLicitacion, URL, Contratante, URLContratante, Estado, ValorEstimado, ImporteTotal, ImporteTotalSinImpuestos, Clasificacion, Localizacion, FechaAdjudicacion, Ganador, ImporteGanadorSinImpuestos, ImporteGanador) VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)"
    // iterate through the document, node by node
    while textreader.Read()
    {
        // a new entry starts: store the previous one if it has been awarded
        if ((textreader.NodeType = "element") && (textreader.Depth = 2) && (textreader.Path = "/feed/entry")) {
            if ($DATA(licitacion))
            {
                if (licitacion.ImporteGanador '= ""){
                    //set sc = licitacion.%Save()
                    set qStatus = tStatement.%Prepare(myquery)
                    if $$$ISERR(qStatus) {
                        write "%Prepare failed:" do $System.Status.DisplayError(qStatus)
                        return qStatus
                    }
                    set rset = tStatement.%Execute(licitacion.Titulo, licitacion.Resumen, licitacion.IdLicitacion, licitacion.URL, licitacion.Contratante, licitacion.URLContratante, licitacion.Estado, licitacion.ValorEstimado, licitacion.ImporteTotal, licitacion.ImporteTotalSinImpuestos, licitacion.Clasificacion, licitacion.Localizacion, licitacion.FechaAdjudicacion, licitacion.Ganador, licitacion.ImporteGanadorSinImpuestos, licitacion.ImporteGanador)
                }
            }
            set licitacion = ##class(Inquisidor.Object.LicitacionOS).%New()
        }
        if (textreader.Path = "/feed/entry/title"){
            if (textreader.Value '= ""){
                set licitacion.Titulo = textreader.Value
            }
        }
        if (textreader.Path = "/feed/entry/summary"){
            if (textreader.Value '= ""){
                set licitacion.Resumen = textreader.Value
            }
        }
        if (textreader.Path = "/feed/entry/id"){
            if (textreader.Value '= ""){
                set licitacion.IdLicitacion = textreader.Value
            }
        }
        if (textreader.Path = "/feed/entry/link"){
            if (textreader.MoveToAttributeName("href")) {
                set licitacion.URL = textreader.Value
            }
        }
        if (textreader.Path = "/feed/entry/cac-place-ext:ContractFolderStatus/cbc-place-ext:ContractFolderStatusCode"){
            if (textreader.Value '= ""){
                set licitacion.Estado = textreader.Value
            }
        }
        // the contracting party's name is the text of the cbc:Name child of cac:PartyName
        if (textreader.Path = "/feed/entry/cac-place-ext:ContractFolderStatus/cac-place-ext:LocatedContractingParty/cac:Party/cac:PartyName/cbc:Name"){
            if (textreader.Value '= ""){
                set licitacion.Contratante = textreader.Value
            }
        }
        if (textreader.Path = "/feed/entry/cac-place-ext:ContractFolderStatus/cac-place-ext:LocatedContractingParty/cac:Party/cbc:WebsiteURI"){
            if (textreader.Value '= ""){
                set licitacion.URLContratante = textreader.Value
            }
        }
        if (textreader.Path = "/feed/entry/cac-place-ext:ContractFolderStatus/cac:ProcurementProject/cac:BudgetAmount/cbc:EstimatedOverallContractAmount"){
            if (textreader.Value '= ""){
                set licitacion.ValorEstimado = textreader.Value
            }
        }
        if (textreader.Path = "/feed/entry/cac-place-ext:ContractFolderStatus/cac:ProcurementProject/cac:BudgetAmount/cbc:TotalAmount"){
            if (textreader.Value '= ""){
                set licitacion.ImporteTotal = textreader.Value
            }
        }
        if (textreader.Path = "/feed/entry/cac-place-ext:ContractFolderStatus/cac:ProcurementProject/cac:BudgetAmount/cbc:TaxExclusiveAmount"){
            if (textreader.Value '= ""){
                set licitacion.ImporteTotalSinImpuestos = textreader.Value
            }
        }
        if (textreader.Path = "/feed/entry/cac-place-ext:ContractFolderStatus/cac:ProcurementProject/cac:RequiredCommodityClassification/cbc:ItemClassificationCode"){
            if (textreader.Value '= ""){
                set licitacion.Clasificacion = textreader.Value
            }
        }
        if (textreader.Path = "/feed/entry/cac-place-ext:ContractFolderStatus/cac:ProcurementProject/cac:RealizedLocation/cbc:CountrySubentity"){
            if (textreader.Value '= ""){
                set licitacion.Localizacion = textreader.Value
            }
        }
        if (textreader.Path = "/feed/entry/cac-place-ext:ContractFolderStatus/cac:TenderResult/cbc:AwardDate"){
            if (textreader.Value '= ""){
                set licitacion.FechaAdjudicacion = $System.SQL.Functions.TODATE(textreader.Value,"YYYY-MM-DD")
            }
        }
        if (textreader.Path = "/feed/entry/cac-place-ext:ContractFolderStatus/cac:TenderResult/cac:WinningParty/cac:PartyName/cbc:Name"){
            if (textreader.Value '= ""){
                set licitacion.Ganador = textreader.Value
            }
        }
        if (textreader.Path = "/feed/entry/cac-place-ext:ContractFolderStatus/cac:TenderResult/cac:AwardedTenderedProject/cac:LegalMonetaryTotal/cbc:TaxExclusiveAmount"){
            if (textreader.Value '= ""){
                set licitacion.ImporteGanadorSinImpuestos = textreader.Value
            }
        }
        if (textreader.Path = "/feed/entry/cac-place-ext:ContractFolderStatus/cac:TenderResult/cac:AwardedTenderedProject/cac:LegalMonetaryTotal/cbc:PayableAmount"){
            if (textreader.Value '= ""){
                set licitacion.ImporteGanador = textreader.Value
            }
        }
    }
    // the loop only stores an entry when the next one starts, so store the last entry here
    if ($DATA(licitacion) && (licitacion.ImporteGanador '= "")) {
        set qStatus = tStatement.%Prepare(myquery)
        if $$$ISERR(qStatus) {
            write "%Prepare failed:" do $System.Status.DisplayError(qStatus)
            return qStatus
        }
        set rset = tStatement.%Execute(licitacion.Titulo, licitacion.Resumen, licitacion.IdLicitacion, licitacion.URL, licitacion.Contratante, licitacion.URLContratante, licitacion.Estado, licitacion.ValorEstimado, licitacion.ImporteTotal, licitacion.ImporteTotalSinImpuestos, licitacion.Clasificacion, licitacion.Localizacion, licitacion.FechaAdjudicacion, licitacion.Ganador, licitacion.ImporteGanadorSinImpuestos, licitacion.ImporteGanador)
    }
    // set resultEmbeddings = ..GenerateEmbeddings()
    Quit $$$OK
}
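OnRequest walks the document node by node with `textreader.Read()` and decides what to do from each node's `Path`. A rough Python counterpart of that node-by-node, path-driven style uses `ET.iterparse` with a stack of open element names. The following is a sketch under several assumptions. It matches on local names with the namespace URIs stripped, which assumes the local-name paths are unambiguous in this feed. It handles only four fields. `FIELDS`, `local`, `stream_entries` and `SAMPLE` are hypothetical names. Each entry is emitted on its closing tag, so the last entry in a file needs no extra handling after the loop.

```python
import io
import xml.etree.ElementTree as ET

# Local-name path (namespaces stripped) -> property name, mirroring a few of
# the textreader.Path checks in OnRequest.
FIELDS = {
    "/feed/entry/title": "Titulo",
    "/feed/entry/ContractFolderStatus/ContractFolderStatusCode": "Estado",
    "/feed/entry/ContractFolderStatus/TenderResult/WinningParty/PartyName/Name": "Ganador",
    "/feed/entry/ContractFolderStatus/TenderResult/AwardedTenderedProject/LegalMonetaryTotal/PayableAmount": "ImporteGanador",
}


def local(tag):
    """'{namespace-uri}Name' -> 'Name' (tags without a namespace are returned unchanged)."""
    return tag.rsplit("}", 1)[-1]


def stream_entries(source):
    """Yield one dict per <entry>, reading the file incrementally as start/end events."""
    stack = []      # local names of the currently open elements
    current = None  # fields collected for the entry being read
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(local(elem.tag))
            if stack == ["feed", "entry"]:
                current = {}
            continue
        # On "end" the element's text is complete.
        path = "/" + "/".join(stack)
        if current is not None and path in FIELDS and elem.text:
            current[FIELDS[path]] = elem.text
        if path == "/feed/entry":
            yield current   # emitted on the closing tag, so the last entry is not lost
            current = None
            elem.clear()    # release the finished entry's children
        stack.pop()


# Hypothetical two-entry feed; only the first entry has been awarded.
SAMPLE = b"""<feed xmlns="http://www.w3.org/2005/Atom"
  xmlns:cac="urn:dgpe:names:draft:codice:schema:xsd:CommonAggregateComponents-2"
  xmlns:cbc="urn:dgpe:names:draft:codice:schema:xsd:CommonBasicComponents-2"
  xmlns:cac-place-ext="urn:dgpe:names:draft:codice-place-ext:schema:xsd:CommonAggregateComponents-2"
  xmlns:cbc-place-ext="urn:dgpe:names:draft:codice-place-ext:schema:xsd:CommonBasicComponents-2">
 <entry><title>A</title>
  <cac-place-ext:ContractFolderStatus>
   <cbc-place-ext:ContractFolderStatusCode>RES</cbc-place-ext:ContractFolderStatusCode>
   <cac:TenderResult>
    <cac:WinningParty><cac:PartyName><cbc:Name>Empresa Ejemplo SL</cbc:Name></cac:PartyName></cac:WinningParty>
    <cac:AwardedTenderedProject><cac:LegalMonetaryTotal>
     <cbc:PayableAmount>1210.00</cbc:PayableAmount>
    </cac:LegalMonetaryTotal></cac:AwardedTenderedProject>
   </cac:TenderResult>
  </cac-place-ext:ContractFolderStatus>
 </entry>
 <entry><title>B</title></entry>
</feed>"""

all_entries = list(stream_entries(io.BytesIO(SAMPLE)))
# Keep only awarded tenders, as OnRequest does.
awarded = [e for e in all_entries if e.get("ImporteGanador")]
```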
Both methods only store tenders that have already been resolved, that is, those that report the winning bid amount (ImporteGanador).
Production configuration
With our methods implemented in their respective Business Processes, all that remains for the test is to configure the production that feeds them. We add two Business Services that pick up the files with the XML information and send them to the Business Processes.
We use one Business Service per Business Process to avoid any interference when capturing and forwarding the information. The production looks like this:
For the test we will load the public tenders for the month of February: 91 files totaling 1.30 GB of data. Let's see how both methods behave.
Ready...
On your marks...
Go!
XML parsing results using ObjectScript
Let's start with the time it took the ObjectScript code to map the 91 files:
The first file started at 21:11:15. Let's see when the last file was mapped:
Looking at the details of the last message, we can see when processing ended:
The end time is 21:17:43, giving a total processing time of 6 minutes and 28 seconds.
XML parsing results using Embedded Python
Let's repeat the same operation with the process that uses Python:
It started at 21:11:15, as in the previous case. Let's see when it ended:
Let's look at the message details to find the exact end time:
The end time was 21:12:03, so the total processing time is 48 seconds.
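The two durations, and the ratio between them, follow directly from the timestamps above. The quick check below assumes both runs happened on the same day. Because the times run from the first file to the last message, they cover everything each Business Process did (parsing, SQL inserts and logging), not parsing alone. `elapsed_seconds` is an illustrative helper.

```python
from datetime import datetime

FMT = "%H:%M:%S"


def elapsed_seconds(start, end):
    """Seconds between two clock times on the same day."""
    return (datetime.strptime(end, FMT) - datetime.strptime(start, FMT)).total_seconds()


files = 91
objectscript_s = elapsed_seconds("21:11:15", "21:17:43")  # 388.0 (6 min 28 s)
python_s = elapsed_seconds("21:11:15", "21:12:03")        # 48.0

print(f"ObjectScript:    {objectscript_s:.0f} s, {objectscript_s / files:.2f} s per file")
print(f"Embedded Python: {python_s:.0f} s, {python_s / files:.2f} s per file")
print(f"Ratio: {objectscript_s / python_s:.1f}x")
```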
Well, we have a winner! In this round Embedded Python clearly beat ObjectScript (48 seconds versus 6 minutes and 28 seconds, roughly eight times faster), at least when it comes to XML parsing. If you have any suggestions or improvements for either method, please share them in the comments and I will repeat the tests to check for possible improvements.
What we can say is that with regard to the obvious performance superiority of ObjectScript over Python...



