Pustulka, Elzbieta
Search results
Measuring the benefits of CI/CD practices for database application development
2023, Fluri, Jasmin, Fornari, Fabrizio, Pustulka, Elzbieta
Modern software development practices automate software integration and reduce repetitive software engineering work. Automation reduces the time it takes from defining software requirements to deploying the software in production. However, when it comes to database applications, database integration and deployment are often executed manually, making them costly and error-prone. To mitigate this, we extended current software development methodologies by designing a CI/CD pipeline that takes the database setting into consideration. We report on two industrial case studies in which we implemented the newly designed pipeline and measured the benefits of integration and deployment automation in database development projects. From a quantitative perspective, we found that introducing CI/CD pipelines reduces failed deployments, improves stability and increases the number of executed deployments. From a qualitative perspective, we interviewed the developers before and after the implementation of the CI/CD pipeline, and the results show that the pipeline brings clear benefits to the development team (i.e., reduced cognitive load). This finding calls into question current database release practices driven by business expectations such as fixed release windows.
Extending SQL Scrolls to teach SQL DML
2022, Pustulka, Elzbieta, de Espona, Lucía, Kennel, Andrea
SQL (Structured Query Language) allows a business user to communicate with a relational database. A learner who wants to master SQL needs practice, patience and motivation, which we support in a game called SQL Scrolls. Student surveys we carried out show that this approach encourages our students to practise, and that they are enthusiastic and want to see more games in other subjects. We are now extending the game to cover all of SQL DML and offer 500 questions.
Building a NoSQL ERP
2022, Pustulka, Elzbieta, von Arx, Stefan, Espona, Lucía
Enterprise Resource Planning (ERP) systems are needed in many business activities. SMEs (small and medium enterprises) are not well served by current ERPs, as such systems are hard to tailor. This prompts us to experiment with building an ERP on top of a NoSQL database, which is intended to be more flexible, as it is based on JSON and not on a relational data model. We present a novel ERP solution specifically designed to grow and evolve as the world changes. The ERP is for a service company which bills for time spent on customer projects. The work involves various challenges: data modelling, query specification, write and read performance analysis, versioning, user interface generation, and query optimisation. Here, we report on the performance of a NoSQL ERP using MongoDB and show that writes are fast and queries and reports are fast enough.
Text mining innovation for business
2020, Pustulka, Elzbieta, Hanne, Thomas, Dornberger, Rolf
This chapter reflects on the business innovation supported by developing text-mining solutions to meet the business needs communicated by Swiss companies. Two related projects from different industries and with different challenges are discussed in order to identify common procedures and methodologies that can be used. One of the partners, in the gig work sector, offers a platform solution for employee recruitment for temporary work. The work assessment is performed using short reviews, for which a method for sentiment assessment based on machine learning has been developed. The other partner, in the financial advice sector, operates an information extraction service for business documents, including insurance policies. This requires automation in the extraction of structured information from PDF files. The common path to innovation in such projects includes business process modeling and the implementation of novel technological solutions, including text-mining techniques.
Automatic indexing for MongoDB
2023, Espona, Lucía, Vichalkovski, Anton, Steingartner, William, Pustulka, Elzbieta, Abelló, Alberto, Vassiliadis, Panos, Romero, Oscar, Wrembel, Robert, Bugiotti, Francesca, Gamper, Johann, Vargas Solar, Genoveva, Zumpano, Ester
We present a new method for automated index suggestion for MongoDB, based solely on the queries (called aggregation pipelines), without requiring data or usage information. The solution handles complex aggregations and is suitable for both cloud and standalone databases. We validated the algorithm on TPC-H and showed that all suggested indexes were used. We report on the performance and provide hints for further development of an automated method of index selection. Our algorithm is, to the best of our knowledge, the first query-based solution for automated indexing in MongoDB.
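The idea of deriving an index from the query alone can be illustrated with a minimal sketch. This is not the algorithm from the paper: the function name `suggest_index` and the heuristic (collect the fields used by the leading `$match` and `$sort` stages, and stop at the first other stage) are assumptions made purely for illustration, and the TPC-H column names serve only as sample input.

```python
from typing import Any

def suggest_index(pipeline: list[dict[str, Any]]) -> list[tuple[str, int]]:
    """Hypothetical heuristic: propose one compound index from the
    leading $match and $sort stages of an aggregation pipeline."""
    keys: list[tuple[str, int]] = []
    seen: set[str] = set()
    for stage in pipeline:
        if "$match" in stage:
            for field in stage["$match"]:
                # Skip operators such as $and / $or; keep plain field names.
                if not field.startswith("$") and field not in seen:
                    keys.append((field, 1))
                    seen.add(field)
        elif "$sort" in stage:
            for field, direction in stage["$sort"].items():
                if field not in seen:
                    keys.append((field, direction))
                    seen.add(field)
        else:
            # Simplifying assumption: stop at the first stage that is
            # neither $match nor $sort.
            break
    return keys

pipeline = [
    {"$match": {"l_shipdate": {"$lte": "1998-09-02"}}},
    {"$sort": {"l_returnflag": 1}},
    {"$group": {"_id": "$l_returnflag", "n": {"$sum": 1}}},
    {"$match": {"n": {"$gt": 10}}},
]
suggestion = suggest_index(pipeline)
```

The returned list of `(field, direction)` pairs has the shape that PyMongo's `create_index` accepts for a compound index, so a suggestion could be applied directly if desired.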
SQL scrolls - A reusable and extensible DGBL experiment
2022, Pustulka, Elzbieta, Krause, Kai, de Espona, Lucía, Kennel, Andrea, Stikkolorum, Dave, Rahimi, Ebrahim
The teaching of databases and SQL is an active research area. We contribute by presenting a reusable and extensible SQL teaching experiment which uses a game and fits the paradigm of digital game-based learning (DGBL). Although DGBL is hampered partly by the difficulty of obtaining statistically significant empirical results, the research shows that it may be an effective learning method and that it is in demand. We investigate the acceptance and effectiveness of an SQL learning game and focus on two areas: student reaction to games as a vehicle for teaching, and educational effectiveness. We designed a game prototype and administered a pre-test, post-test and an acceptance survey, with seven part-time and sixteen full-time students. A statistical analysis of effect sizes revealed a moderate intervention effect for the game group (d = -0.562) and a small one for the traditional group (d = -0.234). The acceptance survey means were between 4.43 and 4.70 out of 5, which shows that the game is highly acceptable. Our experiment demonstrated positive student attitudes towards DGBL in SQL teaching and showed the game to be as effective as exercises done using a workbench. We further observed interesting differences in teaching using a game and a "natural" workbench environment and had excellent course feedback. We have released the game as open source in the hope that other researchers will replicate or contradict our findings or simply use it in teaching. We close with an outline of ongoing research.
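For readers unfamiliar with effect sizes, the following sketch shows one common way to compute Cohen's d. The abstract does not state which variant was used, so the pooled-standard-deviation formula and the pre-minus-post sign convention (which yields a negative d when scores improve) are assumptions, and the scores below are made up.

```python
from math import sqrt
from statistics import mean, stdev

def cohens_d(a: list[float], b: list[float]) -> float:
    """Cohen's d with a pooled sample standard deviation (assumed variant)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)

# Hypothetical pre-test and post-test scores, not data from the study.
pre_scores = [1.0, 2.0, 3.0]
post_scores = [2.0, 3.0, 4.0]
d = cohens_d(pre_scores, post_scores)  # negative: post-test mean is higher
```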
FLIE: form labeling for information extraction
2021, Pustulka, Elzbieta, Hanne, Thomas, Gachnang, Phillip, Biafora, Pasquale, Arai, Kohei, Kapoor, Supriya, Bhatia, Rahul
Information extraction (IE) from forms remains an unsolved problem, with some exceptions, like bills. Forms are complex and the templates are often unstable, due to the injection of advertising, extra conditions, or document merging. Our scenario deals with insurance forms used by brokers in Switzerland. Here, each combination of insurer, insurance type and language results in a new document layout, leading to a few hundred document types. To help brokers extract data from policies, we developed a new labeling method, called FLIE (form labeling for information extraction). FLIE first assigns a document to a cluster, grouping by language, insurer, and insurance type. It then labels the layout. To produce training data, the user annotates a sample document by hand, adding attribute names and thereby providing a mapping. FLIE applies machine learning to propagate the mapping and extracts information. Our results are based on 24 Swiss policies in German: UVG (mandatory accident insurance), KTG (sick pay insurance), and UVGZ (optional accident insurance). Our solution has an accuracy of around 84-89%. It is currently being extended to other policy types and languages.
Learning Java Loops and Control Structures by Moving a Ladybird
2023, Pustulka, Elzbieta, Spadola, Alessandro
We adapted an existing Java teaching game called JavaKara to help students learn how to use loops and control statements and tested it in class. Two groups of BSc students in an introductory Java course played the game for about an hour. The game was evaluated using the MEEGA+ game evaluation method. A questionnaire was administered to get feedback and the game received a score of 53.45, i.e. good. Students reported that they lost track of time and were satisfied with this new learning paradigm.
Document versioning for MongoDB
2022, Espona, Lucía, Pustulka, Elzbieta, Chiusano, Silvia, Cerquitelli, Tania, Wrembel, Robert, Nørvåg, Kjetil, Catania, Barbara, Vargas-Solar, Genoveva, Zumpano, Ester
Data versioning is required in various business and science contexts, including governance, risk and compliance (GRC), and is essential for security audits, legal compliance and business strategy development. We present a data versioning library for MongoDB to support an innovative enterprise resource planning (ERP) system for small and medium enterprises (SMEs) which aims to be flexible and adapt to changing business needs. We exploit the fact that the volume of archival data is orders of magnitude larger than that of the currently valid documents and that historic data is rarely accessed. Experiments with eight sets of 1 million mutations/queries on 100K valid documents (average size 2.3 kB), carried out over a period of 60 h on a local PC, show stable average versioning write/read operation performance per document in the range of 12.3/1.2 ms, which proves that the solution is viable in an SME scenario.
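One way to exploit the observation that archival data is large but rarely read is to keep currently valid documents apart from superseded versions. The sketch below illustrates that separation with plain in-memory Python structures; it is not the library described in the paper, and the class `VersionedStore` and its field names are hypothetical.

```python
from copy import deepcopy
from typing import Any, Optional

class VersionedStore:
    """Hypothetical in-memory model: a small 'current' collection for
    valid documents and a separate, append-only 'archive' for old versions."""

    def __init__(self) -> None:
        self.current: dict[str, dict[str, Any]] = {}
        self.archive: list[dict[str, Any]] = []

    def save(self, doc_id: str, data: dict[str, Any]) -> int:
        """Store a new version; move any previous version to the archive."""
        old = self.current.get(doc_id)
        version = 1
        if old is not None:
            self.archive.append(old)
            version = old["version"] + 1
        self.current[doc_id] = {"_id": doc_id, "version": version, "data": deepcopy(data)}
        return version

    def get(self, doc_id: str) -> Optional[dict[str, Any]]:
        """Read the valid version without touching the archive."""
        return self.current.get(doc_id)

    def history(self, doc_id: str) -> list[dict[str, Any]]:
        """Rarely used path: scan the archive for superseded versions."""
        return [d for d in self.archive if d["_id"] == doc_id]

store = VersionedStore()
store.save("project-1", {"hours": 2})
store.save("project-1", {"hours": 3})
```

In this layout, reads of valid documents never scan historic data, at the cost of one extra write whenever a document is superseded.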
FLIE with rules
2021, Pustulka, Elzbieta, Hanne, Thomas, de Espona, Lucía
FLIE (Form Labelling for Information Extraction) allows us to extract information from Swiss insurance policies. Insurance policies are forms which are weakly aligned and do not lend themselves to automated data extraction without preprocessing. Our preprocessing annotates data with geometry and, combined with manual training data generation, gives an extraction accuracy of over 80% for a subset of attributes which have been seen 8 times or more. In this paper we extend FLIE with rules. The aim is to compare machine learning used in FLIE to the standard industry approach of using rules to extract data. We hand-crafted rules (regular expressions in Python) for the KTG insurance (27 rules), UVG insurance (29 rules), and UVG-Z (23 rules), for each insurance type covering around 20 attributes. We also generated rules for building insurance policies, which were new to us (16 rules encoded in spaCy). In all cases we saw that using rules alone gives us a similar accuracy in data extraction to machine learning (around 80%). In the case of building insurance the accuracy is higher, above 96%, with precision and recall around 89-92%. To support annotation and experimental evaluation, we created an annotation GUI and a GUI which automates the ML experiment. Planned work includes a comparison of rule-based and ML approaches and extension to further policy types.
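A rule of the kind mentioned above, a regular expression in Python that maps a label in the policy text to an attribute value, might look like the following sketch. The attribute names, the German labels and the sample text are all hypothetical; the actual rules from the paper are not reproduced here.

```python
import re

# Hypothetical rules: one compiled pattern per attribute, with the value
# to extract in the first capture group.
RULES: dict[str, re.Pattern[str]] = {
    "policy_number": re.compile(r"Policen?[- ]?Nr\.?\s*:?\s*([\d.]+)"),
    "start_date": re.compile(r"Vertragsbeginn\s*:?\s*(\d{2}\.\d{2}\.\d{4})"),
}

def extract(text: str) -> dict[str, str]:
    """Apply every rule and keep the first match for each attribute."""
    result: dict[str, str] = {}
    for attribute, pattern in RULES.items():
        match = pattern.search(text)
        if match:
            result[attribute] = match.group(1)
    return result

# Made-up policy fragment for illustration only.
sample = "Police Nr. 12.345.678\nVertragsbeginn: 01.01.2021"
extracted = extract(sample)
```

Attributes without a matching rule are simply absent from the result.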